Unresolved Win32 bug reports
Folks, my mailbox is filling with unresolved Win32 bug reports,
specifically:
integer division
shared memory
statistics collector
rename
fsync
I have put the emails at the bottom of the patches_hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
one a 4 way), and a dual Opteron, all running the latest windows binary.
A 50 connection test running 1000 transactions is pretty much ensured to
fail.
I've been unable to produce the same behavior on a single-proc machine.
Please let me know if there's any more info that would be helpful.
On Thu, Apr 20, 2006 at 07:02:01AM -0400, Bruce Momjian wrote:
> Folks, my mailbox is filling with unresolved Win32 bug reports,
> specifically:
> integer division
> shared memory
> statistics collector
> rename
> fsync
> I have put the emails at the bottom of the patches_hold queue:
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> Here's one to add to the list: running pgbench with a moderately heavy
> load on an SMP box likes to trigger a state where the database (or
> pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
> one a 4 way), and a dual Opteron, all running the latest windows binary.
> A 50 connection test running 1000 transactions is pretty much ensured to
> fail.
Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...
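As a rough illustration of that suggestion, the sketch below shows one way to collect stack traces from each suspect process with gdb in batch mode. It assumes a gdb build that supports the `-batch`, `-ex`, and `-p` options (an older mingw gdb may not), and the helper names `gdb_backtrace_command` and `collect_backtraces` are hypothetical, not something from this thread.

```python
import subprocess
from typing import Dict, List


def gdb_backtrace_command(pid: int, gdb: str = "gdb") -> List[str]:
    """Build a gdb command line that attaches to `pid`, prints a backtrace
    for every thread, and exits. Assumes gdb supports -batch, -ex and -p."""
    return [gdb, "-batch", "-ex", "thread apply all bt", "-p", str(pid)]


def collect_backtraces(pids: List[int], timeout_s: float = 60.0) -> Dict[int, str]:
    """Run gdb once per PID and return a mapping of PID to gdb's output."""
    traces: Dict[int, str] = {}
    for pid in pids:
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        # Keep stderr too: attach failures are reported there.
        traces[pid] = result.stdout + result.stderr
    return traces
```

One would call `collect_backtraces([...])` with the PIDs of a stuck backend and of pgbench while the stall is in progress; whether the traces are readable depends on the binaries having debug symbols.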
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
On Thu, Apr 20, 2006 at 07:25:15PM +0200, Martijn van Oosterhout wrote:
> On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> > Here's one to add to the list: running pgbench with a moderately heavy
> > load on an SMP box likes to trigger a state where the database (or
> > pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> > activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
> > one a 4 way), and a dual Opteron, all running the latest windows binary.
> > A 50 connection test running 1000 transactions is pretty much ensured to
> > fail.
> Well, this sounds like a deadlock; the obvious step would be to
> attach gdb to both and get a stack trace...
Any pointers on how to get that set up? Is gdb part of the mingw runtime?
BTW, this appears to be readily reproducible, so it might be a lot more
productive for one of the windows hackers to test this themselves...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> > Here's one to add to the list: running pgbench with a moderately heavy
> > load on an SMP box likes to trigger a state where the database (or
> > pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> > activity).
> Well, this sounds like a deadlock; the obvious step would be to
> attach gdb to both and get a stack trace...
Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
windows semaphore code? It's clearly windows-specific since no one's
ever reported any such thing on Unixen.
regards, tom lane
> > > pgbench) just stops doing work (CPU usage drops to nothing, as does
> > > disk activity). I've been able to repro this on 2 Intel boxes (one a
> > > 2 way, one a 4 way), and a dual Opteron, all running the latest
> > > windows binary.
> > > A 50 connection test running 1000 transactions is pretty much
> > > ensured to fail.
> > Well, this sounds like a deadlock; the obvious step would be to
> > attach gdb to both and get a stack trace...
> Any pointers on how to get that set up? Is gdb part of the
> mingw runtime?
Yes. It's quite crappy compared to gdb on Unix, though - I've never been able
to make it do the right thing all the way :-(
> BTW, this appears to be readily reproducible, so it might be
> a lot more productive for one of the windows hackers to test
> this themselves...
It requires a multi-CPU box, right? I don't have one with pgwin32 on
ATM. Do you know if hyperthreading is enough?
//Magnus
On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:
> It requires a multi-CPU box, right? I don't have one with pgwin32 on
> ATM. Do you know if hyperthreading is enough?
Hrm... not sure. Let me see if I can find a box with HT here and test
it. Running the following batch file with arguments of 40 40 1000 is
almost guaranteed to trigger the problem, though...
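@echo off
rem Arguments: scale factor, client count, transactions per client
dropdb bench
createdb bench
rem Initialize the pgbench tables at the given scale factor
pgbench -i -s %1 bench
rem Run the test; -n skips the vacuum before the run
pgbench -t %3 -c %2 -n bench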
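For anyone who wants to leave the reproduction running unattended, here is a minimal Python sketch of the same four steps with a wall-clock timeout on the final pgbench run. The function names, the default timeout, and the "ok"/"failed"/"stalled" labels are assumptions made for illustration; only the command lines come from the batch file above. A timeout is a crude proxy for the stall described in this thread, not a real detector of idle CPU and disk.

```python
import subprocess
from typing import List


def repro_commands(scale: int, clients: int, transactions: int,
                   db: str = "bench") -> List[List[str]]:
    """Return the batch file's four steps as argv lists."""
    return [
        ["dropdb", db],
        ["createdb", db],
        ["pgbench", "-i", "-s", str(scale), db],
        ["pgbench", "-t", str(transactions), "-c", str(clients), "-n", db],
    ]


def run_repro(scale: int, clients: int, transactions: int,
              timeout_s: float = 3600.0) -> str:
    """Run the steps; report 'ok', 'failed', or 'stalled' for the test run."""
    *setup, bench = repro_commands(scale, clients, transactions)
    for cmd in setup:
        # dropdb fails harmlessly if the database does not exist yet.
        subprocess.run(cmd, check=False)
    proc = subprocess.Popen(bench)
    try:
        return "ok" if proc.wait(timeout=timeout_s) == 0 else "failed"
    except subprocess.TimeoutExpired:
        # Popen.wait() does not kill the child on timeout, so a stuck
        # pgbench stays alive and can still be inspected with a debugger.
        return "stalled"
```

`run_repro(40, 40, 1000)` corresponds to the arguments mentioned above; the timeout should be chosen well above the time a healthy run takes on the machine in question.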
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote:
> On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:
> > It requires a multi-CPU box, right? I don't have one with pgwin32 on
> > ATM. Do you know if hyperthreading is enough?
> Hrm... not sure. Let me see if I can find a box with HT here and test
> it. Running the following batch file with arguments of 40 40 1000 is
> almost guaranteed to trigger the problem, though...
> @echo off
> dropdb bench
> createdb bench
> pgbench -i -s %1 bench
> pgbench -t %3 -c %2 -n bench
It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
:(
LER
--
Larry Rosenman
Database Support Engineer
PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531
Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com
Larry Rosenman wrote:
> Jim C. Nasby wrote:
> > On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:
> > > It requires a multi-CPU box, right? I don't have one with pgwin32 on
> > > ATM. Do you know if hyperthreading is enough?
> > Hrm... not sure. Let me see if I can find a box with HT here and test
> > it. Running the following batch file with arguments of 40 40 1000 is
> > almost guaranteed to trigger the problem, though...
> > @echo off
> > dropdb bench
> > createdb bench
> > pgbench -i -s %1 bench
> > pgbench -t %3 -c %2 -n bench
> It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
> :(
> LER
I may have spoken too soon :(
More in a bit.
LER
--
Larry Rosenman
Database Support Engineer
PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531
Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com
On Thu, Apr 20, 2006 at 02:17:35PM -0500, Larry Rosenman wrote:
> > It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
> > :(
> > LER
> I may have spoken too soon :(
I took a look and in fact the machine was just disk bound, so it appears
that either HT doesn't exhibit this behavior, or XP doesn't exhibit it
(all the machines I produced the error on are running w2k3 server).
I'll try and pin down better exactly what hardware/software will
reproduce this. In the meantime, if anyone has any good info for getting
a dump of one of these processes...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Some of the SysInternals tools might be a start.
ProcessExplorer provides information about processes:
http://www.sysinternals.com/Utilities/ProcessExplorer.html
DebugView shows debugging output (not sure if PG uses this):
http://www.sysinternals.com/Utilities/DebugView.html
Also, I haven't used it, but this looks like the Windows equivalent of
gdb:
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jim C. Nasby
Sent: Thursday, April 20, 2006 4:14 PM
To: Larry Rosenman
Cc: Magnus Hagander; Martijn van Oosterhout; Bruce Momjian;
PostgreSQL-development
Subject: Re: [HACKERS] Unresolved Win32 bug reports
On Thu, Apr 20, 2006 at 02:17:35PM -0500, Larry Rosenman wrote:
> > It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
> > :(
> > LER
> I may have spoken too soon :(
I took a look and in fact the machine was just disk bound, so it appears
that either HT doesn't exhibit this behavior, or XP doesn't exhibit it
(all the machines I produced the error on are running w2k3 server).
I'll try and pin down better exactly what hardware/software will
reproduce this. In the meantime, if anyone has any good info for getting
a dump of one of these processes...
Bruce Momjian said:
> Folks, my mailbox is filling with unresolved Win32 bug reports,
> specifically:
> integer division
> shared memory
> statistics collector
> rename
> fsync
> I have put the emails at the bottom of the patches_hold queue:
There's also a pg_config buglet that David Fetter found that still needs to
be fixed.
I am currently travelling on family business, but when I return home in a
couple of weeks will be working on getting my new machine built, and
installing a permanent Windows VM (among others), which will make it easier
for me to look at Windows issues within my realm of competence.
cheers
andrew
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> > > Here's one to add to the list: running pgbench with a moderately heavy
> > > load on an SMP box likes to trigger a state where the database (or
> > > pgbench) just stops doing work (CPU usage drops to nothing, as does
> > > disk activity).
> > Well, this sounds like a deadlock; the obvious step would be to
> > attach gdb to both and get a stack trace...
> Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
> windows semaphore code? It's clearly windows-specific since no one's
> ever reported any such thing on Unixen.
I also suspect the EAGAIN error reports are related to the semaphore code.
So if possible, I suggest we patch the code and test it.
Regards,
Qingqing
On Mon, Apr 24, 2006 at 10:23:07AM +0800, Qingqing Zhou wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> > > > Here's one to add to the list: running pgbench with a moderately heavy
> > > > load on an SMP box likes to trigger a state where the database (or
> > > > pgbench) just stops doing work (CPU usage drops to nothing, as does
> > > > disk activity).
> > > Well, this sounds like a deadlock; the obvious step would be to
> > > attach gdb to both and get a stack trace...
> > Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
> > windows semaphore code? It's clearly windows-specific since no one's
> > ever reported any such thing on Unixen.
> I also suspect the EAGAIN error reports are related to the semaphore code.
> So if possible, I suggest we patch the code and test it.
Is there a patched build available for testing? (I'd rather not have to
figure out how to get windows builds working, unless there's some kind
of instructions somewhere...)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> wrote
> Is there a patched build available for testing? (I'd rather not have to
> figure out how to get windows builds working, unless there's some kind
> of instructions somewhere...)
Not yet - the patch is still pending.
Regards,
Qingqing