Unresolved Win32 bug reports

Started by Bruce Momjian over 19 years ago, 15 messages
#1 Bruce Momjian
pgman@candle.pha.pa.us

Folks, my mailbox is filling with unresolved Win32 bug reports,
specifically:

integer division
shared memory
statistics collector
rename
fsync

I have put the emails at the bottom of the patches_hold queue:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#2 Jim C. Nasby
jnasby@pervasive.com
In reply to: Bruce Momjian (#1)
Re: Unresolved Win32 bug reports

Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
one a 4 way), and a dual Opteron, all running the latest windows binary.
A 50 connection test running 1000 transactions is pretty much ensured to
fail.

I've been unable to produce the same behavior on a single-proc machine.

Please let me know if there's any more info that would be helpful.

On Thu, Apr 20, 2006 at 07:02:01AM -0400, Bruce Momjian wrote:

Folks, my mailbox is filling with unresolved Win32 bug reports,
specifically:

integer division
shared memory
statistics collector
rename
fsync

I have put the emails at the bottom of the patches_hold queue:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#3 Martijn van Oosterhout
kleptog@svana.org
In reply to: Jim C. Nasby (#2)
Re: Unresolved Win32 bug reports

On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:

Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
one a 4 way), and a dual Opteron, all running the latest windows binary.
A 50 connection test running 1000 transactions is pretty much ensured to
fail.

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


From each according to his ability. To each according to his ability to litigate.

#4 Jim C. Nasby
jnasby@pervasive.com
In reply to: Martijn van Oosterhout (#3)
Re: Unresolved Win32 bug reports

On Thu, Apr 20, 2006 at 07:25:15PM +0200, Martijn van Oosterhout wrote:

On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:

Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
one a 4 way), and a dual Opteron, all running the latest windows binary.
A 50 connection test running 1000 transactions is pretty much ensured to
fail.

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...

Any pointers on how to get that set up? Is gdb part of the mingw runtime?

BTW, this appears to be readily reproducible, so it might be a lot more
productive for one of the windows hackers to test this themselves...

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#3)
Re: Unresolved Win32 bug reports

Martijn van Oosterhout <kleptog@svana.org> writes:

On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:

Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity).

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...

Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
windows semaphore code? It's clearly windows-specific since no one's
ever reported any such thing on Unixen.

regards, tom lane

#6 Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#5)
Re: Unresolved Win32 bug reports

pgbench) just stops doing work (CPU usage drops to nothing, as does
disk activity). I've been able to repro this on 2 Intel boxes (one a
2 way, one a 4 way), and a dual Opteron, all running the latest
windows binary. A 50 connection test running 1000 transactions is
pretty much ensured to fail.

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...

Any pointers on how to get that set up? Is gdb part of the
mingw runtime?

Yes. It's quite crappy compared to on unix though - I've never been able
to make it do the right thing all the way :-(

BTW, this appears to be readily reproducible, so it might be
a lot more productive for one of the windows hackers to test
this themselves...

It requires a multi-CPU box, right? I don't have one with pgwin32 on it
ATM. Do you know if hyperthreading is enough?

//Magnus

#7 Jim C. Nasby
jnasby@pervasive.com
In reply to: Magnus Hagander (#6)
Re: Unresolved Win32 bug reports

On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:

It requires a multi-CPU box, right? I don't have one with pgwin32 on it
ATM. Do you know if hyperthreading is enough?

Hrm... not sure. Let me see if I can find a box with HT here and test
it. Running the following batch file with arguments of 40 40 1000 is
almost guaranteed to trigger the problem, though...

@echo off
dropdb bench
createdb bench
pgbench -i -s %1 bench
pgbench -t %3 -c %2 -n bench
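
For anyone who would rather have the run stop on its own when it stalls, here is a rough Python sketch of the same four steps with a timeout around each one. It is only an illustration, not something from this thread: the function names and the one-hour limit are made up, and a wall-clock timeout is a crude stand-in for actually watching CPU and disk activity drop to nothing.

```python
import subprocess
import sys


def build_commands(scale, clients, transactions, db="bench"):
    """Return the same four commands as the batch file, as argument lists."""
    return [
        ["dropdb", db],
        ["createdb", db],
        ["pgbench", "-i", "-s", str(scale), db],
        ["pgbench", "-t", str(transactions), "-c", str(clients), "-n", db],
    ]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Same argument order as the batch file: scale, clients, transactions.
    scale, clients, transactions = (int(a) for a in sys.argv[1:4])
    for cmd in build_commands(scale, clients, transactions):
        # Hypothetical limit; pick something well above a normal run time.
        status = run_with_timeout(cmd, timeout_s=3600)
        print(" ".join(cmd), "->", status)
        if status == "timeout":
            break
```

Saved under a hypothetical name such as repro.py, it would be run as `python repro.py 40 40 1000`. Note that `subprocess.run` kills the pgbench client when the timeout expires, so this only flags a stall; to inspect a stuck process it would have to be left running instead.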

#8 Larry Rosenman
lrosenman@pervasive.com
In reply to: Jim C. Nasby (#7)
Re: Unresolved Win32 bug reports

Jim C. Nasby wrote:

On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:

It requires a multi-CPU box, right? I don't have one with pgwin32 on it
ATM. Do you know if hyperthreading is enough?

Hrm... not sure. Let me see if I can find a box with HT here and test
it. Running the following batch file with arguments of 40 40 1000 is
almost guaranteed to trigger the problem, though...

@echo off
dropdb bench
createdb bench
pgbench -i -s %1 bench
pgbench -t %3 -c %2 -n bench

It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.

:(

LER

--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com

#9 Larry Rosenman
lrosenman@pervasive.com
In reply to: Larry Rosenman (#8)
Re: Unresolved Win32 bug reports

Larry Rosenman wrote:

It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.

:(

LER

I may have spoken too soon :(

More in a bit.

LER


#10 Jim C. Nasby
jnasby@pervasive.com
In reply to: Larry Rosenman (#9)
Re: Unresolved Win32 bug reports

On Thu, Apr 20, 2006 at 02:17:35PM -0500, Larry Rosenman wrote:

It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.

:(

LER

I may have spoken too soon :(

I took a look and in fact the machine was just disk bound, so it appears
that either HT doesn't exhibit this behavior, or XP doesn't exhibit it
(all the machines I produced the error on are running w2k3 server).

I'll try and pin down better exactly what hardware/software will
reproduce this. In the meantime, if anyone has any good info for getting
a dump of one of these processes...

#11 Bort, Paul
pbort@tmwsystems.com
In reply to: Jim C. Nasby (#10)
Re: Unresolved Win32 bug reports

Some of the SysInternals tools might be a start.

ProcessExplorer provides information about processes:
http://www.sysinternals.com/Utilities/ProcessExplorer.html

DebugView shows Debugging output (not sure if PG uses this):
http://www.sysinternals.com/Utilities/DebugView.html

Also, I haven't used it, but this looks like the Windows equivalent of
gdb:
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jim C. Nasby
Sent: Thursday, April 20, 2006 4:14 PM
To: Larry Rosenman
Cc: Magnus Hagander; Martijn van Oosterhout; Bruce Momjian;
PostgreSQL-development
Subject: Re: [HACKERS] Unresolved Win32 bug reports

On Thu, Apr 20, 2006 at 02:17:35PM -0500, Larry Rosenman wrote:

It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.

:(

LER

I may have spoken too soon :(

I took a look and in fact the machine was just disk bound, so it appears
that either HT doesn't exhibit this behavior, or XP doesn't exhibit it
(all the machines I produced the error on are running w2k3 server).

I'll try and pin down better exactly what hardware/software will
reproduce this. In the meantime, if anyone has any good info for getting
a dump of one of these processes...

#12 Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#1)
Re: Unresolved Win32 bug reports

Bruce Momjian said:

Folks, my mailbox is filling with unresolved Win32 bug reports,
specifically:

integer division
shared memory
statistics collector
rename
fsync

I have put the emails at the bottom of the patches_hold queue:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

There's also a pg_config buglet that David Fetter found that still needs to
be fixed.

I am currently travelling on family business, but when I return home in a
couple of weeks will be working on getting my new machine built, and
installing a permanent Windows VM (among others), which will make it easier
for me to look at Windows issues within my realm of competence.

cheers

andrew

#13 Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Bruce Momjian (#1)
Re: Unresolved Win32 bug reports

"Tom Lane" <tgl@sss.pgh.pa.us> wrote

Martijn van Oosterhout <kleptog@svana.org> writes:

On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:

Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity).

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...

Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
windows semaphore code? It's clearly windows-specific since no one's
ever reported any such thing on Unixen.

I also suspect the EAGAIN error reports are related to the semaphore code.
So if possible, I suggest we patch the code and test it.

Regards,
Qingqing

#14 Jim C. Nasby
jnasby@pervasive.com
In reply to: Qingqing Zhou (#13)
Re: Unresolved Win32 bug reports

On Mon, Apr 24, 2006 at 10:23:07AM +0800, Qingqing Zhou wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> wrote

Martijn van Oosterhout <kleptog@svana.org> writes:

On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:

Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity).

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...

Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
windows semaphore code? It's clearly windows-specific since no one's
ever reported any such thing on Unixen.

I also suspect the EAGAIN error reports are related to the semaphore code.
So if possible, I suggest we patch the code and test it.

Is there a patched build available for testing? (I'd rather not have to
figure out how to get windows builds working, unless there's some kind
of instructions somewhere...)

#15 Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Bruce Momjian (#1)
Re: Unresolved Win32 bug reports

"Jim C. Nasby" <jnasby@pervasive.com> wrote

Is there a patched build available for testing? (I'd rather not have to
figure out how to get windows builds working, unless there's some kind
of instructions somewhere...)

Not yet - the patch is still pending.

Regards,
Qingqing