Win XP SP2 SMP locking (8.1.4)

Started by Oleg Bartunov over 19 years ago, 7 messages
#1Oleg Bartunov
oleg@sai.msu.su

Hi there,

I'm looking into a strange locking problem that happens on a WinXP SP2 SMP
machine running 8.1.4 with stats_row_level=on. This is the only
combination (number of CPUs and stats_row_level) that shows the problem:
SMP + stats_row_level.

The same test runs fine with one CPU (machine restarted with /numproc=1),
regardless of the stats_row_level setting.

The customer's application loads data into the database, and sometimes the
process stops: no CPU, no I/O activity. pgAdmin shows the current query is
'COMMIT'. I tried to attach gdb to the postgres and client processes, but the
backtraces look useless (see below). Running VACUUM ANALYZE on this database
in a separate process causes the loading process to continue! Weird.

Interestingly, there is no problem with 8.2beta1 in any combination.
Any idea which changes between 8.1.4 and 8.2beta1 could
affect the problem?

postgres.exe:

(gdb) bt
#0 0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1 0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2 0x00000005 in ?? ()
#3 0x00000004 in ?? ()
#4 0x00000001 in ?? ()
#5 0x019effd0 in ?? ()
#6 0xf784e548 in ?? ()
#7 0xffffffff in ?? ()
#8 0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9 0x7c9507c8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x19f0000

application:
(gdb) bt
#0 0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1 0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2 0x00000005 in ?? ()
#3 0x00000004 in ?? ()
#4 0x00000001 in ?? ()
#5 0x0196ffd0 in ?? ()
#6 0x7c97c0d8 in ntdll!NtAccessCheckByTypeResultListAndAuditAlarm ()
#7 0xffffffff in ?? ()
#8 0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9 0x7c9507c8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x1970000

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
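
The observations in this report amount to a small test matrix. As a reading aid only, here is a minimal Python sketch that restates them. The names `hang_observed` and `matrix` are hypothetical, and `cpus=2` merely stands in for "SMP", since the report does not say how many CPUs the machine has. The data is only what the message states: the hang was seen on 8.1.4 with SMP and stats_row_level=on, and in no other combination.

```python
from itertools import product


def hang_observed(version: str, cpus: int, stats_row_level: bool) -> bool:
    """Return True for the one combination the report says hangs.

    Hypothetical helper: it restates the reported observations and
    predicts nothing beyond them.
    """
    return version == "8.1.4" and cpus > 1 and stats_row_level


def matrix():
    """Enumerate the combinations mentioned in the report.

    cpus=2 is an assumption standing in for "SMP".
    """
    rows = []
    for version, cpus, srl in product(("8.1.4", "8.2beta1"), (1, 2), (False, True)):
        rows.append((version, cpus, srl, hang_observed(version, cpus, srl)))
    return rows


if __name__ == "__main__":
    for version, cpus, srl, hangs in matrix():
        state = "on" if srl else "off"
        print(f"{version:9} cpus={cpus} stats_row_level={state:3} hang={hangs}")
```

Laid out this way, the single failing row shows that neither SMP alone nor stats_row_level alone was enough to trigger the problem in these tests.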

#2Joshua D. Drake
jd@commandprompt.com
In reply to: Oleg Bartunov (#1)
Re: Win XP SP2 SMP locking (8.1.4)

> It's interesting, that there is no problem with 8.2beta1 in all
> combinations ! Any idea what changes from 8.1.4 to 8.2beta1 could
> affect the problem ?

What do you mean by locking? Do you mean the PostgreSQL process locks up?
E.g., can you still connect to PostgreSQL from another connection? If not,
is there an error?

Joshua D. Drake

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: Joshua D. Drake (#2)
Re: Win XP SP2 SMP locking (8.1.4)

On Thu, 5 Oct 2006, Joshua D. Drake wrote:

>> It's interesting, that there is no problem with 8.2beta1 in all
>> combinations ! Any idea what changes from 8.1.4 to 8.2beta1 could
>> affect the problem ?
>
> What do you mean locking? Do you mean the postgresql process locks up?
> E.g; can you still connect to PostgreSQL from another connection? If not
> is there an error?

It looks like the application is waiting for something from PostgreSQL, but
PostgreSQL thinks it has done its job. Running VACUUM ANALYZE gets things
moving again. I can still connect to PostgreSQL from another connection; for
example, pgAdmin still works with this database.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
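
pgAdmin reported the stalled backend's current query as 'COMMIT'. The `pg_stat_activity` view exposes the current query as well, so the same check can be scripted from a second connection. What follows is a minimal sketch, not a tested diagnostic. It assumes a DB-API 2.0 (PEP 249) cursor from whichever PostgreSQL driver is in use, and the 8.1-era column names `procpid`, `current_query`, and `query_start`. It also assumes `stats_command_string` is on, since `current_query` is not populated otherwise. The helper name `backends_in_commit` is hypothetical.

```python
# Column names are those of the 8.1-era pg_stat_activity view (assumption:
# later releases renamed some of them, so adjust for other versions).
QUERY = "SELECT procpid, current_query, query_start FROM pg_stat_activity"


def backends_in_commit(cursor):
    """Return (procpid, query_start) for backends whose current query is COMMIT.

    `cursor` is any DB-API 2.0 cursor connected to the server.
    """
    cursor.execute(QUERY)
    stuck = []
    for procpid, current_query, query_start in cursor.fetchall():
        if current_query is None:
            continue  # query text not reported for this backend
        # Normalize whitespace, a trailing semicolon, and letter case.
        normalized = current_query.strip().rstrip(";").strip().upper()
        if normalized == "COMMIT":
            stuck.append((procpid, query_start))
    return stuck
```

Calling this repeatedly and comparing `query_start` values would show whether the same backend has been sitting in COMMIT for an unusually long time.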

#4Magnus Hagander
mha@sollentuna.net
In reply to: Oleg Bartunov (#1)
Re: Win XP SP2 SMP locking (8.1.4)

> It's interesting, that there is no problem with 8.2beta1 in
> all combinations ! Any idea what changes from 8.1.4 to
> 8.2beta1 could affect the problem ?

There is a new implementation of semaphores in 8.2. That could possibly
be it.

//Magnus

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Magnus Hagander (#4)
Re: Win XP SP2 SMP locking (8.1.4)

On Thu, 5 Oct 2006, Magnus Hagander wrote:

> There is a new implementation of semaphores in 8.2. That could possibly
> be it.

I backported it to REL8_1_STABLE, but that didn't help. Any other ideas on
what to do, or how to debug the situation?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#6Magnus Hagander
mha@sollentuna.net
In reply to: Oleg Bartunov (#5)
Re: Win XP SP2 SMP locking (8.1.4)

>> There is a new implementation of semaphores in 8.2. That could
>> possibly be it.
>
> I backported it to REL8_1_STABLE, but that didn't help. Any other
> ideas on what to do, or how to debug the situation?

Unfortunately, debugger support for MinGW is absolutely horrible. But
you can try Process Explorer from www.sysinternals.com and see if it'll
give you a decent backtrace. Sometimes it works when others don't.
Either that, or try the Visual Studio or Windows debuggers; they can
usually at least show you whether it's stuck waiting on something in the
kernel.

//Magnus
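
One rough way to judge whether a given tool produced a "decent" backtrace is to count how many frames it resolved to a symbol versus how many it printed as `??`. The following is a minimal Python sketch under stated assumptions: it expects gdb-style frame lines like those posted earlier in this thread, and `FRAME_RE` and `summarize_backtrace` are hypothetical names, not part of any debugger tooling. A resolved name may still be misleading when no debug symbols are available, so a high count is a hint rather than proof.

```python
import re

# Matches the start of a gdb-style frame line such as
#   "#1 0x7c9507a8 in ntdll!KiIntSystemCall () from"
# Only frames that begin a line are counted.
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(", re.MULTILINE)


def summarize_backtrace(text):
    """Split frame numbers into those with a symbol and those shown as '??'."""
    resolved, unresolved = [], []
    for number, symbol in FRAME_RE.findall(text):
        if symbol == "??":
            unresolved.append(int(number))
        else:
            resolved.append(int(number))
    return {"resolved": resolved, "unresolved": unresolved}


# First four frames of the postgres.exe trace from the original report.
SAMPLE = """\
#0 0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\\WINDOWS\\system32\\ntdll.dll
#1 0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\\WINDOWS\\system32\\ntdll.dll
#2 0x00000005 in ?? ()
#3 0x00000004 in ?? ()
"""

if __name__ == "__main__":
    print(summarize_backtrace(SAMPLE))
```

Running the same summary over the output of each debugger tried makes it easier to compare them on the same stalled process.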

#7Rocco Altier
RoccoA@Routescape.com
In reply to: Magnus Hagander (#6)
Re: Win XP SP2 SMP locking (8.1.4)

Didn't the stats communication process get redone for 8.2?

Or at least some timeout-related stuff.

Since the problem seems to be related to stats_row_level being on, I
wonder if the problem might be in that sub-system. I am guessing that
vacuum is pushing some more stats through, which might explain how that
allows it to unfreeze.

-rocco

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
Magnus Hagander
Sent: Friday, October 06, 2006 6:00 AM
To: Oleg Bartunov
Cc: Pgsql Hackers
Subject: Re: [HACKERS] Win XP SP2 SMP locking (8.1.4)

Unfortunately, debugger support for MinGW is absolutely horrible. But
you can try Process Explorer from www.sysinternals.com and see if it'll
give you a decent backtrace. Sometimes it works when others don't.
Either that, or try the Visual Studio or Windows debuggers; they can
usually at least show you whether it's stuck waiting on something in the
kernel.

//Magnus
