Some 9.5beta2 backend processes not terminating properly?
After setting up 9.5beta2 on the Npgsql build server and running the Npgsql
test suite against it, I've noticed some weird behavior.
The tests run for a couple of minutes, opening and closing some connections. With my
pre-9.5 backends, the moment the test runner exits I can see that all
backend processes exit immediately, and pg_stat_activity has no rows
(except the querying one). With 9.5beta2, however, some backend processes
stay alive after the test runner exits, and pg_stat_activity
contains extra rows (state idle, waiting false). This situation persists
until I restart PostgreSQL.
This happens consistently on two machines, running Windows 7 and Windows
10. Both client and server are on the same machine and use TCP to
communicate. I can investigate further and try to produce a more isolated
repro but I thought I'd talk to you guys first.
Any thoughts or ideas on what might cause this? Any suggestions for
tracking this down?
Shay
Shay Rojansky <roji@roji.org> writes:
After setting up 9.5beta2 on the Npgsql build server and running the Npgsql
test suite against it, I've noticed some weird behavior.
The tests run for a couple of minutes, opening and closing some connections. With my
pre-9.5 backends, the moment the test runner exits I can see that all
backend processes exit immediately, and pg_stat_activity has no rows
(except the querying one). With 9.5beta2, however, some backend processes
stay alive after the test runner exits, and pg_stat_activity
contains extra rows (state idle, waiting false). This situation persists
until I restart PostgreSQL.
No idea what's happening, but a couple of questions:
* Are you using SSL connections?
* Can you get stack traces from the seemingly-stuck backends?
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
The tests run for a couple of minutes, opening and closing some connections. With my
pre-9.5 backends, the moment the test runner exits I can see that all
backend processes exit immediately, and pg_stat_activity has no rows
(except the querying one). With 9.5beta2, however, some backend processes
stay alive after the test runner exits, and pg_stat_activity
contains extra rows (state idle, waiting false). This situation persists
until I restart PostgreSQL.

No idea what's happening, but a couple of questions:
* Are you using SSL connections?
* Can you get stack traces from the seemingly-stuck backends?
Most of my tests don't use SSL but some do. Looking at the query field in
pg_stat_activity I can see queries that don't seem to originate from SSL
tests.
Note that the number of backends that stay stuck after the tests is
constant (always 12).
Here are stack dumps of the same process taken with both VS2015 Community
and Process Explorer; I went over 4 processes and saw the same thing. Let
me know what else I can provide to help.
From VS2015 Community:
Main Thread
ntdll.dll!NtWaitForMultipleObjects() Unknown
KernelBase.dll!WaitForMultipleObjectsEx() Unknown
KernelBase.dll!WaitForMultipleObjects() Unknown
postgres.exe!WaitLatchOrSocket(volatile Latch * latch, int wakeEvents,
unsigned __int64 sock, long timeout) Line 202 C
postgres.exe!secure_read(Port * port, void * ptr, unsigned __int64 len)
Line 151 C
postgres.exe!pq_getbyte() Line 926 C
postgres.exe!SocketBackend(StringInfoData * inBuf) Line 345 C
postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname,
const char * username) Line 3984 C
postgres.exe!BackendRun(Port * port) Line 4236 C
postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4727 C
postgres.exe!main(int argc, char * * argv) Line 211 C
postgres.exe!__tmainCRTStartup() Line 626 C
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
Worker Thread
ntdll.dll!NtWaitForWorkViaWorkerFactory() Unknown
ntdll.dll!TppWorkerThread() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
Worker Thread
ntdll.dll!NtFsControlFile() Unknown
KernelBase.dll!ConnectNamedPipe() Unknown
postgres.exe!pg_signal_thread(void * param) Line 279 C
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
Worker Thread
ntdll.dll!NtWaitForSingleObject() Unknown
KernelBase.dll!WaitForSingleObjectEx() Unknown
postgres.exe!pg_timer_thread(void * param) Line 49 C
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
From Process Explorer (slightly different):
ntoskrnl.exe!KeSynchronizeExecution+0x3de6
ntoskrnl.exe!KeWaitForSingleObject+0xc7a
ntoskrnl.exe!KeWaitForSingleObject+0x709
ntoskrnl.exe!KeWaitForSingleObject+0x375
ntoskrnl.exe!IoQueueWorkItem+0x370
ntoskrnl.exe!KeRemoveQueueEx+0x16ba
ntoskrnl.exe!KeWaitForSingleObject+0xe8e
ntoskrnl.exe!KeWaitForSingleObject+0x709
ntoskrnl.exe!KeWaitForMultipleObjects+0x24e
ntoskrnl.exe!ObWaitForMultipleObjects+0x2bd
ntoskrnl.exe!IoWMIRegistrationControl+0x2402
ntoskrnl.exe!setjmpex+0x3943
ntdll.dll!NtWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xef
KERNELBASE.dll!WaitForMultipleObjects+0xe
postgres.exe!WaitLatchOrSocket+0x243
postgres.exe!secure_read+0xb0
postgres.exe!pq_getbyte+0xec
postgres.exe!get_stats_option_name+0x392
postgres.exe!PostgresMain+0x537
postgres.exe!ShmemBackendArrayAllocation+0x2a6a
postgres.exe!SubPostmasterMain+0x273
postgres.exe!main+0x480
postgres.exe!pgwin32_popen+0x130b
KERNEL32.DLL!BaseThreadInitThunk+0x22
ntdll.dll!RtlUserThreadStart+0x34
On 2015-12-29 12:41:40 +0200, Shay Rojansky wrote:
The tests run for a couple of minutes, opening and closing some connections. With my
pre-9.5 backends, the moment the test runner exits I can see that all
backend processes exit immediately, and pg_stat_activity has no rows
(except the querying one). With 9.5beta2, however, some backend processes
stay alive after the test runner exits, and pg_stat_activity
contains extra rows (state idle, waiting false). This situation persists
until I restart PostgreSQL.
Could you describe the workload a bit more? Is it rather concurrent? Do
you use optimized or debug builds? How long did you wait for the
backends to die? Is this all over localhost, an external IP but still local,
or remote?
Note that the number of backends that stay stuck after the tests is
constant (always 12).
Can you increase the number of backends used in the test? And check
whether it's still 12?
Here are stack dumps of the same process taken with both VS2015 Community
and Process Explorer; I went over 4 processes and saw the same thing. Let
me know what else I can provide to help.

From VS2015 Community:
Main Thread
ntdll.dll!NtWaitForMultipleObjects() Unknown
KernelBase.dll!WaitForMultipleObjectsEx() Unknown
KernelBase.dll!WaitForMultipleObjects() Unknown
postgres.exe!WaitLatchOrSocket(volatile Latch * latch, int wakeEvents,
unsigned __int64 sock, long timeout) Line 202 C
postgres.exe!secure_read(Port * port, void * ptr, unsigned __int64 len)
Line 151 C
postgres.exe!pq_getbyte() Line 926 C
postgres.exe!SocketBackend(StringInfoData * inBuf) Line 345 C
postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname,
const char * username) Line 3984 C
postgres.exe!BackendRun(Port * port) Line 4236 C
postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4727 C
postgres.exe!main(int argc, char * * argv) Line 211 C
postgres.exe!__tmainCRTStartup() Line 626 C
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
Hm. So we're waiting for the latch, and expecting to get an FD_CLOSE
event back because the socket is actually closed. Which should
always happen in that path - a read through win32_latch.c doesn't show any
obvious problems. But then I really don't have much of a clue about Windows
development.
How are your clients disconnecting? Possibly without properly
disconnecting?
Regards,
Andres
Could you describe the workload a bit more? Is it rather concurrent? Do
you use optimized or debug builds? How long did you wait for the
backends to die? Is this all over localhost, an external IP but still local,
or remote?
The workload is a rather diverse set of integration tests executed with
Npgsql. There's no concurrency whatsoever - tests are executed serially.
The backends stay alive indefinitely, until they are killed. All this is
over localhost with TCP. I can try other scenarios if that'll help.
Note that the number of backends that stay stuck after the tests is
constant (always 12).

Can you increase the number of backends used in the test? And check
whether it's still 12?
Well, I ran the testsuite twice in parallel, and got... 23 backends stuck
at the end.
How are your clients disconnecting? Possibly without properly
disconnecting?
That's possible, definitely in some of the test cases.
What I can do is try to isolate things further by playing around with the
tests and trying to see if a more minimal repro can be done - I'll try
doing this later today or tomorrow. If anyone has any other specific tests
or checks I should do let me know.
On Tue, Dec 29, 2015 at 7:04 PM, Shay Rojansky <roji@roji.org> wrote:
Could you describe the workload a bit more? Is it rather concurrent? Do
you use optimized or debug builds? How long did you wait for the
backends to die? Is this all over localhost, an external IP but still local,
or remote?

The workload is a rather diverse set of integration tests executed with
Npgsql. There's no concurrency whatsoever - tests are executed serially.
The backends stay alive indefinitely, until they are killed. All this is
over localhost with TCP. I can try other scenarios if that'll help.
What procedure do you use to kill backends? Normally, if we kill
via Task Manager using "End Process", it is treated as a backend
crash, the server restarts, and all other backends get
disconnected.
Note that the number of backends that stay stuck after the tests is
constant (always 12).
Can you increase the number of backends used in the test? And check
whether it's still 12?

Well, I ran the testsuite twice in parallel, and got... 23 backends stuck
at the end.

How are your clients disconnecting? Possibly without properly
disconnecting?

That's possible, definitely in some of the test cases.
What I can do is try to isolate things further by playing around with the
tests and trying to see if a more minimal repro can be done - I'll try
doing this later today or tomorrow. If anyone has any other specific tests
or checks I should do let me know.
I think first we should try to isolate whether the hung backends
are due to the sessions not being disconnected properly or
whether some other factor is involved as well. You can try to kill/
disconnect sessions connected via psql in the same way as
you do for connections with Npgsql and see if you can
reproduce the same behaviour.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
OK, I finally found some time to dive into this.
The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on Ubuntu 15.10. The client runs on Windows as well
(although I doubt that's important).
In case it helps, here's a gist
<https://gist.github.com/roji/33df4e818c5d64a607aa> with some .NET code
that uses Npgsql 3.0.4 to reproduce this.
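In plain terms, the difference between a clean and an abrupt disconnect can be sketched with a raw socket (a hypothetical Python sketch, not the gist's Npgsql code; the only wire-protocol fact it relies on is that Terminate is the byte 'X' followed by an Int32 length of 4):

```python
import socket
import struct

# Per the frontend/backend protocol, Terminate is the byte 'X'
# followed by an Int32 message length of 4 (the length counts itself).
TERMINATE = b'X' + struct.pack('!i', 4)

def close_cleanly(sock: socket.socket) -> None:
    """Graceful disconnect: send Terminate, then close.
    The backend reads 'X' and exits on its own."""
    sock.sendall(TERMINATE)
    sock.close()

def close_abruptly(sock: socket.socket) -> None:
    """The problematic pattern: close without Terminate.
    The backend only sees a TCP FIN and must detect EOF itself -
    which is what appears to be failing on Windows here."""
    sock.close()
```

The gist triggers the second pattern from within the Npgsql tests; the sketch above just names the two behaviors being compared.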
If there's anything else I can do please let me know.
Shay
On Wed, Dec 30, 2015 at 5:32 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Dec 29, 2015 at 7:04 PM, Shay Rojansky <roji@roji.org> wrote:

Could you describe the workload a bit more? Is it rather concurrent? Do
you use optimized or debug builds? How long did you wait for the
backends to die? Is this all over localhost, an external IP but still local,
or remote?

The workload is a rather diverse set of integration tests executed with
Npgsql. There's no concurrency whatsoever - tests are executed serially.
The backends stay alive indefinitely, until they are killed. All this is
over localhost with TCP. I can try other scenarios if that'll help.

What procedure do you use to kill backends? Normally, if we kill
via Task Manager using "End Process", it is treated as a backend
crash, the server restarts, and all other backends get
disconnected.

Note that the number of backends that stay stuck after the tests is
constant (always 12).

Can you increase the number of backends used in the test? And check
whether it's still 12?

Well, I ran the testsuite twice in parallel, and got... 23 backends stuck
at the end.

How are your clients disconnecting? Possibly without properly
disconnecting?

That's possible, definitely in some of the test cases.

What I can do is try to isolate things further by playing around with the
tests and trying to see if a more minimal repro can be done - I'll try
doing this later today or tomorrow. If anyone has any other specific tests
or checks I should do let me know.

I think first we should try to isolate whether the hung backends
are due to the sessions not being disconnected properly or
whether some other factor is involved as well. You can try to kill/
disconnect sessions connected via psql in the same way as
you do for connections with Npgsql and see if you can
reproduce the same behaviour.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2015-12-30 19:01:10 +0200, Shay Rojansky wrote:
OK, I finally found some time to dive into this.
The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on Ubuntu 15.10. The client runs on Windows as well
(although I doubt that's important).
Hm. So that seems to indicate that, on Windows, we're not properly
recognizing dead sockets in the latch code. Could you check, IIRC with
netstat or something like it, what state the connections are in?
Any chance you could single-step through WaitLatchOrSocket() with a
debugger? Without additional information this is rather hard to
diagnose.
On Wed, Dec 30, 2015 at 5:32 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

What procedure do you use to kill backends? Normally, if we kill
via Task Manager using "End Process", it is treated as a backend
crash, the server restarts, and all other backends get
disconnected.
Unless I miss something major here the problem is clients disconnecting
and leaving backends hanging. The killing of backends only comes into
play after that's already the case.
Regards,
Andres
Shay Rojansky <roji@roji.org> writes:
The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on Ubuntu 15.10.
Nor OS X. Ugh. My first thought was that ac1d7945f broke this, but
that's only in HEAD not 9.5, so some earlier change must be responsible.
regards, tom lane
Andres Freund <andres@anarazel.de> writes:
On 2015-12-30 19:01:10 +0200, Shay Rojansky wrote:
The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on Ubuntu 15.10. The client runs on Windows as well
(although I doubt that's important).
Hm. So that seems to indicate that, on Windows, we're not properly
recognizing dead sockets in the latch code.
Or we just broke EOF detection on Windows sockets in general. It might be
worth checking if the problem appears on the client side; that is, given a
psql running on Windows, do local-equivalent-of-kill-9 on the connected
backend, and see if psql notices. (Hm, although if it's idle psql wouldn't
notice until you next try a command, so it might be hard to tell. Maybe
kill -9 while the backend is in process of a long query?)
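Tom's client-side check boils down to ordinary EOF detection. As a hedged illustration (plain Python sockets, not psql's or PostgreSQL's code): when the peer goes away, the survivor's socket becomes readable and a read returns zero bytes - the condition that Winsock reports as FD_CLOSE:

```python
import select
import socket

def peer_closed(sock: socket.socket, timeout: float = 1.0) -> bool:
    """Return True if the peer has closed the connection.
    A socket with a dead peer selects as readable, and a peek
    then yields zero bytes (EOF)."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # no event within the timeout
    return sock.recv(1, socket.MSG_PEEK) == b''

# Simulate the abrupt disconnect locally:
server_side, client_side = socket.socketpair()
client_side.close()              # client goes away without a Terminate
print(peer_closed(server_side))  # the server side can see the EOF
```

If this check misbehaves only on the Windows client, that would point at EOF detection generally rather than the latch code specifically.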
regards, tom lane
On 2015-12-30 12:30:43 -0500, Tom Lane wrote:
Nor OS X. Ugh. My first thought was that ac1d7945f broke this, but
that's only in HEAD not 9.5, so some earlier change must be responsible.
The backtrace in
http://archives.postgresql.org/message-id/CADT4RqBo79_0Vx%3D-%2By%3DnFv3zdnm_-CgGzbtSv9LhxrFEoYMVFg%40mail.gmail.com
seems to indicate that it's really WaitLatchOrSocket() not noticing the
socket is closed.
For a moment I had the theory that Port->sock might be invalid because
it somehow got closed. That'd then remove the socket from the waited-on
events, which would explain the behaviour. But afaics that's really only
possible via pq_init()'s on_proc_exit(socket_close, 0); And I can't see
how that could be reached.
FWIW, the

    if (sock == PGINVALID_SOCKET)
        wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);

block in both latch implementations looks like a problem waiting to happen.
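To make the hazard concrete, here is a hypothetical Python rendering (not the actual latch code) of the silent masking versus the error/assert alternative raised later in this thread:

```python
INVALID_SOCKET = -1                      # stand-in for PGINVALID_SOCKET
WL_LATCH_SET = 1 << 0
WL_SOCKET_READABLE = 1 << 1
WL_SOCKET_WRITEABLE = 1 << 2

def masked_wait_events(wake_events: int, sock: int) -> int:
    """Mirrors the questioned block: if the socket is invalid, the
    socket flags are silently dropped. The caller then waits on the
    latch alone - potentially forever, with no error raised."""
    if sock == INVALID_SOCKET:
        wake_events &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)
    return wake_events

def checked_wait_events(wake_events: int, sock: int) -> int:
    """The fail-loudly alternative: asking to wait on an invalid
    socket is a caller bug, so raise instead of masking it."""
    if sock == INVALID_SOCKET and wake_events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE):
        raise ValueError("socket wait events requested on an invalid socket")
    return wake_events
```

The masking variant is exactly the shape of bug that would leave a backend blocked on a latch with no socket in its wait set.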
The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on Ubuntu 15.10. The client runs on Windows as well
(although I doubt that's important).
Hm. So that seems to indicate that, on Windows, we're not properly
recognizing dead sockets in the latch code. Could you check, IIRC with
netstat or something like it, what state the connections are in?
netstat shows the socket is in FIN_WAIT_2.
Any chance you could single-step through WaitLatchOrSocket() with a
debugger? Without additional information this is rather hard to
diagnose.
Uh, I sure can, but I have no idea what to look for :) Anything specific?
Andres Freund <andres@anarazel.de> writes:
FWIW, the

    if (sock == PGINVALID_SOCKET)
        wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);

block in both latch implementations looks like a problem waiting to happen.

You think it should throw an error instead? Seems reasonable to me.
regards, tom lane
On 2015-12-30 19:38:23 +0200, Shay Rojansky wrote:
Hm. So that seems to indicate that, on windows, we're not properly
recognizing dead sockets in the latch code. Could you check, IIRC with
netstat or something like it, in what state the connections are?
netstat shows the socket is in FIN_WAIT_2.
Any chance you could single-step through WaitLatchOrSocket() with a
debugger? Without additional information this is rather hard to
diagnose.

Uh, I sure can, but I have no idea what to look for :) Anything
specific?
Things that'd be interesting:
1) what are the arguments passed to WaitLatchOrSocket(), most
importantly wakeEvents and sock
2) are we busy looping, or is WaitForMultipleObjects() blocking
endlessly
3) If you kill -9 (well, terminate in the task manager) a client, while
stepping serverside in WaitLatchOrSocket, does
WaitForMultipleObjects() return? If so, what paths are we taking?
Greetings,
Andres Freund
On 2015-12-30 12:41:56 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
FWIW, the

    if (sock == PGINVALID_SOCKET)
        wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);

block in both latch implementations looks like a problem waiting to happen.

You think it should throw an error instead? Seems reasonable to me.

Yea. Error or maybe just an assert. That path seems to always indicate
something having gone wrong.
Any chance you could single-step through WaitLatchOrSocket() with a
debugger? Without additional information this is rather hard to
diagnose.

Uh, I sure can, but I have no idea what to look for :) Anything
specific?

Things that'd be interesting:
1) what are the arguments passed to WaitLatchOrSocket(), most
importantly wakeEvents and sock
2) are we busy looping, or is WaitForMultipleObjects() blocking
endlessly
3) If you kill -9 (well, terminate in the task manager) a client, while
stepping serverside in WaitLatchOrSocket, does
WaitForMultipleObjects() return? If so, what paths are we taking?
The process definitely isn't busy looping - zero CPU usage.
I'll try to set up debugging, it may take some time though (unfamiliar with
PostgreSQL internals and Windows debugging techniques).
Andres Freund <andres@anarazel.de> writes:
On 2015-12-30 12:30:43 -0500, Tom Lane wrote:
Nor OS X. Ugh. My first thought was that ac1d7945f broke this, but
that's only in HEAD not 9.5, so some earlier change must be responsible.
The backtrace in
http://archives.postgresql.org/message-id/CADT4RqBo79_0Vx%3D-%2By%3DnFv3zdnm_-CgGzbtSv9LhxrFEoYMVFg%40mail.gmail.com
seems to indicate that it's really WaitLatchOrSocket() not noticing the
socket is closed.
Right, and what I was wondering was whether adding the additional wait-for
condition had exposed some pre-existing flaw in the Windows latch code.
But that's not it, so we're left with the conclusion that we broke
something that used to work.
Are we sure this is a 9.5-only bug? Shay, can you try 9.4 branch tip
and see if it misbehaves? Can anyone else reproduce the problem?
regards, tom lane
Things that'd be interesting:
1) what are the arguments passed to WaitLatchOrSocket(), most
importantly wakeEvents and sock
wakeEvents is 8387808 and so is sock.
Tom, this bug doesn't occur with 9.4.4 (will try to download 9.4.5 and
test).
On 2015-12-30 12:50:58 -0500, Tom Lane wrote:
Right, and what I was wondering was whether adding the additional wait-for
condition had exposed some pre-existing flaw in the Windows latch code.
But that's not it, so we're left with the conclusion that we broke
something that used to work.
4bad60e is another suspect. Besides wondering why I moved the FD_CLOSE
case out of the existing if cases, I don't see anything suspicious
though. If we were hitting the write-path here, it'd be plausible that
we're hitting an issue with FD_CLOSE and waiting for writability; but
we're not.