libpqrcv_PQexec() seems to violate latch protocol

Started by Andres Freundover 8 years ago6 messages
#1Andres Freund
andres@anarazel.de

Hi,

The function in $subject does:
while (PQisBusy(streamConn))
{
int rc;

/*
* We don't need to break down the sleep into smaller increments,
* since we'll get interrupted by signals and can either handle
* interrupts here or elog(FATAL) within SIGTERM signal handler if
* the signal arrives in the middle of establishment of
* replication connection.
*/
ResetLatch(&MyProc->procLatch);
rc = WaitLatchOrSocket(&MyProc->procLatch,
WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
WL_LATCH_SET,
PQsocket(streamConn),
0,
WAIT_EVENT_LIBPQWALRECEIVER);
if (rc & WL_POSTMASTER_DEATH)
exit(1);
/* interrupted */
if (rc & WL_LATCH_SET)
{
CHECK_FOR_INTERRUPTS();
continue;
}

Doing ResetLatch();WaitLatch() like that makes it possible to miss a the
latch being set, e.g. if it happens just after WaitLatchOrSocket()
returns.

Afaict, the ResetLatch() really should just instead be in the if (rc & WL_LATCH_SET)
block.

Unless somebody protests, I'll make it so.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: libpqrcv_PQexec() seems to violate latch protocol

Andres Freund <andres@anarazel.de> writes:

The function in $subject does:

ResetLatch(&MyProc->procLatch);
rc = WaitLatchOrSocket(&MyProc->procLatch,
WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
WL_LATCH_SET,
PQsocket(streamConn),
0,
WAIT_EVENT_LIBPQWALRECEIVER);

Yeah, this is certainly broken.

Afaict, the ResetLatch() really should just instead be in the if (rc & WL_LATCH_SET) block.

And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
since that is the useful work that we want to be sure occurs after
any latch-setting event.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: libpqrcv_PQexec() seems to violate latch protocol

On 2017-06-06 17:14:59 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

The function in $subject does:

ResetLatch(&MyProc->procLatch);
rc = WaitLatchOrSocket(&MyProc->procLatch,
WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
WL_LATCH_SET,
PQsocket(streamConn),
0,
WAIT_EVENT_LIBPQWALRECEIVER);

Yeah, this is certainly broken.

Afaict, the ResetLatch() really should just instead be in the if (rc & WL_LATCH_SET) block.

And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
since that is the useful work that we want to be sure occurs after
any latch-setting event.

Right. I found a couple more instance of similarly iffy, although not
quite as broken, patterns in launcher.c. It's easy to get this wrong,
but it's a lot easy if you do it differently everywhere you use a
latch. It's not good if code in the same file, by the same author(s),
has different ways of using latches.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#3)
Re: libpqrcv_PQexec() seems to violate latch protocol

On 06/06/17 23:17, Andres Freund wrote:

On 2017-06-06 17:14:59 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

The function in $subject does:

ResetLatch(&MyProc->procLatch);
rc = WaitLatchOrSocket(&MyProc->procLatch,
WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
WL_LATCH_SET,
PQsocket(streamConn),
0,
WAIT_EVENT_LIBPQWALRECEIVER);

Yeah, this is certainly broken.

Afaict, the ResetLatch() really should just instead be in the if (rc & WL_LATCH_SET) block.

And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
since that is the useful work that we want to be sure occurs after
any latch-setting event.

Right. I found a couple more instance of similarly iffy, although not
quite as broken, patterns in launcher.c. It's easy to get this wrong,
but it's a lot easy if you do it differently everywhere you use a
latch. It's not good if code in the same file, by the same author(s),
has different ways of using latches.

Huh? I see same pattern everywhere in launcher.c, what am I missing?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#4)
Re: libpqrcv_PQexec() seems to violate latch protocol

On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:

On 06/06/17 23:17, Andres Freund wrote:

Right. I found a couple more instance of similarly iffy, although not
quite as broken, patterns in launcher.c. It's easy to get this wrong,
but it's a lot easy if you do it differently everywhere you use a
latch. It's not good if code in the same file, by the same author(s),
has different ways of using latches.

Huh? I see same pattern everywhere in launcher.c, what am I missing?

WaitForReplicationWorkerAttach:
while (...)
CHECK_FOR_INTERRUPTS();
/* other stuff including returns */
WaitLatch()
WL_POSTMASTER_DEATH
ResetLatch()

logicalrep_worker_stop loop 1:
while (...)
/* other stuff */
CHECK_FOR_INTERRUPTS()
WaitLatch()
POSTMASTER_DEATH
ResetLatch()
/* other stuff including returns */
logicalrep_worker_stop loop 1:
while (...)
/* other stuff including returns */
CHECK_FOR_INTERRUPTS();
WaitLatch()
WL_POSTMASTER_DEATH
ResetLatch()

ApplyLauncherMain:
while (!got_SIGTERM)
/* lots other stuff */
WaitLatch()
WL_POSTMASTER_DEATH
/* some other stuff */
ResetLatch()
(note no CFI)

they're not hugely different, but subtely there are differences.
Sometimes you're guaranteed to check for interrupts after resetting the
latch, in other cases not. Sometimes expensive-ish things happen before
a CFI...

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#5)
Re: libpqrcv_PQexec() seems to violate latch protocol

On 06/06/17 23:42, Andres Freund wrote:

On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:

On 06/06/17 23:17, Andres Freund wrote:

Right. I found a couple more instance of similarly iffy, although not
quite as broken, patterns in launcher.c. It's easy to get this wrong,
but it's a lot easy if you do it differently everywhere you use a
latch. It's not good if code in the same file, by the same author(s),
has different ways of using latches.

Huh? I see same pattern everywhere in launcher.c, what am I missing?

WaitForReplicationWorkerAttach:
while (...)
CHECK_FOR_INTERRUPTS();
/* other stuff including returns */
WaitLatch()
WL_POSTMASTER_DEATH
ResetLatch()

logicalrep_worker_stop loop 1:
while (...)
/* other stuff */
CHECK_FOR_INTERRUPTS()
WaitLatch()
POSTMASTER_DEATH
ResetLatch()
/* other stuff including returns */
logicalrep_worker_stop loop 1:
while (...)
/* other stuff including returns */
CHECK_FOR_INTERRUPTS();
WaitLatch()
WL_POSTMASTER_DEATH
ResetLatch()

ApplyLauncherMain:
while (!got_SIGTERM)
/* lots other stuff */
WaitLatch()
WL_POSTMASTER_DEATH
/* some other stuff */
ResetLatch()
(note no CFI)

they're not hugely different, but subtely there are differences.
Sometimes you're guaranteed to check for interrupts after resetting the
latch, in other cases not. Sometimes expensive-ish things happen before
a CFI...

Ah that's because signals in launcher are broken, see
/messages/by-id/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
which also includes patch to fix it.

We originally had custom signal handling everywhere, then I realized it
was mistake for workers because of locking interaction but missed same
issue with launcher (the CFI in current launcher doesn't work).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers