Why is src/test/modules/committs/t/002_standby.pl flaky?
Hi,
There's a wait for replay that is open coded (instead of using the
wait_for_catchup() routine), and sometimes the second of two such
waits at line 51 (in master) times out after 3 minutes with "standby
never caught up". It's happening on three particular Windows boxes,
but once also happened on the AIX box "tern".
branch | animal | count
---------------+-----------+-------
HEAD | drongo | 1
HEAD | fairywren | 8
REL_10_STABLE | drongo | 3
REL_10_STABLE | fairywren | 10
REL_10_STABLE | jacana | 3
REL_11_STABLE | drongo | 1
REL_11_STABLE | fairywren | 4
REL_11_STABLE | jacana | 3
REL_12_STABLE | drongo | 2
REL_12_STABLE | fairywren | 5
REL_12_STABLE | jacana | 1
REL_12_STABLE | tern | 1
REL_13_STABLE | fairywren | 3
REL_14_STABLE | drongo | 2
REL_14_STABLE | fairywren | 6
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-30%2014:42:30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-30%2013:13:22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-30%2006:03:07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-22%2011:37:37
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-22%2010:46:07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-22%2009:03:06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-17%2004:59:17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-17%2003:59:51
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-16%2004:37:58
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-15%2009:57:14
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-15%2002:38:43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-14%2020:42:15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-14%2012:08:41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-14%2000:35:32
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-13%2023:40:11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-13%2022:47:25
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-09%2006:59:10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-09%2006:04:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-09%2001:36:09
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-08%2019:20:35
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-08%2018:04:28
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-08%2014:12:32
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-12-08%2011:15:58
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-12-08%2004:04:22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-12-03%2017:31:49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-11-11%2015:58:55
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-10-02%2022:00:17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-09-09%2005:16:43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-08-24%2004:45:09
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-17%2010:57:49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-12%2016:05:32
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-02-07%2012:59:43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2020-03-24%2012:49:50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2020-02-01%2018:00:27
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2020-02-01%2017:26:27
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2020-01-30%2023:49:49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-22%2014:19:02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-12-13%2000:12:11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-12-09%2006:02:05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-12-06%2003:07:42
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-11-02%2014:41:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-25%2013:12:08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-24%2013:12:41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-23%2023:10:00
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-23%2018:00:39
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-22%2015:05:57
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-18%2013:29:49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-16%2014:54:46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-15%2014:21:11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-14%2013:15:07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-13%2014:19:41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2019-10-12%2016:32:06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2019-10-10%2013:12:09
On 12/30/21 15:01, Thomas Munro wrote:
Hi,
There's a wait for replay that is open coded (instead of using the
wait_for_catchup() routine), and sometimes the second of two such
waits at line 51 (in master) times out after 3 minutes with "standby
never caught up". It's happening on three particular Windows boxes,
but once also happened on the AIX box "tern".
branch | animal | count
---------------+-----------+-------
HEAD | drongo | 1
HEAD | fairywren | 8
REL_10_STABLE | drongo | 3
REL_10_STABLE | fairywren | 10
REL_10_STABLE | jacana | 3
REL_11_STABLE | drongo | 1
REL_11_STABLE | fairywren | 4
REL_11_STABLE | jacana | 3
REL_12_STABLE | drongo | 2
REL_12_STABLE | fairywren | 5
REL_12_STABLE | jacana | 1
REL_12_STABLE | tern | 1
REL_13_STABLE | fairywren | 3
REL_14_STABLE | drongo | 2
REL_14_STABLE | fairywren | 6
FYI, drongo and fairywren are run on the same AWS/EC2 Windows Server
2019 instance. Nothing else runs on it.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
On 12/30/21 15:01, Thomas Munro wrote:
There's a wait for replay that is open coded (instead of using the
wait_for_catchup() routine), and sometimes the second of two such
waits at line 51 (in master) times out after 3 minutes with "standby
never caught up". It's happening on three particular Windows boxes,
but once also happened on the AIX box "tern".
FYI, drongo and fairywren are run on the same AWS/EC2 Windows Server
2019 instance. Nothing else runs on it.
I spent a little time looking into this just now. There are similar
failures in both 002_standby.pl and 003_standby_2.pl, which is
unsurprising because there are essentially-identical test sequences
in both. What I've realized is that the issue is triggered by
this sequence:
$standby->start;
...
$primary->restart;
$primary->safe_psql('postgres', 'checkpoint');
my $primary_lsn =
$primary->safe_psql('postgres', 'select pg_current_wal_lsn()');
$standby->poll_query_until('postgres',
qq{SELECT '$primary_lsn'::pg_lsn <= pg_last_wal_replay_lsn()})
or die "standby never caught up";
(the failing poll_query_until is at line 51 in 002_standby.pl, or
line 37 in 003_standby_2.pl). That is, we have forced a primary
restart since the standby first connected to the primary, and
now we have to wait for the standby to reconnect and catch up.
*These two tests seem to be the only TAP tests that do that*.
So I think there's not really anything specific to commit_ts testing
involved, it's just a dearth of primary restarts elsewhere.
Looking at the logs in the failing cases, there's no evidence
that the standby has even detected the primary's disconnection,
which explains why it hasn't attempted to reconnect. For
example, in the most recent HEAD failure,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-01-03%2018%3A04%3A41
the standby reports successful connection:
2022-01-03 18:58:04.920 UTC [179700:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
(which we can also see in the primary's log), but after that
there's no log traffic at all except the test script's vain
checks of pg_last_wal_replay_lsn(). In the same animal's
immediately preceding successful run,
we see:
2022-01-03 15:59:24.186 UTC [176664:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2022-01-03 15:59:25.003 UTC [176664:2] LOG: replication terminated by primary server
2022-01-03 15:59:25.003 UTC [176664:3] DETAIL: End of WAL reached on timeline 1 at 0/3030CB8.
2022-01-03 15:59:25.003 UTC [176664:4] FATAL: could not send end-of-streaming message to primary: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
no COPY in progress
2022-01-03 15:59:25.005 UTC [177092:5] LOG: invalid record length at 0/3030CB8: wanted 24, got 0
...
2022-01-03 15:59:25.564 UTC [177580:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
So for some reason, on these machines detection of walsender-initiated
connection close is unreliable ... or maybe, the walsender didn't close
the connection, but is somehow still hanging around? Don't have much idea
where to dig beyond that, but maybe someone else will. I wonder in
particular if this could be related to our recent discussions about
whether to use shutdown(2) on Windows --- could we need to do the
equivalent of 6051857fc/ed52c3707 on walsender connections?
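[For reference, a rough sketch (not the actual commit text) of the graceful-close sequence being discussed, reconstructed from the shutdown()/closesocket() calls quoted later in this thread; the placement at backend exit is an assumption for illustration only.]

/*
 * Sketch only: the Windows graceful-close sequence under discussion,
 * assumed to run when a backend releases its client socket.
 */
#ifdef WIN32
    if (MyProcPort != NULL && MyProcPort->sock != PGINVALID_SOCKET)
    {
        shutdown(MyProcPort->sock, SD_SEND);   /* send FIN so the peer sees EOF */
        closesocket(MyProcPort->sock);         /* then release the handle */
        MyProcPort->sock = PGINVALID_SOCKET;
    }
#endif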
regards, tom lane
I wrote:
So for some reason, on these machines detection of walsender-initiated
connection close is unreliable ... or maybe, the walsender didn't close
the connection, but is somehow still hanging around? Don't have much idea
where to dig beyond that, but maybe someone else will. I wonder in
particular if this could be related to our recent discussions about
whether to use shutdown(2) on Windows --- could we need to do the
equivalent of 6051857fc/ed52c3707 on walsender connections?
... wait a minute. After some more study of the buildfarm logs,
it was brought home to me that these failures started happening
just after 6051857fc went in:
The oldest matching failure is jacana's on 2021-12-03.
(The above sweep finds an unrelated-looking failure on 2021-11-11,
but no others before 6051857fc went in on 2021-12-02. Also, it
looks likely that ed52c3707 on 2021-12-07 made the failure more
probable, because jacana's is the only matching failure before 12-07.)
So I'm now thinking it's highly likely that those commits are
causing it somehow, but how?
regards, tom lane
Hello Tom,
09.01.2022 04:17, Tom Lane wrote:
So for some reason, on these machines detection of walsender-initiated
connection close is unreliable ... or maybe, the walsender didn't close
the connection, but is somehow still hanging around? Don't have much idea
where to dig beyond that, but maybe someone else will. I wonder in
particular if this could be related to our recent discussions about
whether to use shutdown(2) on Windows --- could we need to do the
equivalent of 6051857fc/ed52c3707 on walsender connections?
... wait a minute. After some more study of the buildfarm logs,
it was brought home to me that these failures started happening
just after 6051857fc went in:
The oldest matching failure is jacana's on 2021-12-03.
(The above sweep finds an unrelated-looking failure on 2021-11-11,
but no others before 6051857fc went in on 2021-12-02. Also, it
looks likely that ed52c3707 on 2021-12-07 made the failure more
probable, because jacana's is the only matching failure before 12-07.)
So I'm now thinking it's highly likely that those commits are
causing it somehow, but how?
I've managed to reproduce this failure too.
Removing "shutdown(MyProcPort->sock, SD_SEND);" doesn't help here, so
the culprit is exactly "closesocket(MyProcPort->sock);".
I've added `system("netstat -ano");` before die() in 002_standby.pl and see:
# Postmaster PID for node "primary" is 944
Proto Local Address Foreign Address State PID
...
TCP 127.0.0.1:58545 127.0.0.1:61995 FIN_WAIT_2 944
...
TCP 127.0.0.1:61995 127.0.0.1:58545 CLOSE_WAIT 1352
(Replacing SD_SEND with SD_BOTH doesn't change the behaviour.)
Looking at libpqwalreceiver.c:
/* Now that we've consumed some input, try again */
rawlen = PQgetCopyData(conn->streamConn, &conn->recvBuf, 1);
here we get -1 on the primary disconnection.
Then we get COMMAND_OK here:
res = libpqrcv_PQgetResult(conn->streamConn);
if (PQresultStatus(res) == PGRES_COMMAND_OK)
and finally just hang at:
/* Verify that there are no more results. */
res = libpqrcv_PQgetResult(conn->streamConn);
until the standby gets interrupted by the TAP test. (That call can also
return NULL and then the test completes successfully.)
Going down through the call chain, I see that at the end of it
WaitForMultipleObjects() hangs while waiting for the primary connection
socket event. So it looks like the socket that is closed by the
primary can get into a state unsuitable for WaitForMultipleObjects().
I tried to check the socket state with the WSAPoll() function and
discovered that it returns POLLHUP for the "problematic" socket.
The following draft addition in latch.c:
int
WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
                  long timeout, uint32 wait_event_info)
{
    int         ret = 0;
    int         rc;
    WaitEvent   event;

#ifdef WIN32
    if (wakeEvents & WL_SOCKET_MASK)
    {
        WSAPOLLFD   pollfd;

        pollfd.fd = sock;
        pollfd.events = POLLRDNORM | POLLWRNORM;
        pollfd.revents = 0;

        int rc = WSAPoll(&pollfd, 1, 0);

        if ((rc == 1) && (pollfd.revents & POLLHUP))
        {
            elog(WARNING, "WaitLatchOrSocket: a stream-oriented connection was either disconnected or aborted.");
            return WL_SOCKET_MASK;
        }
    }
#endif
makes the test 002_standby.pl pass (100 of 100 iterations, while without
the fix I get a failure roughly every third run). I'm not sure where to
place this check; maybe it's better to move it up to
libpqrcv_PQgetResult() to minimize its footprint, or to find a less
Windows-specific approach, but I'd prefer a client-side fix anyway, as
graceful closing of a socket by the server seems a legitimate action.
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
09.01.2022 04:17, Tom Lane wrote:
... wait a minute. After some more study of the buildfarm logs,
it was brought home to me that these failures started happening
just after 6051857fc went in:
I've managed to reproduce this failure too.
Removing "shutdown(MyProcPort->sock, SD_SEND);" doesn't help here, so
the culprit is exactly "closesocket(MyProcPort->sock);".
Ugh. Did you try removing the closesocket and keeping shutdown?
I don't recall if we tried that combination before.
... I'm not sure where to
place this check; maybe it's better to move it up to
libpqrcv_PQgetResult() to minimize its footprint, or to find a less
Windows-specific approach, but I'd prefer a client-side fix anyway, as
graceful closing of a socket by the server seems a legitimate action.
What concerns me here is whether this implies that other clients
(libpq, jdbc, etc) are going to need changes as well. Maybe
libpq is okay, because we've not seen failures of the isolation
tests that use pg_cancel_backend(), but still it's worrisome.
I'm not entirely sure whether the isolationtester would notice
that a connection that should have died didn't.
regards, tom lane
On Mon, Jan 10, 2022 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
Going down through the call chain, I see that at the end of it
WaitForMultipleObjects() hangs while waiting for the primary connection
socket event. So it looks like the socket, that is closed by the
primary, can get into a state unsuitable for WaitForMultipleObjects().
I wonder if FD_CLOSE is edge-triggered, and it's already told us once.
I think that's what these Python Twisted guys are saying:
I tried to check the socket state with the WSAPoll() function and
discovered that it returns POLLHUP for the "problematic" socket.
Good discovery. I guess if the above theory is right, there's a
memory somewhere that makes this level-triggered as expected by users
of poll().
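[A minimal sketch of that hypothesis, assuming the usual WSAEventSelect()/WSAEnumNetworkEvents() pattern that latch.c uses on Windows; wsa_event_handle and saw_fd_close are hypothetical names, not code from the tree.]

/*
 * Sketch of the edge-triggered hypothesis: WSAEnumNetworkEvents()
 * reports FD_CLOSE only once, so a waiter has to remember it itself
 * and keep treating the socket as readable on every later wait, or
 * the next wait blocks forever.
 */
WSANETWORKEVENTS netevents;

if (WSAEnumNetworkEvents(sock, wsa_event_handle, &netevents) == 0 &&
    (netevents.lNetworkEvents & FD_CLOSE))
    saw_fd_close = true;            /* remember the one-shot close report */

if (saw_fd_close)
    occurred_events->events |= WL_SOCKET_READABLE;  /* recv() will then see EOF */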
On Mon, Jan 10, 2022 at 8:06 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Jan 10, 2022 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
Going down through the call chain, I see that at the end of it
WaitForMultipleObjects() hangs while waiting for the primary connection
socket event. So it looks like the socket, that is closed by the
primary, can get into a state unsuitable for WaitForMultipleObjects().
I wonder if FD_CLOSE is edge-triggered, and it's already told us once.
Can you reproduce it with this patch?
Attachments:
0001-Make-Windows-FD_CLOSE-reporting-sticky.patch (text/x-patch)
From d3cfff53911ce068d4a31fe7ac7933958936d1b0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 10 Jan 2022 14:45:05 +1300
Subject: [PATCH] Make Windows FD_CLOSE reporting sticky.
XXX Just testing an idea...
---
src/backend/storage/ipc/latch.c | 16 ++++++++++++++++
src/include/storage/latch.h | 1 +
2 files changed, 17 insertions(+)
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 1d893cf863..4a61b31cd5 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -899,6 +899,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
event->user_data = user_data;
#ifdef WIN32
event->reset = false;
+ event->closed = false;
#endif
if (events == WL_LATCH_SET)
@@ -1882,6 +1883,20 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
return 1;
}
}
+
+ /*
+ * XXX: HYPOTHESIS TO TEST
+ * Windows only reports FD_CLOSE once. If we've seen that already,
+ * continue to report it.
+ */
+ if ((cur_event & WL_SOCKET_MASK) && cur_event->closed)
+ {
+ occurred_events->pos = cur_event->pos;
+ occurred_events->user_data = cur_event->user_data;
+ occurred_events->events = (cur_event->events & WL_SOCKET_MASK);
+ occurred_events->fd = cur_event->fd;
+ return 1;
+ }
}
/*
@@ -2002,6 +2017,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
{
/* EOF/error, so signal all caller-requested socket flags */
occurred_events->events |= (cur_event->events & WL_SOCKET_MASK);
+ cur_event->closed = true;
}
if (occurred_events->events != 0)
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 44f9368c64..c24f46dc37 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -147,6 +147,7 @@ typedef struct WaitEvent
void *user_data; /* pointer provided in AddWaitEventToSet */
#ifdef WIN32
bool reset; /* Is reset of the event required? */
+ bool closed; /* Has FD_CLOSE event been reported? */
#endif
} WaitEvent;
--
2.33.1
10.01.2022 05:00, Thomas Munro wrote:
On Mon, Jan 10, 2022 at 8:06 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Jan 10, 2022 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
Going down through the call chain, I see that at the end of it
WaitForMultipleObjects() hangs while waiting for the primary connection
socket event. So it looks like the socket, that is closed by the
primary, can get into a state unsuitable for WaitForMultipleObjects().
I wonder if FD_CLOSE is edge-triggered, and it's already told us once.
Can you reproduce it with this patch?
Unfortunately, this fix (with the correction "(cur_event &
WL_SOCKET_MASK)" -> "(cur_event->events & WL_SOCKET_MASK)") doesn't work,
because we have two separate calls to libpqrcv_PQgetResult():
Then we get COMMAND_OK here:
res = libpqrcv_PQgetResult(conn->streamConn);
if (PQresultStatus(res) == PGRES_COMMAND_OK)
and finally just hang at:
/* Verify that there are no more results. */
res = libpqrcv_PQgetResult(conn->streamConn);
The libpqrcv_PQgetResult function, in turn, invokes WaitLatchOrSocket(),
where the WaitEvents are defined locally, so the closed flag is set on the
first invocation but would need to be checked on the second.
I've managed to reproduce this failure too.
Removing "shutdown(MyProcPort->sock, SD_SEND);" doesn't help here, so
the culprit is exactly "closesocket(MyProcPort->sock);".
Ugh. Did you try removing the closesocket and keeping shutdown?
I don't recall if we tried that combination before.
Even with shutdown() only I still observe WaitForMultipleObjects()
hanging (and WSAPoll() returns POLLHUP for the socket).
As to your concern regarding other clients, I suspect that this issue is
caused by libpqwalreceiver's specific call pattern, and maybe other
clients just don't do that. I need some more time to analyze this.
Best regards,
Alexander
On Mon, Jan 10, 2022 at 8:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
The libpqrcv_PQgetResult function, in turn, invokes WaitLatchOrSocket()
where WaitEvents are defined locally, and the closed flag set on the
first invocation but expected to be checked on second.
D'oh, right. There's also a WaitLatchOrSocket call in walreceiver.c.
We'd need a long-lived WaitEventSet common across all of these sites,
which is hard here (because the socket might change under you, as
discussed in other threads that introduced long lived WaitEventSets to
other places but not here).
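[A hedged sketch of that idea, using only the existing latch.c API (CreateWaitEventSet, AddWaitEventToSet, FreeWaitEventSet); the cached variables and helper name are assumptions for illustration, not proposed code.]

/*
 * Sketch only: keep one WaitEventSet per connection and rebuild it if the
 * socket changes, so the FD_CLOSE state lives in one long-lived set instead
 * of being lost between short-lived WaitLatchOrSocket() calls.
 * cached_wes/cached_sock/get_connection_wes are hypothetical names.
 */
static WaitEventSet *cached_wes = NULL;
static pgsocket cached_sock = PGINVALID_SOCKET;

static WaitEventSet *
get_connection_wes(pgsocket sock)
{
    if (cached_wes != NULL && cached_sock != sock)
    {
        FreeWaitEventSet(cached_wes);       /* socket changed; start over */
        cached_wes = NULL;
    }
    if (cached_wes == NULL)
    {
        cached_wes = CreateWaitEventSet(TopMemoryContext, 3);
        AddWaitEventToSet(cached_wes, WL_SOCKET_READABLE, sock, NULL, NULL);
        AddWaitEventToSet(cached_wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
        AddWaitEventToSet(cached_wes, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET, NULL, NULL);
        cached_sock = sock;
    }
    return cached_wes;
}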
/me wonders if it's possible that graceful FD_CLOSE is reported only
once, but abortive/error FD_CLOSE is reported multiple times...
On Mon, Jan 10, 2022 at 10:20 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Jan 10, 2022 at 8:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
The libpqrcv_PQgetResult function, in turn, invokes WaitLatchOrSocket()
where WaitEvents are defined locally, and the closed flag set on the
first invocation but expected to be checked on second.
D'oh, right. There's also a WaitLatchOrSocket call in walreceiver.c.
We'd need a long-lived WaitEventSet common across all of these sites,
which is hard here (because the socket might change under you, as
discussed in other threads that introduced long lived WaitEventSets to
other places but not here).
This is super quick-and-dirty code (and doesn't handle some errors or
socket changes correctly), but does it detect the closed socket?
Attachments:
v2-0001-Make-Windows-FD_CLOSE-reporting-sticky.patch (text/x-patch)
From 3590a8c9b3e8992a65dea7a9d6aecdca2f25582d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 10 Jan 2022 14:45:05 +1300
Subject: [PATCH v2 1/2] Make Windows FD_CLOSE reporting sticky.
XXX Just testing an idea...
---
src/backend/storage/ipc/latch.c | 16 ++++++++++++++++
src/include/storage/latch.h | 1 +
2 files changed, 17 insertions(+)
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 1d893cf863..8f3176a00f 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -899,6 +899,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
event->user_data = user_data;
#ifdef WIN32
event->reset = false;
+ event->closed = false;
#endif
if (events == WL_LATCH_SET)
@@ -1882,6 +1883,20 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
return 1;
}
}
+
+ /*
+ * XXX: HYPOTHESIS TO TEST
+ * Windows only reports FD_CLOSE once. If we've seen that already,
+ * continue to report it.
+ */
+ if ((cur_event->events & WL_SOCKET_MASK) && cur_event->closed)
+ {
+ occurred_events->pos = cur_event->pos;
+ occurred_events->user_data = cur_event->user_data;
+ occurred_events->events = (cur_event->events & WL_SOCKET_MASK);
+ occurred_events->fd = cur_event->fd;
+ return 1;
+ }
}
/*
@@ -2002,6 +2017,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
{
/* EOF/error, so signal all caller-requested socket flags */
occurred_events->events |= (cur_event->events & WL_SOCKET_MASK);
+ cur_event->closed = true;
}
if (occurred_events->events != 0)
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 44f9368c64..c24f46dc37 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -147,6 +147,7 @@ typedef struct WaitEvent
void *user_data; /* pointer provided in AddWaitEventToSet */
#ifdef WIN32
bool reset; /* Is reset of the event required? */
+ bool closed; /* Has FD_CLOSE event been reported? */
#endif
} WaitEvent;
--
2.33.1
v2-0002-XXX-Quick-hack-to-use-persistent-WaitEventSet-in-.patch (text/x-patch)
From 755ee3da7934d33a706d59eb0abc0e4f1829acb0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 10 Jan 2022 22:02:21 +1300
Subject: [PATCH v2 2/2] XXX Quick hack to use persistent WaitEventSet in
walreceiver
XXX There are several things wrong with this, it's just experimental code.
---
.../libpqwalreceiver/libpqwalreceiver.c | 97 ++++++++++++-------
src/backend/replication/walreceiver.c | 12 +--
src/include/replication/walreceiver.h | 8 ++
3 files changed, 75 insertions(+), 42 deletions(-)
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index c08e599eef..f8dd76e2d0 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -40,6 +40,7 @@ void _PG_init(void);
struct WalReceiverConn
{
+ WaitEventSet *wes;
/* Current connection to the primary, if any */
PGconn *streamConn;
/* Used to remember if the connection is logical or physical */
@@ -82,6 +83,7 @@ static WalRcvExecResult *libpqrcv_exec(WalReceiverConn *conn,
const int nRetTypes,
const Oid *retTypes);
static void libpqrcv_disconnect(WalReceiverConn *conn);
+static int libpqrcv_wait(WalReceiverConn *conn, int timeout, int wait_event);
static WalReceiverFunctionsType PQWalReceiverFunctions = {
libpqrcv_connect,
@@ -98,12 +100,13 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
libpqrcv_create_slot,
libpqrcv_get_backend_pid,
libpqrcv_exec,
- libpqrcv_disconnect
+ libpqrcv_disconnect,
+ libpqrcv_wait
};
/* Prototypes for private functions */
-static PGresult *libpqrcv_PQexec(PGconn *streamConn, const char *query);
-static PGresult *libpqrcv_PQgetResult(PGconn *streamConn);
+static PGresult *libpqrcv_PQexec(WalReceiverConn *conn, const char *query);
+static PGresult *libpqrcv_PQgetResult(WalReceiverConn *conn);
static char *stringlist_to_identifierstr(PGconn *conn, List *strings);
/*
@@ -182,6 +185,15 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
return NULL;
}
+ conn->wes = CreateWaitEventSet(TopMemoryContext, 3);
+ AddWaitEventToSet(conn->wes, WL_SOCKET_WRITEABLE,
+ PQsocket(conn->streamConn), NULL, NULL);
+ AddWaitEventToSet(conn->wes, WL_LATCH_SET, PGINVALID_SOCKET,
+ MyLatch, NULL);
+ AddWaitEventToSet(conn->wes, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+ NULL, NULL);
+
+
/*
* Poll connection until we have OK or FAILED status.
*
@@ -191,7 +203,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
do
{
int io_flag;
- int rc;
+ WaitEvent event;
if (status == PGRES_POLLING_READING)
io_flag = WL_SOCKET_READABLE;
@@ -203,21 +215,20 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
else
io_flag = WL_SOCKET_WRITEABLE;
- rc = WaitLatchOrSocket(MyLatch,
- WL_EXIT_ON_PM_DEATH | WL_LATCH_SET | io_flag,
- PQsocket(conn->streamConn),
- 0,
- WAIT_EVENT_LIBPQWALRECEIVER_CONNECT);
+ ModifyWaitEvent(conn->wes, 0, io_flag, NULL);
+ if (WaitEventSetWait(conn->wes, -1, &event, 1,
+ WAIT_EVENT_LIBPQWALRECEIVER_CONNECT) < 1)
+ continue;
/* Interrupted? */
- if (rc & WL_LATCH_SET)
+ if (event.events & WL_LATCH_SET)
{
ResetLatch(MyLatch);
ProcessWalRcvInterrupts();
}
/* If socket is ready, advance the libpq state machine */
- if (rc & io_flag)
+ if (event.events & io_flag)
status = PQconnectPoll(conn->streamConn);
} while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED);
@@ -231,7 +242,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
{
PGresult *res;
- res = libpqrcv_PQexec(conn->streamConn,
+ res = libpqrcv_PQexec(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
@@ -359,7 +370,7 @@ libpqrcv_identify_system(WalReceiverConn *conn, TimeLineID *primary_tli)
* Get the system identifier and timeline ID as a DataRow message from the
* primary server.
*/
- res = libpqrcv_PQexec(conn->streamConn, "IDENTIFY_SYSTEM");
+ res = libpqrcv_PQexec(conn, "IDENTIFY_SYSTEM");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
@@ -482,7 +493,7 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
options->proto.physical.startpointTLI);
/* Start streaming. */
- res = libpqrcv_PQexec(conn->streamConn, cmd.data);
+ res = libpqrcv_PQexec(conn, cmd.data);
pfree(cmd.data);
if (PQresultStatus(res) == PGRES_COMMAND_OK)
@@ -532,7 +543,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
* If we had not yet received CopyDone from the backend, PGRES_COPY_OUT is
* also possible in case we aborted the copy in mid-stream.
*/
- res = libpqrcv_PQgetResult(conn->streamConn);
+ res = libpqrcv_PQgetResult(conn);
if (PQresultStatus(res) == PGRES_TUPLES_OK)
{
/*
@@ -547,7 +558,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
PQclear(res);
/* the result set should be followed by CommandComplete */
- res = libpqrcv_PQgetResult(conn->streamConn);
+ res = libpqrcv_PQgetResult(conn);
}
else if (PQresultStatus(res) == PGRES_COPY_OUT)
{
@@ -561,7 +572,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
pchomp(PQerrorMessage(conn->streamConn)))));
/* CommandComplete should follow */
- res = libpqrcv_PQgetResult(conn->streamConn);
+ res = libpqrcv_PQgetResult(conn);
}
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -572,7 +583,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
PQclear(res);
/* Verify that there are no more results */
- res = libpqrcv_PQgetResult(conn->streamConn);
+ res = libpqrcv_PQgetResult(conn);
if (res != NULL)
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -597,7 +608,7 @@ libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
* Request the primary to send over the history file for given timeline.
*/
snprintf(cmd, sizeof(cmd), "TIMELINE_HISTORY %u", tli);
- res = libpqrcv_PQexec(conn->streamConn, cmd);
+ res = libpqrcv_PQexec(conn, cmd);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
@@ -641,8 +652,9 @@ libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
* May return NULL, rather than an error result, on failure.
*/
static PGresult *
-libpqrcv_PQexec(PGconn *streamConn, const char *query)
+libpqrcv_PQexec(WalReceiverConn *conn, const char *query)
{
+ PGconn *streamConn = conn->streamConn;
PGresult *lastResult = NULL;
/*
@@ -665,7 +677,7 @@ libpqrcv_PQexec(PGconn *streamConn, const char *query)
/* Wait for, and collect, the next PGresult. */
PGresult *result;
- result = libpqrcv_PQgetResult(streamConn);
+ result = libpqrcv_PQgetResult(conn);
if (result == NULL)
break; /* query is complete, or failure */
@@ -690,30 +702,31 @@ libpqrcv_PQexec(PGconn *streamConn, const char *query)
* Perform the equivalent of PQgetResult(), but watch for interrupts.
*/
static PGresult *
-libpqrcv_PQgetResult(PGconn *streamConn)
+libpqrcv_PQgetResult(WalReceiverConn *conn)
{
+ PGconn *streamConn = conn->streamConn;
+
/*
* Collect data until PQgetResult is ready to get the result without
* blocking.
*/
while (PQisBusy(streamConn))
{
- int rc;
+ WaitEvent event;
/*
* We don't need to break down the sleep into smaller increments,
* since we'll get interrupted by signals and can handle any
* interrupts here.
*/
- rc = WaitLatchOrSocket(MyLatch,
- WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
- WL_LATCH_SET,
- PQsocket(streamConn),
- 0,
- WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE);
+ ModifyWaitEvent(conn->wes, 0, WL_SOCKET_READABLE, NULL);
+
+ if (WaitEventSetWait(conn->wes, -1, &event, 1,
+ WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE) < 1)
+ continue;
/* Interrupted? */
- if (rc & WL_LATCH_SET)
+ if (event.events & WL_LATCH_SET)
{
ResetLatch(MyLatch);
ProcessWalRcvInterrupts();
@@ -737,12 +750,28 @@ libpqrcv_PQgetResult(PGconn *streamConn)
static void
libpqrcv_disconnect(WalReceiverConn *conn)
{
+ FreeWaitEventSet(conn->wes);
PQfinish(conn->streamConn);
if (conn->recvBuf != NULL)
PQfreemem(conn->recvBuf);
pfree(conn);
}
+/*
+ * Wait for data, latch, or timeout.
+ */
+static int
+libpqrcv_wait(WalReceiverConn *conn, int timeout, int wait_event)
+{
+ WaitEvent event;
+
+ if (WaitEventSetWait(conn->wes, timeout, &event, 1, wait_event) == 1)
+ return event.events;
+
+ return 0;
+}
+
+
/*
* Receive a message available from XLOG stream.
*
@@ -793,13 +822,13 @@ libpqrcv_receive(WalReceiverConn *conn, char **buffer,
{
PGresult *res;
- res = libpqrcv_PQgetResult(conn->streamConn);
+ res = libpqrcv_PQgetResult(conn);
if (PQresultStatus(res) == PGRES_COMMAND_OK)
{
PQclear(res);
/* Verify that there are no more results. */
- res = libpqrcv_PQgetResult(conn->streamConn);
+ res = libpqrcv_PQgetResult(conn);
if (res != NULL)
{
PQclear(res);
@@ -941,7 +970,7 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
}
- res = libpqrcv_PQexec(conn->streamConn, cmd.data);
+ res = libpqrcv_PQexec(conn, cmd.data);
pfree(cmd.data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
@@ -1068,7 +1097,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the query interface requires a database connection")));
- pgres = libpqrcv_PQexec(conn->streamConn, query);
+ pgres = libpqrcv_PQexec(conn, query);
switch (PQresultStatus(pgres))
{
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 7a7eb3784e..f308ea9c24 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -507,13 +507,9 @@ WalReceiverMain(void)
* could add and remove just the socket each time, potentially
* avoiding some system calls.
*/
- Assert(wait_fd != PGINVALID_SOCKET);
- rc = WaitLatchOrSocket(MyLatch,
- WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
- WL_TIMEOUT | WL_LATCH_SET,
- wait_fd,
- NAPTIME_PER_CYCLE,
- WAIT_EVENT_WAL_RECEIVER_MAIN);
+ //Assert(wait_fd != PGINVALID_SOCKET);
+ rc = walrcv_wait(wrconn, NAPTIME_PER_CYCLE,
+ WAIT_EVENT_WAL_RECEIVER_MAIN);
if (rc & WL_LATCH_SET)
{
ResetLatch(MyLatch);
@@ -532,7 +528,7 @@ WalReceiverMain(void)
XLogWalRcvSendReply(true, false);
}
}
- if (rc & WL_TIMEOUT)
+ if (rc == 0)
{
/*
* We didn't receive anything new. If we haven't heard
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 0b607ed777..3e80ff63ce 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -380,6 +380,11 @@ typedef WalRcvExecResult *(*walrcv_exec_fn) (WalReceiverConn *conn,
*/
typedef void (*walrcv_disconnect_fn) (WalReceiverConn *conn);
+/*
+ * XXX Wait for readable data, timeout, or latch.
+ */
+typedef int (*walrcv_wait_fn) (WalReceiverConn *conn, int timeout, int wait_event);
+
typedef struct WalReceiverFunctionsType
{
walrcv_connect_fn walrcv_connect;
@@ -397,6 +402,7 @@ typedef struct WalReceiverFunctionsType
walrcv_get_backend_pid_fn walrcv_get_backend_pid;
walrcv_exec_fn walrcv_exec;
walrcv_disconnect_fn walrcv_disconnect;
+ walrcv_wait_fn walrcv_wait;
} WalReceiverFunctionsType;
extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
@@ -431,6 +437,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
WalReceiverFunctions->walrcv_exec(conn, exec, nRetTypes, retTypes)
#define walrcv_disconnect(conn) \
WalReceiverFunctions->walrcv_disconnect(conn)
+#define walrcv_wait(conn, timeout, wait_event) \
+ WalReceiverFunctions->walrcv_wait(conn, timeout, wait_event)
static inline void
walrcv_clear_result(WalRcvExecResult *walres)
--
2.33.1
10.01.2022 12:40, Thomas Munro wrote:
This is super quick-and-dirty code (and doesn't handle some errors or
socket changes correctly), but does it detect the closed socket?
Yes, it fixes the behaviour and makes the 002_standby test pass (100 of
100 iterations). I have yet to find out whether the other
WaitLatchOrSocket users (e.g. postgres_fdw) can suffer from the
disconnected-socket state, but this approach definitely works for
walreceiver.
Best regards,
Alexander
On Tue, Jan 11, 2022 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
10.01.2022 12:40, Thomas Munro wrote:
This is super quick-and-dirty code (and doesn't handle some errors or
socket changes correctly), but does it detect the closed socket?
Yes, it fixes the behaviour and makes the 002_standby test pass (100 of
100 iterations).
Thanks for testing. That result does seem to confirm the hypothesis
that FD_CLOSE is reported only once for the socket on graceful
shutdown (that is, it's edge-triggered and incidentally you won't get
FD_READ), so you need to keep track of it carefully. Incidentally,
another observation is that your WSAPoll() test appears to be
returning POLLHUP where at least Linux, FreeBSD and Solaris would not:
a socket that is only half shut down (the primary shut down its end
gracefully, but walreceiver did not), so I suspect Windows' POLLHUP
might have POLLRDHUP semantics.
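[For comparison, a minimal sketch of the equivalent half-close probe on the poll() side; illustration only, not code from the tree, and the helper name is made up. POLLRDHUP is the Linux-specific flag that reports the peer shutting down its write side, which is roughly what WSAPoll()'s POLLHUP appears to signal here.]

/* Sketch: Linux-style probe for a peer that has shut down its write side. */
#define _GNU_SOURCE
#include <poll.h>

static int
peer_closed_write_side(int sock)
{
    struct pollfd pfd = {.fd = sock, .events = POLLIN | POLLRDHUP};

    /* POLLRDHUP means the peer sent FIN; EOF follows once buffered data is read. */
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLRDHUP);
}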
I'm yet to find out whether the other
WaitLatchOrSocket' users (e. g. postgres_fdw) can suffer from the
disconnected socket state, but this approach definitely works for
walreceiver.
I see where you're going: there might be safe call sequences and
unsafe call sequences, and maybe walreceiver is asking for trouble by
double-polling. I'm not sure about that; I got the impression
recently that it's possible to get FD_CLOSE while you still have
buffered data to read, so then the next recv() will return > 0 and
then we don't have any state left anywhere to remember that we saw
FD_CLOSE, even if you're careful to poll and read in the ideal
sequence. I could be wrong, and it would be nice if there is an easy
fix along those lines... The documentation around FD_CLOSE is
unclear.
I do plan to make a higher quality patch like the one I showed
(material from earlier unfinished work[1] that needs a bit more
infrastructure), but to me that's new feature/efficiency work, not
something we'd want to back-patch.
Hmm, one thing I'm still unclear on: did this problem really start
with 6051857fc/ed52c3707? My initial email in this thread lists
similar failures going back further, doesn't it? (And what's tern
doing mixed up in this mess?)
[1]: /messages/by-id/CA+hUKGJPaygh-6WHEd0FnH89GrkTpVyN_ew9ckv3+nwjmLcSeg@mail.gmail.com
On 1/10/22 15:52, Thomas Munro wrote:
Hmm, one thing I'm still unclear on: did this problem really start
with 6051857fc/ed52c3707? My initial email in this thread lists
similar failures going back further, doesn't it? (And what's tern
doing mixed up in this mess?)
Your list contains at least some false positives, e.g.
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-22%2014:19:02>
which has a different script failing.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
Hmm, one thing I'm still unclear on: did this problem really start
with 6051857fc/ed52c3707? My initial email in this thread lists
similar failures going back further, doesn't it? (And what's tern
doing mixed up in this mess?)
Well, those earlier ones may be committs failures, but a lot of
them contain different-looking symptoms, e.g. pg_ctl failures.
tern's failure at
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-17+10%3A57%3A49
does look similar, but we can see in its log that the standby
*did* notice the primary disconnection and then reconnect:
2021-07-17 16:29:08.248 UTC [17498380:2] LOG: replication terminated by primary server
2021-07-17 16:29:08.248 UTC [17498380:3] DETAIL: End of WAL reached on timeline 1 at 0/30378F8.
2021-07-17 16:29:08.248 UTC [17498380:4] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2021-07-17 16:29:08.248 UTC [25166230:5] LOG: invalid record length at 0/30378F8: wanted 24, got 0
2021-07-17 16:29:08.350 UTC [16318578:1] FATAL: could not connect to the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2021-07-17 16:29:36.369 UTC [7077918:1] FATAL: could not connect to the primary server: FATAL: the database system is starting up
2021-07-17 16:29:36.380 UTC [11338028:1] FATAL: could not connect to the primary server: FATAL: the database system is starting up
...
2021-07-17 16:29:36.881 UTC [17367092:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
So I'm not sure what happened there, but it's not an instance
of this problem. One thing that looks a bit suspicious is
this in the primary's log:
2021-07-17 16:26:47.832 UTC [12386550:1] LOG: using stale statistics instead of current ones because stats collector is not responding
which makes me wonder if the timeout is down to out-of-date
pg_stats data. The loop in 002_standby.pl doesn't appear to
depend on the stats collector:
my $primary_lsn =
$primary->safe_psql('postgres', 'select pg_current_wal_lsn()');
$standby->poll_query_until('postgres',
qq{SELECT '$primary_lsn'::pg_lsn <= pg_last_wal_replay_lsn()})
or die "standby never caught up";
but maybe I'm missing the connection.
Apropos of that, it's worth noting that wait_for_catchup *is*
dependent on up-to-date stats, and here's a recent run where
it sure looks like the timeout cause is AWOL stats collector:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2022-01-10%2004%3A51%3A34
I wonder if we should refactor wait_for_catchup to probe the
standby directly instead of relying on the upstream's view.
regards, tom lane
10.01.2022 23:52, Thomas Munro wrote:
I'm yet to find out whether the other
WaitLatchOrSocket' users (e. g. postgres_fdw) can suffer from the
disconnected socket state, but this approach definitely works for
walreceiver.
I see where you're going: there might be safe call sequences and
unsafe call sequences, and maybe walreceiver is asking for trouble by
double-polling. I'm not sure about that; I got the impression
recently that it's possible to get FD_CLOSE while you still have
buffered data to read, so then the next recv() will return > 0 and
then we don't have any state left anywhere to remember that we saw
FD_CLOSE, even if you're careful to poll and read in the ideal
sequence. I could be wrong, and it would be nice if there is an easy
fix along those lines... The documentation around FD_CLOSE is
unclear.
I had no strong opinion regarding unsafe sequences, though initially I
suspected that exactly the second libpqrcv_PQgetResult call could cause
the issue. But after digging into WaitLatchOrSocket I was inclined to put
the fix deeper to satisfy all possible callers.
On the other hand, I shared Tom's concerns regarding other clients
that can get stuck on WaitForMultipleObjects() just as walreceiver does, and
hoped that only walreceiver suffers from a graceful server socket closing.
So to get these doubts cleared, I've made a simple test for postgres_fdw
(please look at the attachment; you can put it into
contrib/postgres_fdw/t and run `vcregress taptest contrib\postgres_fdw`).
This test shows for me:
===
...
t/001_disconnection.pl .. # 12:13:39.481084 executing query...
# 12:13:43.245277 result: 0
# 0|0
# 12:13:43.246342 executing query...
# 12:13:46.525924 result: 0
# 0|0
# 12:13:46.527097 executing query...
# 12:13:47.745176 result: 3
#
# psql:<stdin>:1: WARNING: no connection to the server
# psql:<stdin>:1: ERROR: FATAL: terminating connection due to administrator command
# server closed the connection unexpectedly
# This probably means the server terminated abnormally
# before or while processing the request.
# CONTEXT: remote SQL command: FETCH 100 FROM c1
# 12:13:47.794612 executing query...
# 12:13:51.073318 result: 0
# 0|0
# 12:13:51.074347 executing query...
===
With the simple logging added to connection.c:
/* Sleep until there's something to do */
elog(LOG, "pgfdw_get_result before WaitLatchOrSocket");
wc = WaitLatchOrSocket(MyLatch,
WL_LATCH_SET | WL_SOCKET_READABLE |
WL_EXIT_ON_PM_DEATH,
PQsocket(conn),
-1L, PG_WAIT_EXTENSION);
elog(LOG, "pgfdw_get_result after WaitLatchOrSocket");
I see in 001_disconnection_local.log:
...
2022-01-11 15:13:52.875 MSK|Administrator|postgres|61dd747f.5e4|LOG:
pgfdw_get_result after WaitLatchOrSocket
2022-01-11 15:13:52.875
MSK|Administrator|postgres|61dd747f.5e4|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-11 15:13:52.875 MSK|Administrator|postgres|61dd747f.5e4|LOG:
pgfdw_get_result before WaitLatchOrSocket
2022-01-11 15:13:52.875
MSK|Administrator|postgres|61dd747f.5e4|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-11 15:14:36.976 MSK|||61dd74ac.840|DEBUG: autovacuum:
processing database "postgres"
2022-01-11 15:14:51.088 MSK|Administrator|postgres|61dd747f.5e4|LOG:
pgfdw_get_result after WaitLatchOrSocket
2022-01-11 15:14:51.088
MSK|Administrator|postgres|61dd747f.5e4|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-11 15:14:51.089 MSK|Administrator|postgres|61dd747f.5e4|LOG:
pgfdw_get_result before WaitLatchOrSocket
2022-01-11 15:14:51.089
MSK|Administrator|postgres|61dd747f.5e4|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-11 15:15:37.006 MSK|||61dd74e9.9e8|DEBUG: autovacuum:
processing database "postgres"
2022-01-11 15:16:37.116 MSK|||61dd7525.ad0|DEBUG: autovacuum:
processing database "postgres"
2022-01-11 15:17:37.225 MSK|||61dd7561.6a0|DEBUG: autovacuum:
processing database "postgres"
2022-01-11 15:18:36.916 MSK|||61dd7470.704|LOG: checkpoint starting: time
...
2022-01-11 15:36:38.225 MSK|||61dd79d6.2a0|DEBUG: autovacuum:
processing database "postgres"
...
So here we get a similar hang on WaitLatchOrSocket().
Just to make sure that it's indeed the same issue, I removed the socket
shutdown & close, and the test ran to completion (several times). Argh.
Best regards,
Alexander
Attachments:
On Wed, Jan 12, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
So here we get similar hanging on WaitLatchOrSocket().
Just to make sure that it's indeed the same issue, I've removed socket
shutdown&close and the test executed to the end (several times). Argh.
Ouch. I think our options at this point are:
1. Revert 6051857fc (and put it back when we have a working
long-lived WES as I showed). This is not very satisfying, now that we
understand the bug, because even without that change I guess you must
be able to reach the hanging condition by using Windows postgres_fdw
to talk to a non-Windows server (ie a normal TCP stack with graceful
shutdown/linger on process exit).
2. Put your poll() check into the READABLE side. There's some
precedent for that sort of kludge on the WRITEABLE side (and a
rejection of the fragile idea that clients of latch.c should only
perform "safe" sequences):
/*
* Windows does not guarantee to log an FD_WRITE network event
* indicating that more data can be sent unless the previous send()
* failed with WSAEWOULDBLOCK. While our caller might well have made
* such a call, we cannot assume that here. Therefore, if waiting for
* write-ready, force the issue by doing a dummy send(). If the dummy
* send() succeeds, assume that the socket is in fact write-ready, and
* return immediately. Also, if it fails with something other than
* WSAEWOULDBLOCK, return a write-ready indication to let our caller
* deal with the error condition.
*/
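[A sketch of what option 2 might look like on the readable side, modeled on Alexander's WSAPoll() check and the fields latch.c already uses; exactly where it would sit in the Windows wait path, and whether POLLHUP is a reliable signal there, are assumptions.]

/*
 * Sketch only: before blocking on a WL_SOCKET_READABLE event, probe the
 * socket with WSAPoll() and report it readable if the peer has already
 * closed, mirroring the dummy-send() kludge quoted above for the
 * write-ready side.
 */
if (cur_event->events & WL_SOCKET_READABLE)
{
    WSAPOLLFD   pollfd;

    pollfd.fd = cur_event->fd;
    pollfd.events = POLLRDNORM;
    pollfd.revents = 0;

    if (WSAPoll(&pollfd, 1, 0) == 1 && (pollfd.revents & POLLHUP))
    {
        occurred_events->pos = cur_event->pos;
        occurred_events->user_data = cur_event->user_data;
        occurred_events->events = WL_SOCKET_READABLE;
        occurred_events->fd = cur_event->fd;
        return 1;
    }
}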
Thomas Munro <thomas.munro@gmail.com> writes:
Ouch. I think our options at this point are:
1. Revert 6051857fc (and put it back when we have a working
long-lived WES as I showed). This is not very satisfying, now that we
understand the bug, because even without that change I guess you must
be able to reach the hanging condition by using Windows postgres_fdw
to talk to a non-Windows server (ie a normal TCP stack with graceful
shutdown/linger on process exit).
It'd be worth checking, perhaps. One thing I've been wondering all
along is how much of this behavior is specific to the local-loopback
case where Windows can see both ends of the connection. You'd think
that they couldn't long get away with such blatant violations of the
TCP specs when talking to external servers, because the failures
would be visible to everyone with a web browser.
regards, tom lane
11.01.2022 23:16, Tom Lane wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
Ouch. I think our options at this point are:
1. Revert 6051857fc (and put it back when we have a working
long-lived WES as I showed). This is not very satisfying, now that we
understand the bug, because even without that change I guess you must
be able to reach the hanging condition by using Windows postgres_fdw
to talk to a non-Windows server (ie a normal TCP stack with graceful
shutdown/linger on process exit).
It'd be worth checking, perhaps. One thing I've been wondering all
along is how much of this behavior is specific to the local-loopback
case where Windows can see both ends of the connection. You'd think
that they couldn't long get away with such blatant violations of the
TCP specs when talking to external servers, because the failures
would be visible to everyone with a web browser.
I've split my test (both parts attached) and run it on two virtual
machines with clean builds from master (ac7c8075) on both (just the
debugging output added to connection.c). I'm providing probably redundant
info (also see the attached screenshot) just to make sure that I didn't
make a mistake.
The excerpt from 001_disconnection1_local.log:
...
2022-01-12 09:29:48.099 MSK|Administrator|postgres|61de755a.a54|LOG:
pgfdw_get_result: before WaitLatchOrSocket
2022-01-12 09:29:48.099
MSK|Administrator|postgres|61de755a.a54|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|LOG:
pgfdw_get_result: after WaitLatchOrSocket
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|LOG:
pgfdw_get_result: before WaitLatchOrSocket
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|LOG:
pgfdw_get_result: after WaitLatchOrSocket
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
2022-01-12 09:29:48.100 MSK|Administrator|postgres|61de755a.a54|ERROR:
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|CONTEXT: remote SQL command:
FETCH 100 FROM c1
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|WARNING: no connection to the
server
2022-01-12 09:29:48.100
MSK|Administrator|postgres|61de755a.a54|CONTEXT: remote SQL command:
ABORT TRANSACTION
2022-01-12 09:29:48.107 MSK|Administrator|postgres|61de755a.a54|LOG:
disconnection: session time: 0:00:01.577 user=Administrator
database=postgres host=127.0.0.1 port=49752
2022-01-12 09:29:48.257 MSK|[unknown]|[unknown]|61de755c.a4c|LOG:
connection received: host=127.0.0.1 port=49754
2022-01-12 09:29:48.261 MSK|Administrator|postgres|61de755c.a4c|LOG:
connection authenticated: identity="WIN-FCPSOVMM1JC\Administrator"
method=sspi
(C:/src/postgrespro/contrib/postgres_fdw/tmp_check/t_001_disconnection1_local_data/pgdata/pg_hba.conf:2)
2022-01-12 09:29:48.261 MSK|Administrator|postgres|61de755c.a4c|LOG:
connection authorized: user=Administrator database=postgres
application_name=001_disconnection1.pl
2022-01-12 09:29:48.263 MSK|Administrator|postgres|61de755c.a4c|LOG:
statement: SELECT * FROM large WHERE a = fx2(a)
2022-01-12 09:29:48.285 MSK|Administrator|postgres|61de755c.a4c|LOG:
pgfdw_get_result: before WaitLatchOrSocket
2022-01-12 09:29:48.285
MSK|Administrator|postgres|61de755c.a4c|STATEMENT: SELECT * FROM large
WHERE a = fx2(a)
...
By the look of things, you are right and this is a localhost-only issue.
I've rechecked that the test 001_disconnection.pl (local-loopback
version) hangs on both machines, while 001_disconnection1.pl completes
successfully in both directions. I'm not sure whether the Windows-client
and non-Windows-server combination, or the reverse, is of interest in
light of the above.
Best regards,
Alexander
Attachments:
Screenshot_20220112_093004.png (image/png)