pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Started by Andres Freundabout 4 years ago15 messageshackers
Jump to latest
#1Andres Freund
andres@anarazel.de

Hi,

I was wondering in [1]/messages/by-id/20211231192528.wirwj4qaaw3ted5g@alap3.anarazel.de what we could do about the slowest tests on
windows.

On 2021-12-31 11:25:28 -0800, Andres Freund wrote:

Picking a random successful cfbot run [1] I see the following tap tests taking
more than 20 seconds:

67188 ms pg_basebackup t/010_pg_basebackup.pl
59710 ms recovery t/001_stream_rep.pl

Comparing these times to measurements taken on my normal linux workstation,
something seemed just *very* off, even with a slow CI instance and windows in
the mix.

A bunch of printf debugging later, I realized the problem is that several of
the pg_basebackups in tests take a *long* time. E.g. for t/001_stream_rep.pl
the backups from the standby each take just over 10s. That's awfully
specific...

# Taking pg_basebackup my_backup from node "standby_1"
# Running: pg_basebackup -D C:/dev/postgres/./tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup -h C:/Users/myadmin/AppData/Local/Temp/yba26PBYX1 -p 59181 --checkpoint fast --no-sync --label my_backup -v
# ran in 10.145s
# Backup finished

This reproduceably happens and it's *not* related to the socket shutdown()
changes we've been debugging lately - even after a revert the problem
persists.

Because our logging for basebackups is quite weak, both for server and client
side, I needed to add a fair bit more debugging to figure it out:

pg_basebackup: wait to finish at 0.492
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: stream poll timeout 10.112

The problem is that there's just no implemented way to timely shutdown the WAL
streaming thread in pg_basebackup. The code in pg_basebackup.c says:

if (verbose)
pg_log_info("waiting for background process to finish streaming ...");
...
/*
* On Windows, since we are in the same process, we can just store the
* value directly in the variable, and then set the flag that says
* it's there.
*/
...
xlogendptr = ((uint64) hi) << 32 | lo;
InterlockedIncrement(&has_xlogendptr);

But just setting a variable doesn't do much if the thread is in
HandleCopyStream()->CopyStreamPoll()->select()

The only reason we ever succeed shutting down, without more WAL coming in, is
that pg_basebackup defaults to sending a status message every 10 seconds. At
which point the thread sees has_xlogendptr = true, and shuts down.

A test specific workaround would be to just add --status-interval=1 to
Cluster.pm::backup(). But that seems very unsatisfying.

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().

Any more straightforward ideas?

From a cursory look at history, it used to be that pg_basebackup had this
behaviour on all platforms, but it got fixed for other platforms in
7834d20b57a by Tom (assuming the problem wasn't present there).

Greetings,

Andres Freund

[1]: /messages/by-id/20211231192528.wirwj4qaaw3ted5g@alap3.anarazel.de

#2Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#1)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On Sun, Jan 16, 2022 at 10:22 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

I was wondering in [1] what we could do about the slowest tests on
windows.

On 2021-12-31 11:25:28 -0800, Andres Freund wrote:

Picking a random successful cfbot run [1] I see the following tap tests taking
more than 20 seconds:

67188 ms pg_basebackup t/010_pg_basebackup.pl
59710 ms recovery t/001_stream_rep.pl

Comparing these times to measurements taken on my normal linux workstation,
something seemed just *very* off, even with a slow CI instance and windows in
the mix.

A bunch of printf debugging later, I realized the problem is that several of
the pg_basebackups in tests take a *long* time. E.g. for t/001_stream_rep.pl
the backups from the standby each take just over 10s. That's awfully
specific...

# Taking pg_basebackup my_backup from node "standby_1"
# Running: pg_basebackup -D C:/dev/postgres/./tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup -h C:/Users/myadmin/AppData/Local/Temp/yba26PBYX1 -p 59181 --checkpoint fast --no-sync --label my_backup -v
# ran in 10.145s
# Backup finished

This reproduceably happens and it's *not* related to the socket shutdown()
changes we've been debugging lately - even after a revert the problem
persists.

Because our logging for basebackups is quite weak, both for server and client
side, I needed to add a fair bit more debugging to figure it out:

pg_basebackup: wait to finish at 0.492
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: stream poll timeout 10.112

The problem is that there's just no implemented way to timely shutdown the WAL
streaming thread in pg_basebackup. The code in pg_basebackup.c says:

if (verbose)
pg_log_info("waiting for background process to finish streaming ...");
...
/*
* On Windows, since we are in the same process, we can just store the
* value directly in the variable, and then set the flag that says
* it's there.
*/
...
xlogendptr = ((uint64) hi) << 32 | lo;
InterlockedIncrement(&has_xlogendptr);

But just setting a variable doesn't do much if the thread is in
HandleCopyStream()->CopyStreamPoll()->select()

The only reason we ever succeed shutting down, without more WAL coming in, is
that pg_basebackup defaults to sending a status message every 10 seconds. At
which point the thread sees has_xlogendptr = true, and shuts down.

A test specific workaround would be to just add --status-interval=1 to
Cluster.pm::backup(). But that seems very unsatisfying.

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().

Any more straightforward ideas?

From a cursory look at history, it used to be that pg_basebackup had this
behaviour on all platforms, but it got fixed for other platforms in
7834d20b57a by Tom (assuming the problem wasn't present there).

Ugh, yeah that sounds like a correct analysis to me, and ugh, yeah
that's not very nice.

And yes, I think we have to create an event, and then use
WSAEventSelect() + WaitForSingleObjectEx(). Should be enough to just
use one event I think, and then the timeout -- but it might be more
readable to have a separate event for the socket and the stop? But we
can have just one event that's both used to stop and then use
WSAEventSelect() to associate it with the socket as well as neede.

(And yes, I agree that it's a lot better to fix it properly than to
just reduce the timeout)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Andres Freund <andres@anarazel.de> writes:

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().
Any more straightforward ideas?
From a cursory look at history, it used to be that pg_basebackup had this
behaviour on all platforms, but it got fixed for other platforms in
7834d20b57a by Tom (assuming the problem wasn't present there).

Hmm --- I see that I thought Windows was unaffected, but I didn't
consider this angle.

Can we send the child process a signal to kick it off its wait?

regards, tom lane

#4Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#3)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().
Any more straightforward ideas?
From a cursory look at history, it used to be that pg_basebackup had this
behaviour on all platforms, but it got fixed for other platforms in
7834d20b57a by Tom (assuming the problem wasn't present there).

Hmm --- I see that I thought Windows was unaffected, but I didn't
consider this angle.

Can we send the child process a signal to kick it off its wait?

No. (1) on Windows it's not a child process, it's a thread. And (2)
Windows doesn't have signals. We emulate those *in the backend* for
win32, but this problem is in the frontend where that emulation layer
doesn't exist.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#5Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#4)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On Sun, Jan 16, 2022 at 5:36 PM Magnus Hagander <magnus@hagander.net> wrote:

On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().
Any more straightforward ideas?
From a cursory look at history, it used to be that pg_basebackup had this
behaviour on all platforms, but it got fixed for other platforms in
7834d20b57a by Tom (assuming the problem wasn't present there).

Hmm --- I see that I thought Windows was unaffected, but I didn't
consider this angle.

Can we send the child process a signal to kick it off its wait?

No. (1) on Windows it's not a child process, it's a thread. And (2)
Windows doesn't have signals. We emulate those *in the backend* for
win32, but this problem is in the frontend where that emulation layer
doesn't exist.

Actually, just after sending that...

What we could do is do a WSACancelBlockingCall() which will cancel the
select() thereby making us do the check. However, per the docs
(https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsacancelblockingcall)
this function is no longer exported in Winsock 2, so this does not
seem to be the right way forward. There is no replacement function for
it -- the suggestion is basically "don't do that, use multithreading
instaed" which I think brings us back to the original suggestion of
WSAEventSelect().

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#6Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#5)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Hi,

On 2022-01-16 17:39:11 +0100, Magnus Hagander wrote:

On Sun, Jan 16, 2022 at 5:36 PM Magnus Hagander <magnus@hagander.net> wrote:

On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().
Any more straightforward ideas?
From a cursory look at history, it used to be that pg_basebackup had this
behaviour on all platforms, but it got fixed for other platforms in
7834d20b57a by Tom (assuming the problem wasn't present there).

Hmm --- I see that I thought Windows was unaffected, but I didn't
consider this angle.

Can we send the child process a signal to kick it off its wait?

No. (1) on Windows it's not a child process, it's a thread. And (2)
Windows doesn't have signals. We emulate those *in the backend* for
win32, but this problem is in the frontend where that emulation layer
doesn't exist.

[...] which I think brings us back to the original suggestion of
WSAEventSelect().

I hacked that up last night. And a fix or two later, it seems to be
working. What I'd missed at first is that the event needs to be reset in
reached_end_position(), otherwise we'll busy loop.

I wonder if using a short-lived event handle would have dangers of missing
FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
the event just once.

I just wasn't immediately sure where to stash it. Probably just by adding a
field in StreamCtl, that ReceiveXlogStream() then sets? So far it's constant
once passed to ReceiveXlogStream(), but I don't really see a reason why it'd
need to stay that way?

Greetings,

Andres Freund

#7Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#6)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Hi,

On 2022-01-16 15:28:00 -0800, Andres Freund wrote:

I hacked that up last night. And a fix or two later, it seems to be
working. What I'd missed at first is that the event needs to be reset in
reached_end_position(), otherwise we'll busy loop.

I wonder if using a short-lived event handle would have dangers of missing
FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
the event just once.

I just wasn't immediately sure where to stash it. Probably just by adding a
field in StreamCtl, that ReceiveXlogStream() then sets? So far it's constant
once passed to ReceiveXlogStream(), but I don't really see a reason why it'd
need to stay that way?

Oops, attached the patch this time.

Greetings,

Andres Freund

Attachments:

v2-0001-Avoid-slow-shutdown-of-pg_basebackup-windows-edit.patchtext/x-diff; charset=us-asciiDownload+112-13
#8Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#7)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On Mon, Jan 17, 2022 at 12:31 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-01-16 15:28:00 -0800, Andres Freund wrote:

I hacked that up last night. And a fix or two later, it seems to be
working. What I'd missed at first is that the event needs to be reset in
reached_end_position(), otherwise we'll busy loop.

You can create the event with bManualReset set to False to avoid that,
no? With this usecase, I don't really see a reason not to do that
instead?

I wonder if using a short-lived event handle would have dangers of missing
FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
the event just once.

I just wasn't immediately sure where to stash it. Probably just by adding a
field in StreamCtl, that ReceiveXlogStream() then sets? So far it's constant
once passed to ReceiveXlogStream(), but I don't really see a reason why it'd
need to stay that way?

Oops, attached the patch this time.

Do we really want to create a new event every time? Those are kernel
objects, so they're not entirely free, but that part maybe doesn't
matter. Wouldn't it be cleaner to do it like we do in
pgwin32_waitforsinglesocket() which is create it once and store it in
a static variable? Or is that what you're suggesting above in the "I
wonder if" part?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#9Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#8)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Hi,

On 2022-01-17 14:50:27 +0100, Magnus Hagander wrote:

On Mon, Jan 17, 2022 at 12:31 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-01-16 15:28:00 -0800, Andres Freund wrote:

I hacked that up last night. And a fix or two later, it seems to be
working. What I'd missed at first is that the event needs to be reset in
reached_end_position(), otherwise we'll busy loop.

You can create the event with bManualReset set to False to avoid that,
no? With this usecase, I don't really see a reason not to do that
instead?

The problem I'm referring to is that some types of events are edge
triggered. Which we've been painfully reminded of recently:
/messages/by-id/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com

It appears there's no guarantee that you'll see e.g. FD_CLOSE if you use
short-lived events (the FD_CLOSE is recorded internally but not signalled
immediately if there's still readable data, and the internal record is reset
by WSAEventSelect()).

I wonder if using a short-lived event handle would have dangers of missing
FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
the event just once.

I just wasn't immediately sure where to stash it. Probably just by adding a
field in StreamCtl, that ReceiveXlogStream() then sets? So far it's constant
once passed to ReceiveXlogStream(), but I don't really see a reason why it'd
need to stay that way?

Oops, attached the patch this time.

Do we really want to create a new event every time? Those are kernel
objects, so they're not entirely free, but that part maybe doesn't
matter. Wouldn't it be cleaner to do it like we do in
pgwin32_waitforsinglesocket() which is create it once and store it in
a static variable? Or is that what you're suggesting above in the "I
wonder if" part?

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because it
still is reset by WSAEventSelect.

Greetings,

Andres Freund

#10Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#9)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On 2022-01-17 10:06:56 -0800, Andres Freund wrote:

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because it
still is reset by WSAEventSelect.

Do we are about breaking StreamCtl ABI? I don't think so?

#11Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#10)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Hi,

On 2022-01-29 12:44:22 -0800, Andres Freund wrote:

On 2022-01-17 10:06:56 -0800, Andres Freund wrote:

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because it
still is reset by WSAEventSelect.

Do we are about breaking StreamCtl ABI? I don't think so?

Here's a version of the patch only creating the event once. Needs a small bit
of comment polishing, but otherwise I think it's sane?

Greetings,

Andres Freund

Attachments:

v3-0001-Avoid-slow-shutdown-of-pg_basebackup-windows-edit.patchtext/x-diff; charset=us-asciiDownload+118-13
#12Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#10)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On Sat, Jan 29, 2022 at 9:44 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-01-17 10:06:56 -0800, Andres Freund wrote:

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because it
still is reset by WSAEventSelect.

Do we are about breaking StreamCtl ABI? I don't think so?

I would say no. It's an internal API and it's not like pg_basebackup
can load plugins.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#13Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#11)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

On Sat, Jan 29, 2022 at 10:47 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-01-29 12:44:22 -0800, Andres Freund wrote:

On 2022-01-17 10:06:56 -0800, Andres Freund wrote:

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because it
still is reset by WSAEventSelect.

Do we are about breaking StreamCtl ABI? I don't think so?

Here's a version of the patch only creating the event once. Needs a small bit
of comment polishing, but otherwise I think it's sane?

LGTM in general, yes.

I'm wondering about the part that does:
+       events[0] = stream->net_event;
+       nevents++;
+
+       if (stream->stop_event != NULL)
+       {
+               events[1] = stream->stop_event;
+               nevents++;
+       }
+

Using a combination of nevents but hardcoded indexes does work -- but
only as long as there is only one optional entry. Should they perhaps
be written
+ events[nevents++] = stream->net_event;

instead, for future proofing? But then you'd also have to change the
if() statement on the return side I guess.

Can of course also be changed at such a point where a third event
might be added. Not important, but it poked me in the eye when I was
reading it.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#14Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#13)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Hi,

On 2022-01-30 16:51:12 +0100, Magnus Hagander wrote:

On Sat, Jan 29, 2022 at 10:47 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-01-29 12:44:22 -0800, Andres Freund wrote:

On 2022-01-17 10:06:56 -0800, Andres Freund wrote:

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because it
still is reset by WSAEventSelect.

Do we are about breaking StreamCtl ABI? I don't think so?

Here's a version of the patch only creating the event once. Needs a small bit
of comment polishing, but otherwise I think it's sane?

LGTM in general, yes.

Thanks for checking.

I'm wondering about the part that does:
+       events[0] = stream->net_event;
+       nevents++;
+
+       if (stream->stop_event != NULL)
+       {
+               events[1] = stream->stop_event;
+               nevents++;
+       }
+

Using a combination of nevents but hardcoded indexes does work -- but
only as long as there is only one optional entry. Should they perhaps
be written
+ events[nevents++] = stream->net_event;

instead, for future proofing? But then you'd also have to change the
if() statement on the return side I guess.

I did wonder about it, but the index checks get sufficiently more complicated
that it didn't quite seem worth it. It didn't seem that likely these would get
a third event to check...

I think we're going to have to generalize something like our wait events to be
frontend usable at some point. The proportion and complexity of frontend code
is increasing...

Greetings,

Andres Freund

#15Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#11)
Re: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests

Hi,

On 2022-01-29 13:47:13 -0800, Andres Freund wrote:

Here's a version of the patch only creating the event once. Needs a small bit
of comment polishing, but otherwise I think it's sane?

Ah, it needs a bit more. I was not cleaning up the event at the exit of
ReceiveXlogStream(). For pg_basebackup that perhaps wouldn't matter, but
pg_receivewal loops...

We don't have a good spot for cleaning up right now. ReceiveXlogStream() has
plenty returns. The attached changes those to a goto done; but pretty it is
not. But probably still the best way for the backbranches?

I think the receivelog.c interface probably could do with a bit of
cleanup... The control flow is quite complicated, with repeated checks all
over etc :(. And the whole thing with giving the appearance of being
instantiatable multiple times, but then using global variables for state, is
...

Attached a revised version.

Greetings,

Andres Freund

Attachments:

v3-0001-Avoid-slow-shutdown-of-pg_basebackup-windows-edit.patchtext/x-diff; charset=us-asciiDownload+159-22