Race conditions with WAL sender PID lookups

Started by Michael Paquieralmost 9 years ago37 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

Any thoughts about the patch attached to make things more consistent?
It seems to me that having some safeguards would be nice for
robustness.

That's an old bug, so I am adding that to the next CF.
Thanks,
--
Michael

Attachments:

walsnd-pid-races.patchapplication/octet-stream; name=walsnd-pid-races.patchDownload+11-5
#2Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#1)
Re: Race conditions with WAL sender PID lookups

On Thu, May 11, 2017 at 10:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Hi all,

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

Any thoughts about the patch attached to make things more consistent?
It seems to me that having some safeguards would be nice for
robustness.

+1. I think this is a sensible change.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#2)
Re: Race conditions with WAL sender PID lookups

At Fri, 12 May 2017 11:44:19 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA15xsu1gbOH=L1XU7g7zuKk1UACtOz+-mqOwP1_xBC_g@mail.gmail.com>

On Thu, May 11, 2017 at 10:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Hi all,

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

Any thoughts about the patch attached to make things more consistent?
It seems to me that having some safeguards would be nice for
robustness.

+1. I think this is a sensible change.

It intends to avoid exccesive locking during looking up stats
values. But we don't have so much vacant WanSnd slots in a
reasonable setup. Thus it seems reasonable to read the pid value
within the lock section since it adds practically no additional
cost. pg_stat_get_wal_receiver seems to need the same amendment
since the code is a parallel to that of wal receiver.

Or, if we were too sensitive to such locks for nothing, we could
use double-checked locking but I don't think we are so here.

In short, +1 too and walreceiver needs the same amendment.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#1)
Re: Race conditions with WAL sender PID lookups

On Wed, May 10, 2017 at 9:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

Well, it's probably worth changing for consistency, but I'm not sure
that it rises to the level of "a very bad idea". It actually seems
almost entirely harmless. Spuriously setting the needreload flag on a
just-deceased WAL sender will just result in some future WAL sender
doing a bit of unnecessary work, but I don't think it breaks anything
and the probability is vanishingly low. The other change could result
a bogus 0 PID in pg_stat_get_wal_senders output, but I bet you
couldn't reproduce that more than once in a blue moon even with a test
rig designed to provoke it, and if it does happen it isn't really
anything more than a trivial annoyance.

So I'm in favor of committing this and maybe even back-patching it,
but I also don't think it's a big deal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#4)
Re: Race conditions with WAL sender PID lookups

On Wed, May 17, 2017 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Well, it's probably worth changing for consistency, but I'm not sure
that it rises to the level of "a very bad idea". It actually seems
almost entirely harmless. Spuriously setting the needreload flag on a
just-deceased WAL sender will just result in some future WAL sender
doing a bit of unnecessary work, but I don't think it breaks anything
and the probability is vanishingly low. The other change could result
a bogus 0 PID in pg_stat_get_wal_senders output, but I bet you
couldn't reproduce that more than once in a blue moon even with a test
rig designed to provoke it, and if it does happen it isn't really
anything more than a trivial annoyance.

Well, the window is very low, so only tests with precisely taken
breakpoints would show problems.

So I'm in favor of committing this and maybe even back-patching it,
but I also don't think it's a big deal.

Thanks. I would not mind if this is seen as a HEAD-only improvement.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#1)
Re: Race conditions with WAL sender PID lookups

On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

There is also code that accesses shared walsender state without
spinlocks over in syncrep.c. I think that file could use a few words
of explanation for why it's OK to access pid, state and flush without
synchronisation.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#6)
Re: Race conditions with WAL sender PID lookups

On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

There is also code that accesses shared walsender state without
spinlocks over in syncrep.c. I think that file could use a few words
of explanation for why it's OK to access pid, state and flush without
synchronisation.

Yes, that is read during the quorum and priority sync evaluation.
Except sync_standby_priority, all the other variables should be
protected using the spin lock of the WAL sender. walsender_private.h
is clear regarding that. So the current coding is inconsistent even
there. Attached is an updated patch.
--
Michael

Attachments:

walsnd-pid-races-v2.patchapplication/octet-stream; name=walsnd-pid-races-v2.patchDownload+47-13
#8Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#7)
Re: Race conditions with WAL sender PID lookups

On Fri, May 19, 2017 at 11:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

There is also code that accesses shared walsender state without
spinlocks over in syncrep.c. I think that file could use a few words
of explanation for why it's OK to access pid, state and flush without
synchronisation.

Yes, that is read during the quorum and priority sync evaluation.
Except sync_standby_priority, all the other variables should be
protected using the spin lock of the WAL sender. walsender_private.h
is clear regarding that. So the current coding is inconsistent even
there. Attached is an updated patch.

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#8)
Re: Race conditions with WAL sender PID lookups

On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?
--
Michael

Attachments:

walsnd-pid-races-v3.patchtext/x-patch; charset=US-ASCII; name=walsnd-pid-races-v3.patchDownload+69-31
#10Erik Rijkers
er@xs4all.nl
In reply to: Michael Paquier (#9)
Re: Race conditions with WAL sender PID lookups

On 2017-05-20 14:40, Michael Paquier wrote:

On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?

[walsnd-pid-races-v3.patch]

With this patch on current master my logical replication tests
(pgbench-over-logical-replication) run without errors for the first time
in many days (even weeks).

I'll do still more and longer tests but I have gathered already a long
streak of successful runs since you posted the patch so I am getting
convinced this patch is solved the problem that I was experiencing.

Pity it didn't make the beta.

thanks,

Erik Rijkers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Erik Rijkers
er@xs4all.nl
In reply to: Erik Rijkers (#10)
Re: Race conditions with WAL sender PID lookups

On 2017-05-21 06:37, Erik Rijkers wrote:

On 2017-05-20 14:40, Michael Paquier wrote:

On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

Also, as Horiguchi-san pointed out earlier, walreceiver seems need
the
similar fix.

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?

[walsnd-pid-races-v3.patch]

With this patch on current master my logical replication tests
(pgbench-over-logical-replication) run without errors for the first
time in many days (even weeks).

Unfortunately, just now another logical-replication failure occurred.
The same as I have seen all along:

The symptom: after starting logical replication, there are no rows in
pg_stat_replication and in the replica-log logical replication complains
about max_replication_slots being too low. (from previous experience I
know that making max_replication_slots higher does indeed 'help', but
only until the next (same) error occurs, with renewed (same) complaint).

Also from previous experience of this failed state I know that it can be
'cleaned up' by
manually emptying these tables:
delete from pg_subscription_rel;
delete from pg_subscription;
delete from pg_replication_origin;
Then it becomes possible to start a new subscription without the above
symptoms.

I'll do some more testing and hopefully get some information that's less
vague...

Erik Rijkers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#9)
Re: Race conditions with WAL sender PID lookups

On Sat, May 20, 2017 at 09:40:57PM +0900, Michael Paquier wrote:

On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Noah Misch (#12)
Re: Race conditions with WAL sender PID lookups

On 5/29/17 21:38, Noah Misch wrote:

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item.

I don't think this is my item. Most of the behavior is old, and
pg_stat_get_wal_receiver() is from commit
b1a9bad9e744857291c7d5516080527da8219854.

(The logical replication equivalent of that is
pg_stat_get_subscription(), which does appear to use appropriate locking.)

I would appreciate if another committer can take the lead on this.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#13)
Re: Race conditions with WAL sender PID lookups

On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

I don't think this is my item. Most of the behavior is old, and
pg_stat_get_wal_receiver() is from commit
b1a9bad9e744857291c7d5516080527da8219854.

I would appreciate if another committer can take the lead on this.

Those things are on Alvaro's plate for the WAL receiver portion, and I
was the author of those patches. The WAL sender portion is older
though, but it seems crazy to me to not fix both things at the same
time per their similarities.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#7)
Re: Race conditions with WAL sender PID lookups

On Fri, May 19, 2017 at 2:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I had my eyes on the WAL sender code this morning, and I have noticed
that walsender.c is not completely consistent with the PID lookups it
does in walsender.c. In two code paths, the PID value is checked
without holding the WAL sender spin lock (WalSndRqstFileReload and
pg_stat_get_wal_senders), which looks like a very bad idea contrary to
what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
doing for ages.

There is also code that accesses shared walsender state without
spinlocks over in syncrep.c. I think that file could use a few words
of explanation for why it's OK to access pid, state and flush without
synchronisation.

Yes, that is read during the quorum and priority sync evaluation.
Except sync_standby_priority, all the other variables should be
protected using the spin lock of the WAL sender. walsender_private.h
is clear regarding that. So the current coding is inconsistent even
there. Attached is an updated patch.

I don't claim that that stuff is wrong, just that it would be good to
hear an analysis. I think the question is: is there a way for syncrep
to hang because of a perfectly timed walsender transition, however
vanishingly unlikely? I'm thinking of something like: syncrep skips a
walsender slot because it's looking at arbitrarily old 'pid' from
before a walsender connected, or gets a torn read of 'flush' that
comes out as 0 but was actually non-0.

Incidentally, I suspect that a couple of places where 'volatile' is
used it's superfluous (accessing things protected by an LWLock that is
held).

I don't see any of this as 'open item' material, it's interesting to
look into but it's preexisting code. As for unlocked reads used for
pg_stat_X views, it seems well established that we're OK with that (at
least for things that the project has decided can be read atomically,
to wit aligned 32 bit values).

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#14)
Re: Race conditions with WAL sender PID lookups

On Sun, May 21, 2017 at 08:19:34AM +0200, Erik Rijkers wrote:

On 2017-05-21 06:37, Erik Rijkers wrote:

With this patch on current master my logical replication tests
(pgbench-over-logical-replication) run without errors for the first
time in many days (even weeks).

Unfortunately, just now another logical-replication failure occurred. The
same as I have seen all along:

This creates a rebuttable presumption of logical replication being the cause
of this open item. (I am not stating an opinion on whether this rebuttable
presumption is true or is false.) As long as that stands and Alvaro has not
explicitly requested ownership of this open item, Peter owns it.

On Tue, May 30, 2017 at 11:13:37AM -0700, Michael Paquier wrote:

On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

I don't think this is my item. Most of the behavior is old, and
pg_stat_get_wal_receiver() is from commit
b1a9bad9e744857291c7d5516080527da8219854.

I would appreciate if another committer can take the lead on this.

Those things are on Alvaro's plate for the WAL receiver portion, and I
was the author of those patches. The WAL sender portion is older
though, but it seems crazy to me to not fix both things at the same
time per their similarities.

As a 9.6 commit, b1a9bad cannot be the cause of a v10 open item. If a v10
commit expanded the consequences of a pre-existing bug, the committer of that
v10 work owns this open item. If the bug's consequences are the same in v9.6
and v10, this is ineligible to be an open item. Which applies?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#16)
Re: Race conditions with WAL sender PID lookups

On Fri, Jun 2, 2017 at 2:21 PM, Noah Misch <noah@leadboat.com> wrote:

On Tue, May 30, 2017 at 11:13:37AM -0700, Michael Paquier wrote:

On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

I don't think this is my item. Most of the behavior is old, and
pg_stat_get_wal_receiver() is from commit
b1a9bad9e744857291c7d5516080527da8219854.

I would appreciate if another committer can take the lead on this.

Those things are on Alvaro's plate for the WAL receiver portion, and I
was the author of those patches. The WAL sender portion is older
though, but it seems crazy to me to not fix both things at the same
time per their similarities.

As a 9.6 commit, b1a9bad cannot be the cause of a v10 open item. If a v10
commit expanded the consequences of a pre-existing bug, the committer of that
v10 work owns this open item. If the bug's consequences are the same in v9.6
and v10, this is ineligible to be an open item. Which applies?

You are right. Even 1bdae16f is from 9.6. Mea culpa. I have moved that
into the section of older bugs.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#9)
Re: Race conditions with WAL sender PID lookups

On Sat, May 20, 2017 at 8:40 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Actually, now that I look at it, ready_to_display should as well be
protected by the lock of the WAL receiver, so it is incorrectly placed
in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
is lazy as well, and that's new in 10, so we have an open item here
for both of them. And I am the author for both things. No issues
spotted in walreceiverfuncs.c after review.

I am adding an open item so as both issues are fixed in PG10. With the
WAL sender part, I think that this should be a group shot.

So what do you think about the attached?

I think if you're going to fix it so that we take spinlocks on
MyWalSnd in a bunch of places that we didn't take them before, it
would make sense to fix all the places where we're accessing those
fields without a spinlock instead of only some of them, unless there
are good reasons why we only need it in some cases and not others, in
which case I think the patch should add comments to each place where
the lock was not taken explaining why it's OK. It didn't take me and
my copy of vi very long to find instances that you didn't change.

I also think that should provide some analysis regarding the question
Thomas asked - are these actually bugs, or are they OK for some
reason? We shouldn't just plaster the code with spinlock acquisitions
unless it's really necessary. It seems at least possible that these
changes could cause performance regressions, and it doesn't look like
you've done any testing or provided any analysis indicating whether
that's likely to happen or not.

Basically, I don't think this patch is ready to commit. We have
neither evidence that it is necessary nor evidence that it is
complete. I don't want to be unfair, here, but it seems to me you
ought to do a little more legwork before throwing this over the wall
to the pool of committers.

[ Side note: Erik's report on this thread initially seemed to suggest
that we needed this patch to make logical decoding stable. But my
impression is that this is belied by subsequent developments on other
threads, so my theory is that this patch was never really related to
the problem, but rather than by the time Erik got around to testing
this patch, other fixes had made the problems relatively rare, and the
apparently-improved results with this patch were just chance. If that
theory is wrong, it would be good to hear about it. ]

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Erik Rijkers
er@xs4all.nl
In reply to: Robert Haas (#18)
Re: Race conditions with WAL sender PID lookups

On 2017-06-07 20:31, Robert Haas wrote:

[...]

[ Side note: Erik's report on this thread initially seemed to suggest
that we needed this patch to make logical decoding stable. But my
impression is that this is belied by subsequent developments on other
threads, so my theory is that this patch was never really related to
the problem, but rather than by the time Erik got around to testing
this patch, other fixes had made the problems relatively rare, and the
apparently-improved results with this patch were just chance. If that
theory is wrong, it would be good to hear about it. ]

Yes, agreed; I was probably mistaken.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#18)
Re: Race conditions with WAL sender PID lookups

v

On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think if you're going to fix it so that we take spinlocks on
MyWalSnd in a bunch of places that we didn't take them before, it
would make sense to fix all the places where we're accessing those
fields without a spinlock instead of only some of them, unless there
are good reasons why we only need it in some cases and not others, in
which case I think the patch should add comments to each place where
the lock was not taken explaining why it's OK. It didn't take me and
my copy of vi very long to find instances that you didn't change.

I take that you are referring to the two lookups in
WalSndWaitForWal(), one in exec_replication_command(), one in
WalSndLoop() and one in WalSndDone(). First let's remember that all
the fields protected by the spinlock are only updated in a WAL sender
process, so reading them directly is safe in this context: we know
that no other processes will write them. And all those functions are
called only in the context of a WAL sender process. So the copies of
MyWalSnd that you are referring to here don't need a spinlock. Is
there something I am missing?

Actually, perhaps it would make sense to add some Assert(am_walsender)
in this code?

I also think that should provide some analysis regarding the question
Thomas asked - are these actually bugs, or are they OK for some
reason? We shouldn't just plaster the code with spinlock acquisitions
unless it's really necessary. It seems at least possible that these
changes could cause performance regressions, and it doesn't look like
you've done any testing or provided any analysis indicating whether
that's likely to happen or not.

Now taking a step back on the patch, v3 that I sent 3 weeks ago.
Additional spinlocks are taken in:
1) SyncRepReleaseWaiters(), which is called in a WAL sender. So indeed
no spinlocks are needed here. The last patch is incorrect here.
2) SyncRepGetSyncStandbysPriority() and SyncRepGetSyncStandbysQuorum()
as of HEAD via SyncRepGetSyncStandbys(), something that's not good if
done lock-less as this gets used for pg_stat_replication for
pg_stat_get_wal_senders(). This can return a garbage list of sync
standbys to the client as WAL senders update their flush, write and
apply positions in parallel to that, and don't care about SyncRepLock
as this gets calculated with the standby feedback messages. The
problem here is the gap between walsnd->sync_standby_priority and
walsnd->flush:
-- With a primary, and more or more sync standbys, take a checkpoint
in SyncRepGetSyncStandbysPriority() after the first check is passed.
-- Shutdown the standby and restart it.
-- The standby reconnects, initializes the WAL sender slot with
InitWalSenderSlot().
-- Let the process of SyncRepGetSyncStandbysPriority() continue, the
sync standby does not show up.
That would be really unlikely to happen, but the code is written in
such a way that it could happen. One could also say that this gives a
closer idea that the standby disconnected but it looks wrong to me to
do this value lookup without a required lock.
3) In walreceiver.c's pg_stat_get_wal_receiver's:
- Launch pg_stat_get_wal_receiver and take a checkpoint on it.
- Pass the lookups of pid and ready_to_display.
- Make the WAL receiver stop.
- The view reports inconsistent data. If the WAL receiver data was
just initialized, the caller would get back PID values or similar 0.
Particularly a WAL receiver marked with !ready_to_display could show
data to the caller. That's not cool.
4) KeepFileRestoredFromArchive() needs a lock, as this is called from
the startup process. That's harmless, the worst that can happen is
that needreload is switched to true for a process that has been marked
with a PID of 0 because of a WAL sender restart (slot taken again and
initialized). But that will just process an extra reload in a worst
case.

Basically, I don't think this patch is ready to commit. We have
neither evidence that it is necessary nor evidence that it is
complete. I don't want to be unfair, here, but it seems to me you
ought to do a little more legwork before throwing this over the wall
to the pool of committers.

Fair enough. Is the analysis written above more convincing?
--
Michael

Attachments:

walsnd-pid-races-v4.patchapplication/octet-stream; name=walsnd-pid-races-v4.patchDownload+59-29
#21Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#20)
#22Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#21)
#23Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#20)
#24Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#22)
#25Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#24)
#26Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#25)
#27Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Noah Misch (#26)
#28Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#27)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#20)
#30Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#29)
#31Noah Misch
noah@leadboat.com
In reply to: Alvaro Herrera (#29)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Noah Misch (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#30)
#34Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#33)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#34)
#36Noah Misch
noah@leadboat.com
In reply to: Alvaro Herrera (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#36)