Problem with pg_notify / listen
Hi.
After applying the latest patch we have encountered a problem with the pg_notify queue.
The queue is filling up and starts issuing warnings like
WARNING: NOTIFY queue is 87% full
DETAIL: The server process with PID 2969993 is among those with the oldest transactions.
NOTIFY queue cannot be emptied until that process ends its current transaction.
There are no long-running 'active' or 'idle in transaction' transactions on the server.
We restarted the application with that PID, but the result was that the new process's PID appeared in the logs instead.
We tried stopping everything except PostgreSQL itself, but the queue was not purged. After restarting PostgreSQL the queue is empty and seems to hold up for an hour or so before it starts growing again.
We have been using pg_notify / listen for about three years and haven't run into this problem before, so it seems to me that something happened in the latest patch.
PG version: 11.10 with Slony. The notify triggers are on the master node.
KR.
Mikael Gustavsson
SMHI
Swedish Meteorological and Hydrological Institute
On Friday, November 27, 2020, Gustavsson Mikael <mikael.gustavsson@smhi.se>
wrote:
Hi.
After applying the latest patch we have encountered a problem with the pg_notify queue. The queue is filling up and starts issuing warnings like
WARNING: NOTIFY queue is 87% full
DETAIL: The server process with PID 2969993 is among those with the oldest transactions.
NOTIFY queue cannot be emptied until that process ends its current transaction.
There are no long-running 'active' or 'idle in transaction' transactions on the server.
As it's easy enough to run, can you please confirm this by showing us the pg_stat_activity record for that PID during the time when this warning appears?
Thanks!
David J.
Hi David,
Thanks for fast reply!
We had to restart the server to avoid problems with a full queue, so the process is no longer with us.
The queue is at 25% now and we have minimised the payload to avoid problems during the weekend.
# select * from pg_notification_queue_usage()
;
pg_notification_queue_usage
-----------------------------
0.245367050170898
(1 row)
If my calculations are correct we will reach 50% on Monday.
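(For what it's worth, that estimate is easy to sanity-check with a back-of-the-envelope linear extrapolation. The sketch below is illustrative only; the 24-hour elapsed time is an assumed figure, not something stated in the report.)

```python
# Back-of-the-envelope: if the NOTIFY queue fills roughly linearly after a
# restart, when does usage cross a given threshold? Illustrative numbers only.
def hours_until(threshold, current_usage, elapsed_hours):
    """Hours from now until usage reaches `threshold`, given `current_usage`
    after `elapsed_hours` of roughly linear growth from an empty queue."""
    rate = current_usage / elapsed_hours          # usage fraction per hour
    return (threshold - current_usage) / rate

# e.g. 24.5% full after an assumed ~24 hours -> about one more day to 50%
print(round(hours_until(0.50, 0.245, 24.0), 1))  # -> 25.0
```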
kr
/Mikael Gustavsson
________________________________
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: 27 November 2020 15:28
To: Gustavsson Mikael
Cc: pgsql-generallists.postgresql.org; Svensson Peter; Almen Anders
Subject: Re: Problem with pg_notify / listen
Gustavsson Mikael <mikael.gustavsson@smhi.se> writes:
After applying the latest patch we have encountered a problem with the pg_notify queue.
What do you mean by "the latest patch", exactly?
regards, tom lane
Hi.
We installed PG 11.10 last week. So the latest release of PG 11.
KR
Mikael Gustavsson
________________________________
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 27 November 2020 16:21:33
To: Gustavsson Mikael
Cc: pgsql-generallists.postgresql.org; Svensson Peter; Almen Anders
Subject: Re: Problem with pg_notify / listen
Clarification, we upgraded from PG 11.9 to PG 11.10.
KR Mikael Gustavsson
________________________________
From: externaly-forwarded@smhi.se <externaly-forwarded@smhi.se> on behalf of Gustavsson Mikael <mikael.gustavsson@smhi.se>
Sent: 27 November 2020 16:24:38
To: Tom Lane
Cc: pgsql-generallists.postgresql.org; Svensson Peter; Almen Anders
Subject: SV: Problem with pg_notify / listen
[ redirecting to pgsql-bugs ]
Gustavsson Mikael <mikael.gustavsson@smhi.se> writes:
Clarification, we upgraded from PG 11.9 to PG 11.10.
Hmm ... so the only commit that touched async.c in that interval was
d4031d784. That changed asyncQueueAdvanceTail() so that, rather
than updating QUEUE_TAIL atomically, it would first compute the new
tail pointer, then release AsyncQueueLock for awhile, then finally
update QUEUE_TAIL with the previously-computed value.
I think that's all right in HEAD and v13, because of their limited
usage of the QUEUE_TAIL pointer. But I now realize that it's *not*
all right in older branches, because in those branches
asyncQueueAdvanceTail is only called in one of two ways:
1. If ProcessCompletedNotifies realizes that nobody is there to
read the notification it just sent, it'll call asyncQueueAdvanceTail.
In a system actively using notifications, that likely won't ever
happen.
2. Otherwise, asyncQueueAdvanceTail is called from asyncQueueUnregister
or asyncQueueReadAllNotifications, but only if the backend believes
itself to be the hindmost backend, ie it saw that QUEUE_TAIL was
equal to its own queue pointer.
Thus, we now have the possibility for the following race condition:
1. Backend A performs asyncQueueReadAllNotifications and thence
asyncQueueAdvanceTail (hence, A was the hindmost to start with).
2. A computes a new tail pointer, which no longer matches its own
queue pointer, but does match backend B's (ie, B is currently
hindmost). A now releases AsyncQueueLock, allowing a small
interval in which ...
3. B performs asyncQueueReadAllNotifications. It sees that its
queue pointer does not match QUEUE_TAIL, so it doesn't need to
call asyncQueueAdvanceTail. But it does advance its own pointer.
4. A re-acquires AsyncQueueLock and updates QUEUE_TAIL with what
is now a stale value that matches no backend's queue pointer.
5. After this, no execution of asyncQueueUnregister or
asyncQueueReadAllNotifications will call asyncQueueAdvanceTail,
so unless we get to a case where a notify is sent with no
listeners to hear it, the queue will never be emptied. Ooops.
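[The five-step race above can be modeled with a toy simulation: queue positions as plain integers, and the pre-v13 rule that only a backend whose own pointer matches QUEUE_TAIL triggers asyncQueueAdvanceTail. The names below are illustrative stand-ins, not the actual async.c code.]

```python
# Toy model of the pre-v13 race. asyncQueueAdvanceTail is only triggered
# by a backend whose own read pointer equals QUEUE_TAIL.
queue_tail = 0
pointers = {"A": 0, "B": 5}   # per-backend read positions; A is hindmost

def advance_tail_would_fire(backend):
    # Pre-v13 condition: only the hindmost backend advances the tail.
    return pointers[backend] == queue_tail

# 1+2. A reads, advances its pointer, and (being hindmost) computes a new
#      tail = min of all pointers -- but releases the lock before storing it.
pointers["A"] = 10
computed_tail = min(pointers.values())    # 5: matches B, the new hindmost

# 3. Meanwhile B reads too. Its pointer (5) != queue_tail (0), so it skips
#    asyncQueueAdvanceTail and just advances its own pointer.
assert not advance_tail_would_fire("B")
pointers["B"] = 10

# 4. A re-acquires the lock and stores its stale precomputed value.
queue_tail = computed_tail                # 5: matches no backend's pointer

# 5. From here on, no backend's pointer equals queue_tail, so nothing ever
#    advances the tail again -- the queue can only grow.
assert all(p != queue_tail for p in pointers.values())
print("stuck tail:", queue_tail, "pointers:", pointers)
```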
So the question is what to do about this. We could dodge the
problem by back-patching 51004c717, but that's a considerably
larger change than I want to stick into the older branches.
More practical possibilities seem to include:
* Change the code back to being atomic, ie go ahead and update
QUEUE_TAIL immediately, and truncate the SLRU only afterward.
Why is it necessary, or even safe, to perform the SLRU truncation
before changing QUEUE_TAIL? (IOW, I wonder if there isn't a bug
there in HEAD too.)
* Revert d4031d784's effects in async.c in the pre-v13 branches,
on the grounds that the cure was worse than the disease.
* Change asyncQueueAdvanceTail so that it does a whole fresh
computation of the tail pointer after it re-acquires the lock.
I think this is OK; it might mean that we miss an opportunity
to truncate the SLRU, but there'll be another one.
* Find another pathway in which to call asyncQueueAdvanceTail
occasionally. While the obvious place would be "if we're
about to complain about the queue being full", that's probably
not good enough, because it'd mean that the queue grows quite
large before we recover from the race condition --- and a very
stale tail pointer has bad implications for the cost of
Exec_ListenPreCommit. I'm not sure about good places otherwise.
Thoughts?
regards, tom lane
I wrote:
* Change the code back to being atomic, ie go ahead and update
QUEUE_TAIL immediately, and truncate the SLRU only afterward.
Why is it necessary, or even safe, to perform the SLRU truncation
before changing QUEUE_TAIL? (IOW, I wonder if there isn't a bug
there in HEAD too.)
After thinking more about that, I'm pretty sure there is a bug there:
a newly-arriving backend could attempt to scan the queue starting at
QUEUE_TAIL, concurrently with SimpleLruTruncate removing the page(s)
it wants to scan. In typical cases no failure will occur because
Exec_ListenPreCommit could advance its queue pointer to a safe place
by observing the pointers of other backends. However, if the new
listener were the only one in its database, we'd have trouble.
What seems like the most expedient way to fix this is to separate
QUEUE_TAIL into two variables, one that is the "logical" tail
position from which new backends may start to scan the queue,
and one that is the "physical" tail position, ie, the oldest
page we have not yet truncated. The physical tail need only be
tracked to page accuracy, so it can be a plain int not a QUEUE_POS.
asyncQueueAdvanceTail should update QUEUE_TAIL immediately, which
restores correct behavior pre-v13 and also dodges the race condition
described above. But we don't update the physical tail pointer
until we've completed SimpleLruTruncate, keeping things honest
with respect to asyncQueueIsFull.
As a minor side benefit, asyncQueueAdvanceTail doesn't have to take
the NotifyQueueLock twice unless it actually does an SLRU truncation.
In short, I propose the attached patch (against HEAD, but the
same logic changes should work in the back branches).
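[The logical/physical split described above can be sketched as a toy model: the logical tail moves immediately under the lock, while the physical (stop) page only moves once truncation of a whole SLRU segment completes. Names and the segment size are illustrative, not the real async.c symbols.]

```python
# Toy sketch of the two-variable fix: QUEUE_TAIL (logical tail) is updated
# immediately; the stop page (physical tail) lags until truncation is done.
SLRU_PAGES_PER_SEGMENT = 32   # assumed segment size, for illustration

class NotifyQueue:
    def __init__(self):
        self.tail = 0         # logical tail: where new listeners may start
        self.stop_page = 0    # physical tail: oldest not-yet-truncated page

    def advance_tail(self, new_tail):
        # Update the logical tail atomically (under the lock) first ...
        self.tail = new_tail
        # ... then truncate lazily, a whole segment at a time, and only
        # move the physical tail once truncation has completed.
        boundary = (new_tail // SLRU_PAGES_PER_SEGMENT) * SLRU_PAGES_PER_SEGMENT
        if boundary > self.stop_page:
            # (real code: SimpleLruTruncate runs here, outside the lock)
            self.stop_page = boundary

q = NotifyQueue()
q.advance_tail(40)
print(q.tail, q.stop_page)    # -> 40 32: logical tail leads the physical one
```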
regards, tom lane
Attachment: fix-notify-race-conditions.patch (text/x-diff, +35/-14)
On Fri, Nov 27, 2020 at 05:13:28PM -0500, Tom Lane wrote:
I wrote:
* Change the code back to being atomic, ie go ahead and update
QUEUE_TAIL immediately, and truncate the SLRU only afterward.
Why is it necessary, or even safe, to perform the SLRU truncation
before changing QUEUE_TAIL? (IOW, I wonder if there isn't a bug
there in HEAD too.)
Commit d4031d7 added this comment about necessity:
+ * ... Mutual exclusion must end after any limit
+ * update that would permit other backends to write fresh data into the
+ * segment immediately preceding the one containing cutoffPage. Otherwise,
+ * when the SLRU is quite full, SimpleLruTruncate() might delete that segment
+ * after it has accrued freshly-written data.
... but ...
After thinking more about that, I'm pretty sure there is a bug there:
a newly-arriving backend could attempt to scan the queue starting at
QUEUE_TAIL, concurrently with SimpleLruTruncate removing the page(s)
it wants to scan. In typical cases no failure will occur because
Exec_ListenPreCommit could advance its queue pointer to a safe place
by observing the pointers of other backends. However, if the new
listener were the only one in its database, we'd have trouble.
... agreed. In general, recycling SLRU space entails three steps that shall
not overlap:
1. Stop reading data in the space, regulated by some "logical tail".
2. Unlink files wholly within the bounds of the space.
3. Start writing data into the space, regulated by some "physical tail" (most
often called a "stop limit").
Commit d4031d7 fixed overlap of (2) and (3). For pg_notify, though, it
introduced overlap of (1) and (2). I've now checked the other SLRUs for
similar problems, but I found nothing urgent:
- pg_xact and pg_subtrans have oldestClogXid as their logical tail and
xidStopLimit as their physical tail. All good.
- pg_multixact/offsets and pg_multixact/members store no logical tails. They
have multiStopLimit and offsetStopLimit as physical tails. Under the right
race condition, pg_get_multixact_members() could get a low-level failure.
That's an undocumented debug function, so letting it fail that way is fine.
- pg_serial has tailXid as its logical tail. It stores no physical tail.
This causes some nearly-impossible bugs, discussed in the comments that
/messages/by-id/20201109045319.GA459206@rfd.leadboat.com adds to
predicate.c.
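[The three-step ordering, and what goes wrong when steps (1) and (2) overlap, can be shown with a toy model: a set of on-disk pages, a logical tail, and a reader that starts at that tail. Purely illustrative; not the real SLRU code.]

```python
# Toy model of Noah's three steps for recycling SLRU space. Overlapping
# step 1 (stop reading) with step 2 (unlink) is the pg_notify bug: a new
# listener may start reading at the logical tail while the segment under
# it is being unlinked.
pages = set(range(0, 64))        # pages that currently exist on disk

def read(page):
    if page not in pages:
        raise IOError(f"could not access SLRU page {page}")  # failure mode
    return f"data@{page}"

logical_tail = 0                 # step 1 NOT done: readers may start here
pages -= set(range(0, 32))       # step 2 done first: segment unlinked anyway

try:
    read(logical_tail)           # new listener starts at the logical tail ...
except IOError as e:
    print("overlap of steps 1 and 2:", e)

logical_tail = 32                # correct order: advance the logical tail FIRST;
print(read(logical_tail))        # only then is unlinking below it safe
```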
@@ -286,6 +288,7 @@ static AsyncQueueControl *asyncQueueControl;
#define QUEUE_HEAD (asyncQueueControl->head)
#define QUEUE_TAIL (asyncQueueControl->tail)
+#define QUEUE_TAIL_PAGE (asyncQueueControl->tailPage)
I think we don't yet have the right name here, seeing QUEUE_TAIL_PAGE !=
QUEUE_POS_PAGE(QUEUE_TAIL) sounds paradoxical, yet happens regularly. How
about naming it QUEUE_STOP_PAGE?
Otherwise, this looks good. Thanks for diagnosing and fixing this defect.
Noah Misch <noah@leadboat.com> writes:
... agreed. In general, recycling SLRU space entails three steps that shall
not overlap:
1. Stop reading data in the space, regulated by some "logical tail".
2. Unlink files wholly within the bounds of the space.
3. Start writing data into the space, regulated by some "physical tail" (most
often called a "stop limit").
Check.
Commit d4031d7 fixed overlap of (2) and (3). For pg_notify, though, it
introduced overlap of (1) and (2). I've now checked the other SLRUs for
similar problems, but I found nothing urgent:
Good, I was wondering if we had any similar issues elsewhere.
I think we don't yet have the right name here, seeing QUEUE_TAIL_PAGE !=
QUEUE_POS_PAGE(QUEUE_TAIL) sounds paradoxical, yet happens regularly. How
about naming it QUEUE_STOP_PAGE?
Hmm, it's not very clear what "stop" means here. What do you think of
QUEUE_OLDEST_PAGE?
regards, tom lane
On Fri, Nov 27, 2020 at 11:03:40PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I think we don't yet have the right name here, seeing QUEUE_TAIL_PAGE !=
QUEUE_POS_PAGE(QUEUE_TAIL) sounds paradoxical, yet happens regularly. How
about naming it QUEUE_STOP_PAGE?
Hmm, it's not very clear what "stop" means here. What do you think of
QUEUE_OLDEST_PAGE?
"STOP" would mean the same kind of thing it means in xidStopLimit,
multiStopLimit and offsetStopLimit. Interpreted for pg_notify specifically,
it would mean "if queueing a notification would require use of this page,
throw an error."
QUEUE_OLDEST_PAGE is fine. I like it a little less than QUEUE_STOP_PAGE,
because oldestClogXid is a logical tail, and QUEUE_OLDEST_PAGE would be a
physical tail.
Noah Misch <noah@leadboat.com> writes:
On Fri, Nov 27, 2020 at 11:03:40PM -0500, Tom Lane wrote:
Hmm, it's not very clear what "stop" means here. What do you think of
QUEUE_OLDEST_PAGE?
QUEUE_OLDEST_PAGE is fine. I like it a little less than QUEUE_STOP_PAGE,
because oldestClogXid is a logical tail, and QUEUE_OLDEST_PAGE would be a
physical tail.
I went with QUEUE_STOP_PAGE.
In further testing, I noted that the patch as I had it re-introduced the
symptom that 8b7ae5a82 fixed, that running "make installcheck" twice
in a row causes the async-notify isolation test to fail. That's because
I'd changed asyncQueueUsage() to measure the distance back to the physical
tail, which isn't stable because we only truncate after crossing an
SLRU segment boundary. So I reverted it to measuring the distance to
QUEUE_TAIL. You could argue either way about which definition is more
useful to end users, perhaps; but in practice the difference should
usually be too small to matter, for everyone except regression tests
that are looking for distance exactly zero.
Anyhow, pushed with that fix.
Mikael, it appears that you have three options: revert to 11.9 until 11.11
is out, restart your server every time it approaches notify-queue-full, or
apply this patch:
regards, tom lane
Great!
I will talk to the rest of the team on which path to take.
And thank you for finding and fixing this so quickly!
kr
Mikael Gustavsson
SMHI
________________________________
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 28 November 2020 20:15
To: Noah Misch
Cc: Gustavsson Mikael; pgsql-bugs@lists.postgresql.org; Svensson Peter; Almen Anders
Subject: Re: SV: Problem with pg_notify / listen
Hi,
I have a question. Is this problem present in Postgres 12.5? We have a very
similar problem after upgrading from 12.4 to 12.5: the queue size grew to 15%
in 20 hours and is still rising.
--
Regards
Piotr Włodarczyk
=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:
I have a question. Is this problem in postgres 12.5 ?
Yes.
regards, tom lane
Ok. And will it be fixed in 12.6?
On Sun, 7 Feb 2021, 2:28 am Tom Lane, <tgl@sss.pgh.pa.us> wrote: