NOTIFY queue is at 66% and climbing...
Hi all,
On 10.15 I'm getting the following on a logically replicated server.
From the CSV logs:
2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16
18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
2021-10-13 18:49:46.058 EDT,,,213601,,6143c257.34261,64244,,2021-09-16
18:16:55 EDT,4/3914855,60709905,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
2021-10-13 18:49:51.934 EDT,,,213601,,6143c257.34261,64245,,2021-09-16
18:16:55 EDT,4/3914862,60709915,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
2021-10-13 18:50:00.516 EDT,,,213601,,6143c257.34261,64246,,2021-09-16
18:16:55 EDT,4/3914864,60709917,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
2021-10-13 18:50:08.003 EDT,,,213601,,6143c257.34261,64247,,2021-09-16
18:16:55 EDT,4/3914871,60709926,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
The very fine docs for 10 say
"There is a queue that holds notifications that have been sent but not
yet processed by all listening sessions. If this queue becomes full,
transactions calling |NOTIFY| will fail at commit. The queue is quite
large (8GB in a standard installation) and should be sufficiently sized
for almost every use case. However, no cleanup can take place if a
session executes |LISTEN| and then enters a transaction for a very long
time. Once the queue is half full you will see warnings in the log file
pointing you to the session that is preventing cleanup. In this case you
should make sure that this session ends its current transaction so that
cleanup can proceed."
We do have a client that has access to this logically replicated server
and I have no idea what they have done regards LISTEN/NOTIFY, and my
query to them about this has not yielded a response.
In the CSV logs above what part points to "the session that is
preventing cleanup" so that I can kill it? pg_stat_activity yields no
clues.
Thanks,
Jeff
Jeff Ross <jross@openvistas.net> writes:
On 10.15 I'm getting the following on a logically replicated server.
2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16
18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
In the CSV logs above what part points to "the session that is
preventing cleanup" so that I can kill it?
Normally there's a DETAIL entry citing the session's PID. Looking
at the code, the reason for the lack of any such entry must be that
there is no session whose current notify queue position exactly
matches the supposed global minimum position. This corresponds to
a known bug that was fixed in 10.16, so I'd suggest upgrading.
As a temporary workaround you could restart that server, but
likely the problem would recur after awhile.
regards, tom lane
On 10/13/21 5:50 PM, Tom Lane wrote:
Jeff Ross <jross@openvistas.net> writes:
On 10.15 I'm getting the following on a logically replicated server.
2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16
18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66%
full",,,,,,,,,""
In the CSV logs above what part points to "the session that is
preventing cleanup" so that I can kill it?Normally there's a DETAIL entry citing the session's PID. Looking
at the code, the reason for the lack of any such entry must be that
there is no session whose current notify queue position exactly
matches the supposed global minimum position. This corresponds to
a known bug that was fixed in 10.16, so I'd suggest upgrading.
As a temporary workaround you could restart that server, but
likely the problem would recur after awhile.regards, tom lane
Thanks as always, Tom. I'll schedule the upgrade to 10.18 and restart
for this weekend.
Jeff