LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Hi,
I'm bringing up a bug that was reported multiple times [1][2][3] in
the bugs list here, for a broader audience.
The issue is that an ERROR like the one below occurs when trying to
register any listener in the database.
test=# listen c21;
ERROR: 58P01: could not access status of transaction 14279685
DETAIL: Could not open file "pg_xact/000D": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
In [1], Andrei Varashen provided detailed reproduction steps. I’m
copying and pasting his example from thread [1] here, with slight
simplification.
Pre-conditions:
- Disable autovacuum to avoid its intervention.
Steps to reproduce:
1. Create a test table and notify (but not listen) to a channel
(backend 1):
create table test (id int);
insert into test values (1);
notify c1;
2. List pg_xact files so we know its starting state:
➜ ls -lah ~/pg-devel/data/pg_xact
total 16
drwx------@ 3 alex.wang staff 96B Aug 5 20:10 .
drwx------@ 26 alex.wang staff 832B Aug 5 20:12 ..
-rw-------@ 1 alex.wang staff 8.0K Aug 5 20:12 0000
3. Prepare a "test.sql" file for pgbench, and then run pgbench to
generate lots of transactions, so that pg_xact/0000 is completely
filled and leads to pg_xact/0001.
cat test.sql
UPDATE test SET id = 1;
pgbench -n -c 80 -j 10 -t 15000 -f ~/workspace/test.sql postgres
4. Verify that pg_xact/0001 is created:
➜ ls -lah ~/pg-devel/data/pg_xact
total 560
drwx------@ 4 alex.wang staff 128B Aug 5 20:25 .
drwx------@ 26 alex.wang staff 832B Aug 5 20:12 ..
-rw-------@ 1 alex.wang staff 256K Aug 5 20:25 0000
-rw-------@ 1 alex.wang staff 16K Aug 5 20:25 0001
5. Execute VACUUM FREEZE on every database on the server to freeze
rows and purge pg_xact/0000.
postgres=# VACUUM FREEZE;
VACUUM
postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# VACUUM FREEZE;
VACUUM
template1=# ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
ALTER DATABASE
template1=# \c template0
You are now connected to database "template0" as user "postgres".
template0=# VACUUM FREEZE;
VACUUM
postgres=# select datname, datfrozenxid from pg_database;
datname | datfrozenxid
-----------+--------------
postgres | 1200774
template0 | 1200775
template1 | 1200774
(3 rows)
6. Ensure that pg_xact/0000 is gone:
➜ ls -lah ~/pg-devel/data/pg_xact
total 80
drwx------@ 3 alex.wang staff 96B Aug 5 20:29 .
drwx------@ 26 alex.wang staff 832B Aug 5 20:12 ..
-rw-------@ 1 alex.wang staff 40K Aug 5 20:30 0001
7. Try to listen to any channel from any backend connection on the
same database:
postgres=# listen c1;
ERROR: 58P01: could not access status of transaction 773
DETAIL: Could not open file "pg_xact/0000": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
postgres=# listen c2;
ERROR: 58P01: could not access status of transaction 773
DETAIL: Could not open file "pg_xact/0000": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
Here's the stack trace from the master branch:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 21.1
* frame #0: 0x0000000102c798c8
postgres`SlruReportIOError(ctl=0x000000010378e5e0, pageno=0, xid=773) at
slru.c:1084:4
frame #1: 0x0000000102c792b0
postgres`SimpleLruReadPage(ctl=0x000000010378e5e0, pageno=0, write_ok=true,
xid=773) at slru.c:603:4
frame #2: 0x0000000102c79f7c
postgres`SimpleLruReadPage_ReadOnly(ctl=0x000000010378e5e0, pageno=0,
xid=773) at slru.c:661:9
frame #3: 0x0000000102c6a6bc postgres`TransactionIdGetStatus(xid=773,
lsn=0x000000016d2975e8) at clog.c:745:11
frame #4: 0x0000000102c7e924
postgres`TransactionLogFetch(transactionId=773) at transam.c:79:14
frame #5: 0x0000000102c7e74c
postgres`TransactionIdDidCommit(transactionId=773) at transam.c:130:14
frame #6: 0x0000000102dc8b94
postgres`asyncQueueProcessPageEntries(current=0x000000016d297720,
stop=QueuePosition @ 0x000000016d297690, page_buffer="\U00000014",
snapshot=0x000000012d812398) at async.c:2069:13
frame #7: 0x0000000102dc8954 postgres`asyncQueueReadAllNotifications at
async.c:1981:18
frame #8: 0x0000000102dc6b5c postgres`Exec_ListenPreCommit at
async.c:1127:3
frame #9: 0x0000000102dc664c postgres`PreCommit_Notify at async.c:881:6
frame #10: 0x0000000102c8c77c postgres`CommitTransaction at
xact.c:2341:2
frame #11: 0x0000000102c87b2c postgres`CommitTransactionCommandInternal
at xact.c:3214:4
frame #12: 0x0000000102c87a44 postgres`CommitTransactionCommand at
xact.c:3175:10
frame #13: 0x000000010321da94 postgres`finish_xact_command at
postgres.c:2833:3
frame #14: 0x000000010321b64c
postgres`exec_simple_query(query_string="listen c1;") at postgres.c:1298:4
frame #15: 0x000000010321a714 postgres`PostgresMain(dbname="postgres",
username="alex.wang") at postgres.c:4767:7
frame #16: 0x0000000103211a14
postgres`BackendMain(startup_data=0x000000016d299e48, startup_data_len=24)
at backend_startup.c:124:2
frame #17: 0x00000001030e938c
postgres`postmaster_child_launch(child_type=B_BACKEND, child_slot=56,
startup_data=0x000000016d299e48, startup_data_len=24,
client_sock=0x000000016d299ed8) at launch_backend.c:290:3
frame #18: 0x00000001030f0d60
postgres`BackendStartup(client_sock=0x000000016d299ed8) at
postmaster.c:3587:8
frame #19: 0x00000001030eebc4 postgres`ServerLoop at postmaster.c:1702:6
frame #20: 0x00000001030ed67c postgres`PostmasterMain(argc=3,
argv=0x00006000021f14e0) at postmaster.c:1400:11
frame #21: 0x0000000102f73e40 postgres`main(argc=3,
argv=0x00006000021f14e0) at main.c:231:4
frame #22: 0x00000001940a2b98 dyld`start + 6076
Daniil Davydov has analyzed the root cause in thread [4] and I agree with
what he said:
On Thu, Jul 31, 2025 at 8:21 PM Daniil Davydov <3danissimo@gmail.com> wrote:
We have the following logic in the notify queue:
If there are no listeners within all databases, and we are calling
LISTEN, then we must iterate from 'tail' to 'head' of the queue and
check statuses of transactions (see Exec_ListenPreCommit).
If there is a pruned-away xid in the queue, we will try to access its
status and get an error.

Because the tail of the queue is not necessarily always advanced
forward by the listeners, we can get such an error without any
long-lived transactions.
The fix and workarounds were discussed in [5] and [6]: In [5], Daniil
proposed a patch, which I’ve attached. The patch adds a call to
asyncQueueAdvanceTail() in vac_update_datfrozenxid(), so that VACUUM
advances the async queue tail.
Similarly, calling the built-in function pg_notification_queue_usage()
also advances the async queue tail. So when the issue occurs, calling
this function could also make the error go away. However, this doesn’t
prevent the error from happening in the first place.
My questions:
1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.
2. If the answer to 1 is no, how can we teach VACUUM to respect the
minimum xid stored in all AsyncQueueEntries?
Best,
Alex
[1]: /messages/by-id/18804-bccbbde5e77a68c2@postgresql.org
[2]: /messages/by-id/16961-25f29f95b3604a8a@postgresql.org
[3]: /messages/by-id/18394-e7459245148578b2@postgresql.org
[4]: /messages/by-id/CAJDiXgh3Jh2N90Fe4=X2qE+YAiZ1BSpgznhH+vkTHmvK3gjqxw@mail.gmail.com
[5]: /messages/by-id/CAJDiXgj1BmLKAZ=gOC1eETmctt7z=aj8MHfmd+nORE1P6qncsA@mail.gmail.com
[6]: /messages/by-id/CAK98qZ0uZrf1b6L4mDs+17M07KCNt6YMc9vD-+TmeWaJHRbGpA@mail.gmail.com
Attachments:
0001-Advance-tail-of-async-queue-before-updating-datfroze.patch
On 2025-Aug-05, Alexandra Wang wrote:
I'm bringing up a bug that was reported multiple times [1][2][3] in
the bugs list here, for a broader audience.

The issue is that an ERROR like the one below occurs when trying to
register any listener in the database.

test=# listen c21;
ERROR: 58P01: could not access status of transaction 14279685
DETAIL: Could not open file "pg_xact/000D": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
Oh, interesting problem. Many thanks for the excellent write-up.
My questions:
1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.
I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?
2. If the answer to 1 is no, how can we teach VACUUM to respect the
minimum xid stored in all AsyncQueueEntries?
Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
listeners, and back off the pg_clog truncation based on that. This
could be done by having a new boolean argument that says to look up the
XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
would only be passed true by vac_update_datfrozenxid(), to avoid
overhead by other callers), then collect the oldest of those and return
it.
This does create the problem that an inactive listener could cause the
XID counter to stay far in the past. Maybe we could try to avoid this
by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
are way overdue on reading notifies. I'm not sure if this is really
needed or useful; consider a backend stuck on SIGSTOP (debugger or
whatever): it will just sit there forever.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
My questions:
1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.

I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?

I'm a bit worried about this too.
2. If the answer to 1 is no, how can we teach VACUUM to respect the
minimum xid stored in all AsyncQueueEntries?

Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
listeners, and back off the pg_clog truncation based on that. This
could be done by having a new boolean argument that says to look up the
XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
would only be passed true by vac_update_datfrozenxid(), to avoid
overhead by other callers), then collect the oldest of those and return
it.
The problem with only considering the oldest XID of listeners is that,
IIUC, we may have notifications without listeners, and in this case we
may still get this error: when the LISTEN is executed we loop through
the AsyncQueueEntry's in asyncQueueProcessPageEntries() and call
TransactionIdDidCommit(), which raises the error before
IsListeningOn(channel) is called.
Another option would be to add a minXid field to AsyncQueueControl,
update this value in the asyncQueueProcessPageEntries() and
asyncQueueAddEntries() routines, and then check this value in
vac_update_datfrozenxid().
This does create the problem that an inactive listener could cause the
XID counter to stay far in the past. Maybe we could try to avoid this
by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
are way overdue on reading notifies. I'm not sure if this is really
needed or useful; consider a backend stuck on SIGSTOP (debugger or
whatever): it will just sit there forever.
With the idea that I've proposed we could still have this problem: if a
listener takes too long to consume a message, we would block vacuum
freeze from advancing the xid. For this I think we could have two GUCs:
one to enable or disable the oldest-xid check on the async queue, and a
second to control how far we let the async queue hold vacuum back from
freezing; if the minimum xid falls behind this limit, we ignore it and
truncate anyway.

I've written a draft patch that plays with this idea; see attached.
--
Matheus Alcantara
Attachments:
v0-0001-Consider-async-queue-min-xid-on-VACUUM-FREEZE.patch
Hi,
On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
My questions:
1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.

I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?

I'm a bit worried about this too.
What exactly do we mean by "active listener"? According to the source code,
the active listener (as far as I understand) is the one who listens to at least
one channel. If we have no active listeners in the database, the new listener
will set its pointer to the tail of the async queue. Thus, messages with old
xid will not be touched by anybody. I don't see any point in dropping them
in this case.
If the "inactive" listener is the backend which is stuck somewhere, the
answer is "no" - this backend should be able to process all notifications.
2. If the answer to 1 is no, how can we teach VACUUM to respect the
minimum xid stored in all AsyncQueueEntries?

Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
listeners, and back off the pg_clog truncation based on that. This
could be done by having a new boolean argument that says to look up the
XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
would only be passed true by vac_update_datfrozenxid(), to avoid
overhead by other callers), then collect the oldest of those and return
it.

The problem with only considering the oldest XID of listeners is that
IIUC we may have notifications without listeners, and in this case we
may still get this error because when the LISTEN is executed we loop
through the AsyncQueueEntry's on asyncQueueProcessPageEntries() and we
call TransactionIdDidCommit() that raise the error before
IsListeningOn(channel) is called.
Agree.
This does create the problem that an inactive listener could cause the
XID counter to stay far in the past. Maybe we could try to avoid this
by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
are way overdue on reading notifies. I'm not sure if this is really
needed or useful; consider a backend stuck on SIGSTOP (debugger or
whatever): it will just sit there forever.

With this idea that I've proposed we still could have this problem, if a
listener take too long to consume a message we would block vacuum freeze
to advance the xid. For this I think that we could have two GUC's; One
to enable and disable the oldest xmin check on async queue and the
second to control how far we want to prevent the vacuum from freezing
the oldest async queue xid, and if the min xid raises this limit we
ignore and truncate the xid.
1) About Álvaro's comment:
I don't think that killing the lagging backend will save us, because
entries in the async queue are not ordered by xid. Thus, even after
killing such a backend, we have no guarantee that the problem is gone,
because backends that are not lagging may still encounter a very old
xid.
2) About Matheus's comment:
I guess that there is only one reliable way to determine the precise
minimal xid in the queue (after message processing): scan all pages
from head to tail. That obviously can take a lot of time. The second
GUC's logic (if I get it right) is based on determining that minimal
xid, so it won't be the best solution for high-load systems (which are
forced to turn on the first GUC because of safety requirements).
Another option would be to add a minXid field on AsyncQueueControl and
then update this value on asyncQueueProcessPageEntries() and
asyncQueueAddEntries() routines, and then we could check this value on
vac_update_datfrozenxid().

I've written a draft patch that plays with the idea; see attached.
Thanks for the patch! I have a few comments on it:
1)
Maybe we should add an assert that NotifyQueueLock is held to the StoreEntryXid
function?
2)
For fields in AsyncQueueControl we have macros like QUEUE_HEAD,
QUEUE_TAIL and so on. Maybe minXid should also be wrapped with something
like QUEUE_MIN_XID?
3)
This logic will not work with (for example) two listeners. Steps :
session1=# listen c1;
session2=# listen c1;
session3=# notify c1; // sets minimal xid to N inside StoreEntryXid
session3=# notify c1;
session1=# listen c1; // sets minimal xid to (N + 1) inside ReleaseEntryXid
After these steps, session2 still needs to process a message with xid = N, but
autovacuum legally can freeze it and truncate clog.
Maybe we should check whether there are no listeners with smaller queue
position compared to the current backend?
4)
After ReleaseEntryXid, the minXid field no longer contains the actual
minimal xid in the queue, but the maximum processed xid. Maybe this
parameter can be renamed?
--
Best regards,
Daniil Davydov
On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
Hi,
On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
My questions:
1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.

I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?

I'm a bit worried about this too.
What exactly do we mean by "active listener"? According to the source code,
the active listener (as far as I understand) is the one who listens to at least
one channel. If we have no active listeners in the database, the new listener
will set its pointer to the tail of the async queue. Thus, messages with old
xid will not be touched by anybody. I don't see any point in dropping them
in this case.
I think that this definition is correct, but IIUC the tail can still
have notifications with xids that were already truncated by vacuum
freeze. When the LISTEN is executed, we first loop through the
notification queue to try to advance the queue pointers, and we can
eventually iterate over a notification that was added to the queue
without any listener but has an xid that is already truncated by vacuum
freeze, so in this case it will fail to get the transaction status. In
Alex's steps to reproduce the issue, the NOTIFY is executed first and
then the LISTEN, which fails after vacuum freeze.
If the "inactive" listener is the backend which is stuck somewhere, the
answer is "no" - this backend should be able to process all notifications.
I tried to reproduce the issue by using some kind of "inactive"
listener, but so far I haven't managed to trigger the error. This is
what I tried:
1. Create listener:
postgres=# listen c1;
2. Execute a very long query to make the backend busy to process the
notification:
postgres=# select * from generate_series(1,10000000000) g where g > 1;
3. On another session send the notification
postgres=# notify c1;
4. Execute pgbench test:
pgbench -n -c 80 -j 10 -t 15000 -f test.sql postgres
5. Verify that we have multiple files on pg_xact:
➜ ls -lah ~/pg-devel/data/pg_xact
total 608
-rw-------@ 1 matheus staff 256K Aug 18 20:36 0000
-rw-------@ 1 matheus staff 40K Aug 18 20:56 0001
6. Execute VACUUM FREEZE on every database on the server
postgres=# VACUUM FREEZE;
VACUUM
postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# VACUUM FREEZE;
VACUUM
template1=# \c template0
You are now connected to database "template0" as user "postgres".
template0=# VACUUM FREEZE;
VACUUM
After the vacuum freeze I can still see the same files in pg_xact/, and
if I cancel the long query the notification is received correctly; if I
then execute vacuum freeze again on every database, the oldest pg_xact
file is truncated.

So, if my tests are correct, I don't think that storing the oldest xid
is necessary anymore, since I don't think we can lose notifications
using the patch from Daniil. Or am I missing something here?
Thinking about this, maybe another solution would be to change the
queue-pointer advancing to skip the transaction status check? With this
we would not need to touch any vacuum freeze code and would instead
change the LISTEN execution to handle this scenario.
Thoughts?
--
Matheus Alcantara
Hi,
On Tue, Aug 19, 2025 at 7:14 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
What exactly do we mean by "active listener"? According to the source code,
the active listener (as far as I understand) is the one who listens to at least
one channel. If we have no active listeners in the database, the new listener
will set its pointer to the tail of the async queue. Thus, messages with old
xid will not be touched by anybody. I don't see any point in dropping them
in this case.

I think that this definition is correct, but IIUC the tail can still
have notifications with xid's that were already truncated by vacuum
freeze. When the LISTEN is executed, we first loop through the
notification queue to try to advance the queue pointers and we can
eventually iterate over a notification that was added on the queue
without any listener but it has a xid that is already truncated by vacuum
freeze, so in this case it will fail to get the transaction status. On
Alex steps to reproduce the issue it first executes the NOTIFY and
then executes the LISTEN which fails after vacuum freeze.
Yeah, you are right. I looked at the code again and found out that even
if there are no active listeners, a new listener should iterate from
the head to the tail. Thus, it may encounter a truncated xid. Anyway, I
still think that dropping notifications is not the best way to resolve
this issue.
If the "inactive" listener is the backend which is stuck somewhere, the
answer is "no" - this backend should be able to process all notifications.

I tried to reproduce the issue by using some kind of "inactive"
listener but so far I didn't manage to trigger the error.

After the vacuum freeze I still can see the same files on pg_xact/ and
if I cancel the long query the notification is received correctly, and
then if I execute vacuum freeze again on every database the oldest
pg_xact file is truncated.

So, if my tests are correct I don't think that storing the oldest xid is
necessary anymore since I don't think that we can lose notifications
using the patch from Daniil or I'm missing something here?
You have started a very long transaction, which holds its xid and
prevents vacuum from freezing it. But what if the backend is stuck not
inside a transaction? Maybe we can just hardcode a huge delay (not
inside a transaction) or stop process execution via a breakpoint in
gdb. If we use that instead of a long query, I think this error may be
reproducible.
--
Best regards,
Daniil Davydov
On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
I think that this definition is correct, but IIUC the tail can still
have notifications with xid's that were already truncated by vacuum
freeze. When the LISTEN is executed, we first loop through the
notification queue to try to advance the queue pointers and we can
eventually iterate over a notification that was added on the queue
without any listener but it has a xid that is already truncated by vacuum
freeze, so in this case it will fail to get the transaction status. On
Alex steps to reproduce the issue it first executes the NOTIFY and
then executes the LISTEN which fails after vacuum freeze.

Yeah, you are right. I looked at the code again, and found out that even
if there are no active listeners, new listener should iterate from the head
to the tail. Thus, it may encounter truncated xid. Anyway, I still think that
dropping notifications is not the best way to resolve this issue.
In the steps that Alex shared, is it expected that the "LISTEN c1"
command consumes the notification that was sent previously with NOTIFY?
IIUC the LISTEN command should be executed before any NOTIFY, so
executing the LISTEN after a NOTIFY will not consume any notification
previously added on the channel, so how bad would it be to drop this
notification from the queue in this situation?
If the "inactive" listener is the backend which is stuck somewhere, the
answer is "no" - this backend should be able to process all notifications.

I tried to reproduce the issue by using some kind of "inactive"
listener but so far I didn't manage to trigger the error.

After the vacuum freeze I still can see the same files on pg_xact/ and
if I cancel the long query the notification is received correctly, and
then if I execute vacuum freeze again on every database the oldest
pg_xact file is truncated.

So, if my tests are correct I don't think that storing the oldest xid is
necessary anymore since I don't think that we can lose notifications
using the patch from Daniil or I'm missing something here?

You have started a very long transaction, which holds its xid and prevents
vacuum from freezing it. But what if the backend is stuck not inside a
transaction? Maybe we can just hardcode a huge delay (not inside the
transaction) or stop process execution via breakpoint in gdb. If we will use it
instead of a long query, I think that this error may be reproducible.
But how could this happen in real scenarios? I mean, how could the
backend be stuck outside a transaction?
--
Matheus Alcantara
Hi,
On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
You have started a very long transaction, which holds its xid and prevents
vacuum from freezing it. But what if the backend is stuck not inside a
transaction? Maybe we can just hardcode a huge delay (not inside the
transaction) or stop process execution via breakpoint in gdb. If we will use it
instead of a long query, I think that this error may be reproducible.

But how could this happen in real scenarios? I mean, how the backend
could be stuck outside a transaction?
For now, I cannot come up with a situation where this may be possible.
Perhaps such lagging may occur during network communication, but I
couldn't reproduce it. Maybe other people know how we can achieve this?

I think that if such a situation is possible, the suggestion to delete
messages will no longer be relevant. Therefore, first of all, I would
like to clarify this issue.
--
Best regards,
Daniil Davydov
On Mon, Aug 18, 2025 at 5:14 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
Hi,
On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
My questions:
1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.

I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?

I'm a bit worried about this too.
What exactly do we mean by "active listener"? According to the source code,
the active listener (as far as I understand) is the one who listens to at least
one channel. If we have no active listeners in the database, the new listener
will set its pointer to the tail of the async queue. Thus, messages with old
xid will not be touched by anybody. I don't see any point in dropping them
in this case.

I think that this definition is correct, but IIUC the tail can still
have notifications with xid's that were already truncated by vacuum
freeze. When the LISTEN is executed, we first loop through the
notification queue to try to advance the queue pointers and we can
eventually iterate over a notification that was added on the queue
without any listener but it has a xid that is already truncated by vacuum
freeze, so in this case it will fail to get the transaction status.
When a process first executes the LISTEN command, it iterates through
the notification queue, but it seems only to advance its queue pointer
because it doesn't add the interested channels to its list yet and it
isn't interested in the notifications queued before it registered as a
listener. I'm wondering if we could optimize this by allowing the
queue pointer to fast-forward without checking transaction status. If
feasible, this might resolve the reported issue.
However, I have a more fundamental concern regarding the LISTEN/NOTIFY
implementation. Since vacuum doesn't consider the XIDs of notification
entries, there might be a critical issue with the truncation of clog
entries that LISTEN/NOTIFY still requires. As I understand it,
notification queue entries aren't ordered by XID, and it's possible
for a notification with an older XID to be positioned at the queue's
head. If vacuum freeze then truncates the corresponding clogs,
listeners attempting to retrieve this notification would fail to
obtain the transaction status. To address this, we likely need to
either implement Álvaro's suggestion [1] to make vacuum aware of the
oldest XID in the notification queue, or develop a mechanism to
remove/freeze XIDs of the notification entries.
Regards,
[1]: /messages/by-id/202508061044.ptcyt7aqsaaa@alvherre.pgsql
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
Hi,
On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
You have started a very long transaction, which holds its xid and prevents
vacuum from freezing it. But what if the backend is stuck not inside a
transaction? Maybe we can just hardcode a huge delay (not inside the
transaction) or stop process execution via breakpoint in gdb. If we will use it
instead of a long query, I think that this error may be reproducible.

But how could this happen in real scenarios? I mean, how the backend
could be stuck outside a transaction?

For now, I cannot come up with a situation where it may be possible.
Perhaps, such a lagging may occur during network communication,
but I couldn't reproduce it. Maybe other people know how we can achieve
this?
Reading more of the code, I understand that once a NOTIFY command is
received by a backend (and the transaction is committed), it will
immediately signal all other listener backends, and if a listener
backend is idle it will consume the notification and send it back to
the client as a PqMsg_NotificationResponse. So if there is a network
delay sending the notification from the listener backend back to the
client, I don't think it would be possible to get this error: the
message was already dispatched by the backend and will eventually reach
the client, and once the notification is dispatched the backend doesn't
need to track it anymore (the queue pointers of the backend are
advanced after the dispatch).
Assuming that every SQL command is wrapped in a transaction (if it's
not already inside one), I think a busy listener backend will always
prevent vacuum from freezing clog files past its current xid,
so any notification that is sent while the backend is busy will not have
its transaction status removed from clog files anyway.
Are all these understandings and assumptions correct, or am I missing
something here?
I think that if such a situation may be possible, the suggestion to delete
messages will no longer be relevant. Therefore, first of all, I would like to
clarify this issue.
From what I've understood so far, it seems to me that this can happen
only if we have a notification on the queue without any listener, so the
notification may stay on the queue for a long time and a vacuum freeze
can be executed during this time; then when we have a new listener
(even for a different channel) it will fail to advance the pointers at
listener setup (Exec_ListenPreCommit()) because it would not be able to
get the transaction status of this very old notification.
(Please note that I'm not trying to invalidate your concern; I also
have this concern, but unfortunately I'm unable to reproduce it and I'm
just sharing my thoughts to see if this issue is really possible or not.)
--
Matheus Alcantara
On Tue Aug 19, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
However, I have a more fundamental concern regarding the LISTEN/NOTIFY
implementation. Since vacuum doesn't consider the XIDs of notification
entries, there might be a critical issue with the truncation of clog
entries that LISTEN/NOTIFY still requires. As I understand it,
notification queue entries aren't ordered by XID, and it's possible
for a notification with an older XID to be positioned at the queue's
head. If vacuum freeze then truncates the corresponding clogs,
listeners attempting to retrieve this notification would fail to
obtain the transaction status. To address this, we likely need to
either implement Álvaro's suggestion[1] to make vacuum aware of the
oldest XID in the notification queue, or develop a mechanism to
remove/freeze XIDs of the notification entries.
Thanks for the comments! Please see my reply at [1], where I mention that
I don't think it is that easy to hit this specific scenario of a busy
backend causing notifications to be lost.
[1]: /messages/by-id/DC7KGTXW3QSG.OZA24HONT78J@gmail.com
--
Matheus Alcantara
On Wed, Aug 20, 2025 at 2:18 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
Hi,
On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
You have started a very long transaction, which holds its xid and prevents
vacuum from freezing it. But what if the backend is stuck not inside a
transaction? Maybe we can just hardcode a huge delay (not inside the
transaction) or stop process execution via a breakpoint in gdb. If we use it
instead of a long query, I think that this error may be reproducible.

But how could this happen in real scenarios? I mean, how could the backend
be stuck outside a transaction?

For now, I cannot come up with a situation where it may be possible.
Perhaps, such a lagging may occur during network communication,
but I couldn't reproduce it. Maybe other people know how we can achieve
this?

Reading more of the code, I understand that once a NOTIFY command is
received by a backend (and the transaction is committed) it will
immediately signal all other listener backends, and if the listener backend
is in idle it will consume the notification and then send it back to the
client as a PqMsg_NotificationResponse, so if there is a network delay
to send the notification from the listener backend back to the client I
don't think that it would be possible to get this error, because the
message was already dispatched by the backend and it will eventually get
to the client and once the notification is dispatched the backend
doesn't need to track it anymore (the queue pointers of the backend are
advanced after the dispatch).

Assuming that every SQL command is wrapped into a transaction (if it's
not already inside in) I think a busy listener backend will always
prevent the vacuum from freezing clog files past from its current xid,
so any notification that is sent while the backend is busy will not have
their transaction status removed from clog files anyway.
What about backend processes that don't have any xid or xmin (i.e.,
ones running read-only queries or sitting idle-in-transaction)?
IIUC we process the notification entries at the beginning of the
server loop (see L4608 in postgres.c) and when reading a command (via
ProcessClientReadInterrupt()), but it seems to me that if a process is
in idle-in-transaction state it doesn't process the entries unless the
transaction is committed. I've reproduced the missing clog entry error
even if we have a notification on the queue with a valid listener,
with the following steps:
1. Initialize the database cluster.
2. Execute "ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;".
3. Start one psql session and execute:
-- Session 1
=# listen s;
LISTEN
=# begin;
BEGIN
(keep this session open to leave the process idle-in-transaction.)
4. Open another psql session and execute:
-- Session 2
=# begin;
BEGIN
=# select txid_current();
txid_current
--------------
756
(1 row)
=# notify s;
NOTIFY
=# commit;
COMMIT
The notification to the channel 's' should be available for the
session-1's transaction.
5. Consume enough XIDs to truncate clog entries.
-- Session 2
=# create extension xid_wraparound;
CREATE EXTENSION
=# select consume_xids(10_000_000);
NOTICE: consumed 10000000 / 10000000 XIDs, latest 0:10000757
consume_xids
--------------
10000757
(1 row)
=# select txid_current();
txid_current
--------------
10000758
(1 row)
6. Execute vacuum freeze on all databases:
$ vacuumdb --all --freeze
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template0"
vacuumdb: vacuuming database "template1"
$ psql -d postgres -c "select datname, datfrozenxid, age(datfrozenxid)
from pg_database"
datname | datfrozenxid | age
-----------+--------------+-----
postgres | 10000759 | 11
template0 | 10000759 | 11
template1 | 10000759 | 11
(3 rows)
7. On the first psql session:
-- Session 1
=# commit;
COMMIT
ERROR: could not access status of transaction 756
DETAIL: Could not open file "pg_xact/0000": No such file or directory.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu Aug 21, 2025 at 7:25 PM -03, Masahiko Sawada wrote:
What about backend processes that don't have any xid or xmin (i.e.,
are read-only query and in idle-in-transaction)?

IIUC we process the notification entries at the beginning of the
server loop (see L4608 in postgres.c) and when reading a command (via
ProcessClientReadInterrupt()), but it seems to me that if a process is
in idle-in-transaction state it doesn't process the entries unless the
transaction is committed. I've reproduced the missing clog entry error
even if we have a notification on the queue with a valid listener,
with the following steps:
Ok, now we can reproduce this, thank you! So I think that we need a way
to teach VACUUM FREEZE about old XIDs on the async queue.
I'll work on this considering the initial comments from Daniil at [1].
[1]: /messages/by-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y+CDfEc9M2TXiOf_4kfBMg@mail.gmail.com
--
Matheus Alcantara
On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
I'll work on this considering the initial Daniil comments at [1]
[1] /messages/by-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y+CDfEc9M2TXiOf_4kfBMg@mail.gmail.com
I've been working on this over the last few days; please see the attached
patch version.
In this new version I tried to follow the suggestion from Daniil of
scanning all pages from tail to head of the async queue.
I tried to use other data structures like a list or a hashmap to store
the xids, but that seemed more complicated to me due to duplicated
information being stored, and also considering that such a data structure
would have to live in shared memory, so I just tried to get the oldest xid
from the information that we already have on the async queue.
To find the oldest xid on the async queue I basically read all
AsyncQueueEntry's on the async queue pages in the SLRU cache, from tail
to head, skipping only uncommitted transactions and notifications for
different databases.
The code that reads the pages and the entries within a page is a bit
coupled with the send-notification logic. The important functions here
are asyncQueueReadAllNotifications(), which reads the pages
from the SLRU cache, and asyncQueueProcessPageEntries(), which
reads the entries within a page and sends the notifications to the client
if needed. Since I need very similar code to read the notification
entries on the queue to check the min xid, I think it would be good
to have an API that works like an iterator and returns the next
queue entry to work on, so that it can be used for both cases.
In this patch I created an AsyncQueueIterator that iterates over pages
and the AsyncQueueEntry's within them. The code is based on the
asyncQueueReadAllNotifications() and asyncQueueProcessPageEntries()
functions. For now I'm just implementing the iterator logic and using it
in AsyncQueueMinXid() to get the min xid on the queue, to keep the review
easier, but I think we can have another patch that refactors
asyncQueueReadAllNotifications() and asyncQueueProcessPageEntries()
to use this iterator approach and avoid duplicated code.
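The iterator shape described above can be illustrated with a rough, self-contained sketch (plain C, hypothetical names, a flat array standing in for the SLRU pages — this is not the actual patch code):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef uint32_t TransactionId;
typedef uint32_t Oid;
#define InvalidTransactionId ((TransactionId) 0)

/* Simplified queue entry: just the fields relevant to the min-xid scan. */
typedef struct AsyncQueueEntry
{
    TransactionId xid;
    Oid dboid;                  /* 0 models the page-filler "dummy" entry */
} AsyncQueueEntry;

/* Iterator over the queue, from tail (oldest) to head (newest). */
typedef struct AsyncQueueIterator
{
    const AsyncQueueEntry *entries; /* stand-in for the SLRU pages */
    size_t tail;
    size_t head;                    /* iterates over [tail, head) */
} AsyncQueueIterator;

static const AsyncQueueEntry *iter_next(AsyncQueueIterator *it)
{
    if (it->tail >= it->head)
        return NULL;
    return &it->entries[it->tail++];
}

/* Circular XID comparison, in the style of TransactionIdPrecedes(). */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Oldest XID among real entries; dummy entries (dboid == 0) are skipped. */
static TransactionId queue_min_xid(AsyncQueueIterator *it)
{
    TransactionId min = InvalidTransactionId;
    const AsyncQueueEntry *e;

    while ((e = iter_next(it)) != NULL)
    {
        if (e->dboid == 0)
            continue;
        if (min == InvalidTransactionId || xid_precedes(e->xid, min))
            min = e->xid;
    }
    return min;
}
```

The real code has to page through the SLRU and honor the skip rules mentioned in the patch (uncommitted transactions, other databases), but the control flow is the same: pull one entry at a time and fold it into the running minimum.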
I'm also adding a TAP test that reproduces the issue, but I'm not sure
whether it's placed in the best location.
I think that it would be good to perform some benchmark tests to see if
we have performance issues when reading very long async queues.
Thoughts?
--
Matheus Alcantara
Attachments:
v1-0001-Consider-LISTEN-NOTIFY-min-xid-during-VACUUM-FREE.patch
I came across this discussion after we had this problem at least twice
now at work.
I read through the discussion to see if this is the problem we are
experiencing.
At first, I thought this was not exactly our problem, because of the way
it was shown to be reproduced:
with a long-running transaction that is listening, and then doing all
the other steps, like running through transactions until a second
pg_xact file is created, and then vacuuming, etc.
But I felt like we don't do this, so it cannot be the way we run into
the problem; specifically, we never have a listener in a long-running
transaction.
So I set out to get it into the broken state, and here is how I can
reproduce it without long running queries:
1. Create another database
postgres=# CREATE DATABASE test;
CREATE DATABASE
2. Create a listener on the postgres database.
postgres=# LISTEN xx;
LISTEN
3. Create notifies for the test database.
Here I struggled to get a stuck notification in the queue (checked with
SELECT pg_notification_queue_usage();).
It can happen with only a single NOTIFY from a psql connection, but I
get a higher hit rate with the following:
Create a notify.sql with the only contents being:
```
NOTIFY s, 'yappers';
```
Then run this against the test database with pgbench.
pgbench -n -c 80 -j 20 -t 1000 -f notify.sql test
4. Confirm that there is now something stuck in the queue:
postgres=# SELECT pg_notification_queue_usage();
pg_notification_queue_usage
-----------------------------
9.5367431640625e-07
(1 row)
If this still shows 0, then run step 3 again.
5. Consume xid's. I create a file consume.sql with the only contents being:
```
SELECT txid_current();
```
Then:
pgbench -n -c 80 -j 30 -t 100000 -f consume.sql postgres
6. Verify that a new file is created in the pg_xact folder.
If not, just run the previous step again.
7. Run vacuum freeze. (Remember to allow connections to template0
beforehand with `ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;`)
$ vacuumdb --all --freeze
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template0"
vacuumdb: vacuuming database "template1"
vacuumdb: vacuuming database "test"
8. Connect to the test database and try to execute a LISTEN.
test=# LISTEN anything;
ERROR: could not access status of transaction 81086
DETAIL: Could not open file "pg_xact/0000": No such file or directory.
test=#
Extra weirdness:
In step 2 create a connection to the test database and LISTEN there on
any channel, even the one you are notifying to:
You don't have to run anything on that connection: there is no
backend_xmin on that connection, you don't start a transaction, nothing;
you just run `LISTEN something;`.
Then after step 7, as long as you don't close this connection, you will
not get an error when you try to set up a listener on the test database,
even on a new connection to the test database.
What I'm saying is that after step 7 you can have two connections: one to
the postgres database with an active listener, and one to the test
database with an active listener.
As long as the postgres one is open, the queue is stuck, i.e. `SELECT
pg_notification_queue_usage();` returns non-zero.
As long as the test database connection is open, new listeners do not
experience the error.
If you close the test database connection, but leave the postgres
connection, from now on any listener you try to create on test database
will error.
If you close the postgres connection, the queue clears immediately and
new LISTEN's on the test database will work.
This means it is possible to get rid of the problem without restarting
the postmaster if you can close connections until you close the one that
"caused" the problem.
Don't know if this is actually useful, but to me it seemed like everyone
believed it to be impossible to recover without restarting, so it's
something at least.
TLDR:
An active listener on one database causes a NOTIFY on another database
to get stuck.
At no point could I get a stuck notify if I don't have a listener on at
least one other database than the one I am notifying on. See the Extra
weirdness section.
At no point do you need to have any other queries running; there is
never an idle-in-transaction query needed for bad timing with the vacuum.
I hope I explained everything well enough so that one of you smart
people can find and fix the problem.
On 29.08.2025 01:31, Matheus Alcantara wrote:
On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
I'll work on this considering the initial Daniil comments at [1]
[1] /messages/by-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y+CDfEc9M2TXiOf_4kfBMg@mail.gmail.com
I've been working on this on the last few days, please see the attached
patch version.

In this new version I tried to follow the suggestion from Daniil of
scanning all pages from tail to head of the async queue.
Since we still need to scan those pages, couldn't we mark finished
transactions as committed/aborted?
This way we may not hold datfrozenxid for a long time, which will allow
both safe clog truncation and safe async queue notification.
Moreover, most notifications could be marked as completed in
AtCommit_Notify already, so there is only a need to recheck the few
notifications that were written but not marked in AtCommit_Notify.
Since the AsyncQueueEntry xid field is used only to check visibility, it
looks like it is safe to set it to FrozenTransactionId.
--
regards
Yura Sokolov aka funny-falcon
Hi,
On Tue, Sep 2, 2025 at 10:38 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
29.08.2025 01:31, Matheus Alcantara пишет:
On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
I'll work on this considering the initial Daniil comments at [1]
[1] /messages/by-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y+CDfEc9M2TXiOf_4kfBMg@mail.gmail.com
I've been working on this on the last few days, please see the attached
patch version.

In this new version I tried to follow the suggestion from Daniil of
scanning all pages from tail to head of the async queue.

Since we still need to scan those pages, couldn't we mark finished
transactions as committed/aborted?
This way we may to not hold datfrozenxid for a long time and will allow
both safe clog truncation and safe async queue notification.
More over, most notifications could be marked as completed in
AtCommit_Notify already. So there is a need to recheck a few notifications
that were written but not marked in AtCommit_Notify.

Since AsyncQueueEntry field is used only to check visibility, looks like it
is safe to set it to FrozenTransactionId.
I like the idea of making queue records more self-sufficient, so we
don't need to check a transaction's status to determine if a
notification should be sent. What if we somehow mark every record of
an aborted transaction, so listeners can skip it without checking the
transaction status? Also, if the record is not marked as "aborted" we
can send it without checking the transaction status. IIUC
AtAbort_Notify is called before any listener can read records of the
aborted transaction (is it correct?), so we can do such "marking" in
AtAbort_Notify. And the listen/notify subsystem already has a similar
"marking" mechanism. At the end of every queue page we have a "dummy"
record which is skipped by every listener. Listeners skip it because
record's 'dboid' equals InvalidOid (listeners ignore any record where
the 'dboid' is different from their own). In the same way aborted
transactions can set 'dboid' to InvalidOid for their records in
AtAbort_Notify.
Some benefits of this approach:
1) Only aborted transactions would need to perform extra work.
2) Listeners don't need to check notify transaction status, but ofc
they still need to wait if the notify transaction is 'in progress'.
3) Vacuum doesn't need to be listen/notify aware.
4) We don't need to do an extra scan of the whole queue.
What do you think?
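For illustration, the marking idea above can be sketched in a self-contained way (hypothetical names; a flat array stands in for the queue, and dboid == InvalidOid reuses the existing "dummy entry" skip convention — this is not actual PostgreSQL code):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef uint32_t TransactionId;
typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Simplified queue entry with just the fields this sketch needs. */
typedef struct AsyncQueueEntry
{
    TransactionId xid;
    Oid dboid;
} AsyncQueueEntry;

/*
 * On abort (conceptually, from AtAbort_Notify), rewrite this transaction's
 * entries so listeners skip them, reusing the rule that an entry whose
 * dboid is InvalidOid never matches any listener's database.
 */
static void mark_xact_entries_skipped(AsyncQueueEntry *entries, size_t n,
                                      TransactionId aborted_xid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].xid == aborted_xid)
            entries[i].dboid = InvalidOid;
    }
}

/* Listener-side filter: only entries for my own database are processed. */
static bool listener_should_process(const AsyncQueueEntry *e, Oid my_dboid)
{
    return e->dboid == my_dboid;
}
```

With this scheme a listener never needs the clog: an aborted transaction's entries simply look like the page-filler entries it already skips.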
Best regards,
Arseniy Mukhin
On Mon Sep 1, 2025 at 11:06 AM -03, Jacques Combrink wrote:
TLDR:
active listener on one database causes notify on another database to get
stuck.
At no point could I get a stuck notify if I don't have a listener on at
least one other database than the one I am notifying on. See the Extra
weirdness section.
At no point do you need to have any other queries running, there is
never an idle in transaction query needed for bad timing with the vacuum.

I hope I explained everything well enough so that one of you smart
people can find and fix the problem.
The long running transaction steps are just an example showing that we can
lose notifications using the first patch from Daniil that Alex shared in
[1]. Your steps are another way to trigger the issue, but they are similar
to the steps that Alex also shared in [1].
All these different ways to trigger the error face the same underlying
problem: if a notification is kept on the queue for so long that vacuum
freeze can run and truncate the clog files that contain the transaction
information of this notification, the error will happen.
The patch that I attached in [2] aims to fix the issue following the
steps that you've shared, but during the tests I found a stack
overflow bug in AsyncQueueIterNextNotification() due to the number of
notifications. I'm attaching a new version that fixes this bug; I tried
to reproduce your steps with this new version and the issue seems to be
fixed.
Note that notifications that were added without any previous LISTEN will
block the xid advance during VACUUM FREEZE until we have a listener on
the database that owns these notifications. The XXX comment in vacuum.c
is about this problem.
[1]: /messages/by-id/CAK98qZ3wZLE-RZJN_Y+TFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com
--
Matheus Alcantara
Attachments:
v2-0001-Consider-LISTEN-NOTIFY-min-xid-during-VACUUM-FREE.patch
Thanks for the comments!
On Tue Sep 2, 2025 at 4:37 AM -03, Yura Sokolov wrote:
On 29.08.2025 01:31, Matheus Alcantara wrote:
On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
I'll work on this considering the initial Daniil comments at [1]
[1] /messages/by-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y+CDfEc9M2TXiOf_4kfBMg@mail.gmail.com
I've been working on this on the last few days, please see the attached
patch version.

In this new version I tried to follow the suggestion from Daniil of
scanning all pages from tail to head of the async queue.

Since we still need to scan those pages, couldn't we mark finished
transactions as committed/aborted?
This way we may to not hold datfrozenxid for a long time and will allow
both safe clog truncation and safe async queue notification.
More over, most notifications could be marked as completed in
AtCommit_Notify already. So there is a need to recheck a few notifications
that were written but not marked in AtCommit_Notify.

Since AsyncQueueEntry field is used only to check visibility, looks like it
is safe to set it to FrozenTransactionId.
Maybe we could have a boolean "committed" field on AsyncQueueEntry and
check this field before sending the notification, instead of calling
TransactionIdDidCommit()? Is that similar to what you are proposing?

We create the AsyncQueueEntry's and add them to the SLRU in
PreCommit_Notify(), so to mark these entries as committed in
AtCommit_Notify() we would need a way to find them in the SLRU
from the pendingNotifies->events that contains the notifications added
by the current transaction being committed.
This idea looks promising to me TBH; I'm just not sure if it's
completely safe to mark an AsyncQueueEntry on the queue as committed
without checking TransactionIdDidCommit(). I would like to read more
opinions about this before working on the next version based on this
idea.
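To make the two-phase idea concrete, here is a minimal self-contained sketch (hypothetical names and simplified structures; in reality the entries live in the SLRU, and finding them again at commit time is the hard part noted above):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified queue entry carrying only what this sketch needs. */
typedef struct AsyncQueueEntry
{
    uint32_t xid;
    bool committed;
} AsyncQueueEntry;

/*
 * PreCommit phase: enqueue an entry with committed = false, remembering
 * where it was written so the commit phase can find it again.
 */
static size_t precommit_enqueue(AsyncQueueEntry *queue, size_t head,
                                uint32_t xid, size_t *positions, size_t *npos)
{
    queue[head] = (AsyncQueueEntry) { .xid = xid, .committed = false };
    positions[(*npos)++] = head;
    return head + 1;
}

/* AtCommit phase: revisit our own entries and flip the flag. */
static void atcommit_mark(AsyncQueueEntry *queue, const size_t *positions,
                          size_t n)
{
    for (size_t i = 0; i < n; i++)
        queue[positions[i]].committed = true;
}

/* Reader side: trust the flag; no clog (TransactionIdDidCommit) lookup. */
static bool reader_may_send(const AsyncQueueEntry *e)
{
    return e->committed;
}
```

The open question from the message above remains: a reader must still not consume entries of an in-progress transaction, so some synchronization between the marking and the reading side is needed.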
--
Matheus Alcantara
On Wed Sep 3, 2025 at 4:28 PM -03, Arseniy Mukhin wrote:
Since we still need to scan those pages, couldn't we mark finished
transactions as committed/aborted?
This way we may to not hold datfrozenxid for a long time and will allow
both safe clog truncation and safe async queue notification.
More over, most notifications could be marked as completed in
AtCommit_Notify already. So there is a need to recheck a few notifications
that were written but not marked in AtCommit_Notify.

Since AsyncQueueEntry field is used only to check visibility, looks like it
is safe to set it to FrozenTransactionId.

I like the idea to make queue records more self-sufficient, so we
don't need to check a transaction's status to determine if a
notification should be sent. What if we somehow mark every record of
an aborted transaction, so listeners can skip it without checking
transaction status? Also if the record is not marked as "aborted" we
also can send it without checking transaction status. IIUC
AtAbort_Notify is called before any listener can read records of the
aborted transaction (is it correct?), so we can do such "marking" in
AtAbort_Notify. And the listen/notify subsystem already has a similar
"marking" mechanism. At the end of every queue page we have a "dummy"
record which is skipped by every listener. Listeners skip it because
record's 'dboid' equals InvalidOid (listeners ignore any record where
the 'dboid' is different from their own). In the same way aborted
transactions can set 'dboid' to InvalidOid for their records in
AtAbort_Notify.

Some benefits of this approach:
1) Only aborted transactions would need to perform extra work.
2) Listeners don't need to check notify transaction status, but ofc
they still need to wait if the notify transaction is 'in progress'.
3) Vacuum don't need to be listen/notify aware.
4) We don't need to do an extra scan of the whole queue.

What do you think?
IIUC we don't store notifications of aborted transactions on the queue.
In PreCommit_Notify() we add the notifications to the queue, and in
AtCommit_Notify() we signal the backends.
Or am I missing something here?
--
Matheus Alcantara