autovacuum launcher crash: assert in pgstat_count_io_op (IOOP_EXTEND on pg_database's VM)
Hi hackers,
I was stress-testing master (commit e2b35735b00, assertions enabled) with a
workload that does a lot of DDL/DML, including creating and dropping
databases in a tight loop, and the autovacuum launcher kept crashing on me --
every 15-40 minutes or so once it was under load:
TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op)"), File: "pgstat_io.c", Line: 74
LOG: autovacuum launcher process (PID ...) was terminated by signal 6: Aborted
The postmaster recovers fine, but it just starts another launcher that hits
the exact same assert, so it never really gets out of the loop.
The short version: the launcher is in get_database_list(), doing its seqscan
of pg_database, and on-access pruning kicks in during the scan. Since
b46e1e54d07 ("Allow on-access pruning to set pages all-visible"),
heap_page_prune_opt() pins the visibility map unconditionally once it decides
to prune -- before it ever checks rel_read_only. visibilitymap_pin() isn't
read-only though: if the VM page isn't there yet it extends the fork, and
pg_database has no VM fork, so we end up doing an actual relation extend
(IOOP_EXTEND) from the launcher. pgstat_tracks_io_op() says the launcher
must never do an EXTEND, hence the assertion.
What surprised me is that the launcher's catalog scan isn't even flagged
read-only (table_beginscan_catalog doesn't set SO_HINT_REL_READ_ONLY),
so it never actually intends to set the VM -- it just pins/extends it anyway.
Here are the relevant frames:
#3 ExceptionalCondition ("pgstat_tracks_io_op(...)", "pgstat_io.c", 74)
at assert.c:65
#4 pgstat_count_io_op (io_object=IOOBJECT_RELATION,
io_context=IOCONTEXT_NORMAL, io_op=IOOP_EXTEND, cnt=1, bytes=8192)
at pgstat_io.c:74
#5 pgstat_count_io_op_time (...) at pgstat_io.c:160
at bufmgr.c:3030
#7 ExtendBufferedRelCommon (... fork=VISIBILITYMAP_FORKNUM ...)
at bufmgr.c:2774
#8 ExtendBufferedRelTo (... fork=VISIBILITYMAP_FORKNUM, extend_to=1 ...)
at bufmgr.c:1099
#9 vm_extend (vm_nblocks=1, ...) at visibilitymap.c:614
#10 vm_readbuf (blkno=0, extend=true) at visibilitymap.c:572
#11 visibilitymap_pin (...) at visibilitymap.c:216
#12 heap_page_prune_opt (..., rel_read_only=...) at pruneheap.c:339
#13 heap_prepare_pagescan (...) at heapam.c:638
#14 heapgettup_pagemode (... ForwardScanDirection ...) at heapam.c:1113
#15 heap_getnext (...) at heapam.c:1454
#16 get_database_list () at autovacuum.c:1856
#17 do_start_worker () at autovacuum.c:1172
#19 launch_worker (...) at autovacuum.c:1355
#20 AutoVacLauncherMain (...) at autovacuum.c:780
#21 postmaster_child_launch (child_type=B_AUTOVAC_LAUNCHER, ...)
at launch_backend.c:268
#22 StartChildProcess (type=B_AUTOVAC_LAUNCHER) at postmaster.c:4030
#23 LaunchMissingBackgroundProcesses () at postmaster.c:3375
#24 ServerLoop () at postmaster.c:1743
#25 PostmasterMain (...) at postmaster.c:1415
#26 main (...) at main.c:231
I haven't been able to boil this down to a clean standalone repro yet -- it
seems to need the launcher to hit get_database_list() at the moment a
pg_database page is prunable and the VM fork still has to grow -- but the
path looks pretty clear from the stack.
Regards,
Ewan
On Sun, May 31, 2026 at 12:36:45PM +0800, Ewan Young wrote:
I haven't been able to boil this down to a clean standalone repro yet -- it
seems to need the launcher to hit get_database_list() at the moment a
pg_database page is prunable and the VM fork still has to grow -- but the
path looks pretty clear from the stack.
My first suspicion would be something in the area of b46e1e54d078 and
some of the VM improvements. Melanie?
--
Michael
That was exactly the right neighborhood, thanks for the quick pointer.
The mechanism, as far as I can tell (I'm new to this code, so corrections
welcome):
The autovacuum launcher scans pg_database in get_database_list() with a
catalog scan (rel_read_only = false). On a full, prunable page,
heap_page_prune_opt() calls visibilitymap_pin(), which extends the VM fork
if it doesn't exist. The launcher isn't allowed to do IOOP_EXTEND, so
pgstat_count_io_op() trips the assertion at pgstat_io.c:74 and it aborts.
Two commits combine to cause it: 4f7ecca84dd added the unconditional
(extending) visibilitymap_pin() in the on-access prune path, and 378a216187a
made INSERT set pd_prune_xid, so on-access pruning now fires on insert-mostly
catalogs like pg_database. That's also why it was hard to reduce: any
regular backend or autovacuum worker that scans pg_database recreates the
fork harmlessly (they may extend), so the launcher only crashes in the brief
window before that happens. I can now reproduce it deterministically; happy
to share the script.
Patch attached (one file). Only read-only scans actually set the VM, and
those run only in regular backends (which may extend), so the patch extends
the fork only on read-only scans and, for other scans, pins an existing VM
page without extending (via visibilitymap_get_status()). Corruption detection is
unaffected; we just leave fork creation to the next VACUUM. Passes
check/installcheck, isolation, and contrib/pg_visibility.
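The decision described above can be sketched as a small toy model in C. This is only an illustration under stated assumptions, not the attached patch: ToyVMFork, TOY_INVALID_BUFFER and every toy_* function are hypothetical names invented here, and the "fork" is just a counter rather than real buffer-manager state.

```c
#include <stdbool.h>

/* Hypothetical marker for "no VM page is pinned" in this toy model. */
#define TOY_INVALID_BUFFER (-1)

/* Hypothetical in-memory stand-in for a relation's visibility map fork. */
typedef struct
{
    int nblocks;      /* number of VM pages that currently exist */
    int extend_count; /* how many times the fork has been extended */
} ToyVMFork;

/* Models the extending pin: grow the fork if needed, then "pin" the page. */
static int
toy_vm_pin(ToyVMFork *vm, int vm_blkno)
{
    if (vm->nblocks <= vm_blkno)
    {
        vm->nblocks = vm_blkno + 1;
        vm->extend_count++;
    }
    return vm_blkno;
}

/* Models the non-extending lookup: pin the page only if it already exists. */
static int
toy_vm_pin_existing(const ToyVMFork *vm, int vm_blkno)
{
    return (vm_blkno < vm->nblocks) ? vm_blkno : TOY_INVALID_BUFFER;
}

/* Extend only for read-only scans; other scans never grow the fork. */
static int
toy_prune_get_vmbuffer(ToyVMFork *vm, int vm_blkno, bool rel_read_only)
{
    if (rel_read_only)
        return toy_vm_pin(vm, vm_blkno);
    return toy_vm_pin_existing(vm, vm_blkno);
}
```

In this model a non-read-only scan (such as the launcher's catalog scan) on a relation with no VM fork gets TOY_INVALID_BUFFER back and extend_count stays at zero, so nothing that counts as an extend is ever issued from that path.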
This is just my own attempt at a fix and I'm not sure it's correct, so
please don't hesitate to point out anything I've got wrong.
Thanks,
Ewan
On Mon, Jun 1, 2026 at 12:08 PM Michael Paquier <michael@paquier.xyz> wrote:
[...]
Attachments:
v1-0001-Don-t-extend-the-visibility-map-fork-during-non-r.patch (application/octet-stream, +29-3)
On Sun, May 31, 2026 at 12:37 AM Ewan Young <kdbase.hack@gmail.com> wrote:
I was stress-testing master (commit e2b35735b00, assertions enabled) with a
workload that does a lot of DDL/DML, including creating and dropping
databases in a tight loop, and the autovacuum launcher kept crashing on me --
every 15-40 minutes or so once it was under load:
TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op)"), File: "pgstat_io.c", Line: 74
LOG: autovacuum launcher process (PID ...) was terminated by signal 6: Aborted
The postmaster recovers fine, but it just starts another launcher that hits
the exact same assert, so it never really gets out of the loop.
Ouch :( Thanks for the report!
What surprised me is that the launcher's catalog scan isn't even flagged
read-only (table_beginscan_catalog doesn't set SO_HINT_REL_READ_ONLY),
so it never actually intends to set the VM -- it just pins/extends it anyway.
Yea, so SO_HINT_REL_READ_ONLY is only meant as a hint. I don't
guarantee that all queries that aren't modifying the relation will
pass it. It's only a performance optimization. We did touch on
excluding scans of catalog tables briefly in the thread (albeit deep
in the thread) [1].
That being said, we still pin the VM (and potentially extend it) even
if we don't set it. If it does already exist, pinning it lets us check
for corruption. If we extend it and won't set it, it isn't totally
wasted work because then we don't have to extend it later. Though it's
true that if the VM needs to be extended, that part of the VM can't be
corrupted, I wanted to avoid special casing the presence of the VM
page.
When I wrote pgstat_tracks_io_op(), I thought through which IO
operations we should expect for each backend type. The idea was that
if that were ever to change, we could change pgstat_tracks_io_op().
There is no reason why the autovacuum launcher inherently shouldn't
extend the VM. Logically, if it is just reading catalog tables, it
won't need to extend the actual data fork of the relation. However,
now that reading tables may cause extending the VM, we can modify
pgstat_tracks_io_op() like this:
- if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
- bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND)
+ if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) &&
+ io_op == IOOP_EXTEND)
return false;
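For readers who want to see the effect of that change in isolation, here is a minimal sketch in C of the rule before and after. It is a toy model, not the real function: the actual pgstat_tracks_io_op() also takes io_object and io_context and applies other rules that are omitted here. The Toy* types, the TOY_* enum values and the toy_* function names are hypothetical; only B_AUTOVAC_LAUNCHER, B_BG_WRITER, B_CHECKPOINTER and IOOP_EXTEND are taken from the diff above.

```c
#include <stdbool.h>

/* Toy backend types; TOY_B_OTHER stands for any backend not named in the diff. */
typedef enum
{
    TOY_B_OTHER,
    B_AUTOVAC_LAUNCHER,
    B_BG_WRITER,
    B_CHECKPOINTER
} ToyBackendType;

/* Toy IO operations; TOY_IOOP_OTHER stands for any non-extend operation. */
typedef enum
{
    TOY_IOOP_OTHER,
    IOOP_EXTEND
} ToyIOOp;

/* Rule as it stands: launcher, bgwriter and checkpointer never extend. */
static bool
toy_tracks_io_op_before(ToyBackendType bktype, ToyIOOp io_op)
{
    if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
         bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND)
        return false;
    return true;
}

/* Rule after the proposed change: the launcher is allowed to extend. */
static bool
toy_tracks_io_op_after(ToyBackendType bktype, ToyIOOp io_op)
{
    if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) &&
        io_op == IOOP_EXTEND)
        return false;
    return true;
}
```

With the first rule, an extend from the launcher yields false, which is the condition the assertion in the report trips on; with the second it yields true, while bgwriter and checkpointer stay excluded.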
We should probably add a flags argument to table_beginscan_catalog()
in 20. That along with modifying pgstat_tracks_io_op() is probably the
right solution IMO. I think we shouldn't do that in 19 since it is
expanding the feature footprint.
So for fixing 19, we could just alter pgstat_tracks_io_op() which
would avoid tripping the assert.
The next question is whether or not we should also do the patch you
proposed, since it is true that if the VM is not extended enough to
have the relevant VM bits, we can't possibly be fixing corruption. And
if we are not doing that and won't set the VM, we are prematurely
doing work.
My hesitation about this is that the logic is a bit confusing. It
relies on us knowing that visibilitymap_pin() will extend the relation
and visibilitymap_get_status() won't. Both will read and pin the VM if
it exists. If in heap_page_prune_and_freeze() we do anything with the
VM page, it must be pinned. So, you have to know that if it needed to
be extended, we won't have done that if rel_read_only was passed and
so vmbuffer will be invalid and visibilitymap_get_status() will have
returned 0. I don't think it's wrong, it just feels a bit off in a way
I can't quite put my finger on. Perhaps it is just that I liked the
invariant that the VM page would always be passed to
heap_page_prune_and_freeze(). I'll have to think about it a bit more.
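The coupling described in that paragraph can be illustrated with a small C sketch. It is a toy model under stated assumptions, not PostgreSQL code: ToyVM, TOY_INVALID_BUFFER, TOY_ALL_VISIBLE and the toy_* functions are hypothetical names, and one status byte per VM page is a simplification chosen only to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INVALID_BUFFER (-1)   /* hypothetical "nothing pinned" marker */
#define TOY_ALL_VISIBLE    0x01   /* hypothetical status bit */
#define TOY_MAX_VM_PAGES   4      /* toy capacity of the model */

typedef struct
{
    int     nblocks;                 /* VM pages that currently exist */
    uint8_t bits[TOY_MAX_VM_PAGES];  /* one status byte per VM page */
} ToyVM;

/*
 * Non-extending lookup: if the page is missing, report status 0 and leave
 * *vmbuf invalid; otherwise "pin" the page and return its status.
 */
static uint8_t
toy_vm_get_status(const ToyVM *vm, int blk, int *vmbuf)
{
    if (blk < 0 || blk >= vm->nblocks || blk >= TOY_MAX_VM_PAGES)
    {
        *vmbuf = TOY_INVALID_BUFFER;
        return 0;
    }
    *vmbuf = blk;
    return vm->bits[blk];
}

/* Caller-side guard: the VM may only be touched when a page is pinned. */
static bool
toy_try_set_all_visible(ToyVM *vm, int vmbuf)
{
    if (vmbuf == TOY_INVALID_BUFFER)
        return false;
    vm->bits[vmbuf] |= TOY_ALL_VISIBLE;
    return true;
}
```

The point of the sketch is that a status of 0 is ambiguous by itself (page missing, or page present with no bits set), so any later code that wants to touch the VM has to check the buffer as well, which is the extra knowledge the always-pinned invariant avoided.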
On Mon, Jun 1, 2026 at 4:41 AM Ewan Young <kdbase.hack@gmail.com> wrote:
Two commits combine to cause it: 4f7ecca84dd added the unconditional
(extending) visibilitymap_pin() in the on-access prune path, and 378a216187a
made INSERT set pd_prune_xid, so on-access pruning now fires on insert-mostly
catalogs like pg_database. That's also why it was hard to reduce: any
regular backend or autovacuum worker that scans pg_database recreates the
fork harmlessly (they may extend), so the launcher only crashes in the brief
window before that happens. I can now reproduce it deterministically; happy
to share the script.
Please do share your reproducer. It's always nice to have those.
- Melanie
[1]: /messages/by-id/9468F957-C0ED-4D72-8C89-61162CAA5591@yandex-team.ru
Hi Melanie,
That all makes sense — your plan sounds right to me. Happy to go with
relaxing pgstat_tracks_io_op() for 19 and leaving the
table_beginscan_catalog() flags work for 20, and to drop my patch.
Reproducer attached.
Thanks so much for the quick and thoughtful response — really appreciate it!
Ewan
On Mon, Jun 1, 2026 at 11:05 PM Melanie Plageman <melanieplageman@gmail.com>
wrote:
[...]