[BUG] Race in online checksums launcher_exit()
Hi hackers,
While using the pg_enable_data_checksums() feature, I found what looks like
a race condition in launcher_exit() in datachecksum_state.c.
When pg_enable_data_checksums() is called twice before the first launcher
starts, two bg workers are registered (the code expects this). The
redundant launcher exits early, but its launcher_exit() callback
unconditionally clears the shared launcher_running flag and may call
SetDataChecksumsOff(), even though it never owned the flag.
This allows a third pg_enable_data_checksums() call to start another
launcher concurrently with the first (duplicate work, doubled I/O, spurious
warnings). Worse, if the redundant launcher initializes after the winner
has transitioned to inprogress-on, its exit handler calls
SetDataChecksumsOff(), silently aborting the enable operation. (I have
not triggered the SetDataChecksumsOff() path myself, but I'm calling it out
because it looks like a plausible outcome given how the workers are timed.)
Reproduced by firing three calls in quick succession:
psql -c "SELECT pg_enable_data_checksums();" &
psql -c "SELECT pg_enable_data_checksums();" &
sleep 0.5
psql -c "SELECT pg_enable_data_checksums();" &
Log shows two launchers processing databases concurrently:
[2093292]: LOG: processing database "postgres"
[2093293]: LOG: already running, exiting
[2093299]: WARNING: cannot set data checksums to "on", current state is not "inprogress-on"
[2093292]: LOG: processing database "postgres"
[2093299]: WARNING: cannot set data checksums to "on", current state is not "inprogress-on"
[2093299]: WARNING: cannot set data checksums to "on", current state is not "inprogress-on"
I think the process-local launcher_running flag exists for exactly this
purpose: it already guards the worker-kill block, but the blocks that clear
the shared flag and revert the checksum state do not check it.
The attached patch returns early from launcher_exit() when the local flag
is false. Thoughts?
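To make the ownership argument concrete, here is a minimal, self-contained C
model of the two exit-handler behaviours. It is only a sketch under stated
assumptions, not the code in datachecksum_state.c: SharedState, LauncherProc,
launcher_start(), launcher_exit_unguarded() and launcher_exit_guarded() are
hypothetical names, and the checksums_enabling field merely stands in for the
"inprogress-on" state that SetDataChecksumsOff() would revert.

```c
#include <stdbool.h>

/* Hypothetical model of the shared-memory state (not PostgreSQL's real struct). */
typedef struct
{
    bool launcher_running;      /* shared: some launcher owns the operation */
    bool checksums_enabling;    /* hypothetical stand-in for "inprogress-on" */
} SharedState;

/* Hypothetical per-process state for one launcher. */
typedef struct
{
    bool owns_launcher;         /* process-local: did THIS process set the shared flag? */
} LauncherProc;

/* Startup: only the first launcher to find the shared flag clear claims it. */
static bool
launcher_start(SharedState *shared, LauncherProc *self)
{
    if (shared->launcher_running)
    {
        /* Redundant launcher: "already running, exiting". */
        self->owns_launcher = false;
        return false;
    }
    shared->launcher_running = true;
    self->owns_launcher = true;
    return true;
}

/* Unguarded exit handler: cleans up shared state whether or not it owns it. */
static void
launcher_exit_unguarded(SharedState *shared, LauncherProc *self)
{
    (void) self;                /* ownership is ignored, which is the bug */
    shared->launcher_running = false;
    if (shared->checksums_enabling)
        shared->checksums_enabling = false;   /* models SetDataChecksumsOff() */
}

/* Guarded exit handler: a launcher that never claimed the flag returns early. */
static void
launcher_exit_guarded(SharedState *shared, LauncherProc *self)
{
    if (!self->owns_launcher)
        return;                 /* not the owner: leave shared state alone */
    shared->launcher_running = false;
    if (shared->checksums_enabling)
        shared->checksums_enabling = false;   /* owner exiting early: revert */
    self->owns_launcher = false;
}
```

In this model a winner claims the flag and moves to the enabling state, and a
redundant launcher fails to claim it and exits. With the guarded handler both
shared fields survive the redundant exit; with the unguarded handler the flag
is cleared (letting a third call start another launcher) and the enabling
state is reverted, matching the two symptoms described above.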
Regards,
Ayush
Attachments:
0001-Fix-race-in-online-checksums-launcher_exit.patch (application/octet-stream, +16 -10)
On 19 Apr 2026, at 22:09, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
> Hi hackers,
> While using the pg_enable_data_checksums() feature, I found a likely bug, a race condition in datachecksum_state.c's launcher_exit().
Thanks for your report. Tomas and I have worked over the past couple of days
on a fixup series for a rare race condition which was found after extensive
long-running testing. While hacking on that we identified what I believe is the
same bug you found, and we have a fix for it. The patchset will be shared very
shortly (we are literally putting the final touches on it as I write this).
I'll compare notes and, if applicable, incorporate your patch into it.
--
Daniel Gustafsson
Hi,
On Mon, 20 Apr 2026 at 01:47, Daniel Gustafsson <daniel@yesql.se> wrote:
Thanks Daniel! Glad to hear it's being addressed. I would be happy to
test the patchset when it's posted.
Regards,
Ayush