pg_waldump vs. all-zeros WAL files; server creation of such files

Started by Noah Misch over 2 years ago, 2 messages
#1 Noah Misch
noah@leadboat.com
1 attachment(s)

The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails
at today's master (c36b636) and v15 (1bc19df). It passes at v14 (5a32af3).
Command "pg_waldump --start 0/01000000 --end 0/01000100" fails as follows:

pg_waldump: error: WAL segment size must be a power of two between 1 MB and 1 GB, but the WAL file "000000010000000000000002" header specifies 0 bytes
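
For readers following along, the check behind that message can be sketched as below. This is a minimal illustration of the rule the error text states (a power of two between 1 MB and 1 GB), not the pg_waldump source; the function and constant names are made up for the example. A header read from an all-zeros file yields a segment size of 0, which cannot pass.

```python
# Illustrative only: names are hypothetical, the rule is taken from the
# error message above (power of two, between 1 MB and 1 GB inclusive).
WAL_SEG_MIN_SIZE = 1024 * 1024          # 1 MB
WAL_SEG_MAX_SIZE = 1024 * 1024 * 1024   # 1 GB


def is_valid_wal_seg_size(size: int) -> bool:
    """Return True if size is a power of two within [1 MB, 1 GB]."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and WAL_SEG_MIN_SIZE <= size <= WAL_SEG_MAX_SIZE


if __name__ == "__main__":
    print(is_valid_wal_seg_size(16 * 1024 * 1024))  # a 16 MB segment size
    print(is_valid_wal_seg_size(0))                 # size read from an all-zeros header
```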

Where it fails, the server has created an all-zeros WAL file under that name.
Where it succeeds, that file doesn't exist at all. Two decisions to make:

- Should a clean server shutdown ever leave an all-zeros WAL file? I think
yes, it's okay to let that happen.
- Should "pg_waldump --start $X --end $Y" open files not needed for the
requested range? I think no.
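
To make the second question concrete, here is a rough sketch of which segment files a given range touches. It assumes a 16 MB segment size and timeline 1, and it treats the end LSN as exclusive; all three are assumptions for illustration, and the helper names are hypothetical. The file naming follows the usual 24-hex-digit layout (timeline, then the segment number split into high and low parts).

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, segno: int, seg_size: int) -> str:
    """Build the 24-hex-digit segment file name for a segment number."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def segments_for_range(start: str, end: str, tli: int = 1,
                       seg_size: int = 16 * 1024 * 1024) -> list:
    """List the segment files covering [start, end); end is assumed exclusive."""
    start_lsn = parse_lsn(start)
    end_lsn = parse_lsn(end)
    if end_lsn <= start_lsn:
        raise ValueError("end LSN must be greater than start LSN")
    first = start_lsn // seg_size
    last = (end_lsn - 1) // seg_size   # segment holding the last byte needed
    return [wal_file_name(tli, s, seg_size) for s in range(first, last + 1)]


if __name__ == "__main__":
    print(segments_for_range("0/01000000", "0/01000100"))
```

Under those assumptions the range in the failing command lies entirely within segment 000000010000000000000001, so 000000010000000000000002 is not needed to satisfy it. The catch is that a tool cannot know the segment size before it has read some file's header, which is where the directory scan comes in.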

Bisect of master got:
30a53b7 Wed Mar 8 16:56:37 2023 +0100 Allow tailoring of ICU locales with custom rules
Doesn't fail at $(git merge-base REL_15_STABLE master). Bisect of v15 got:
811203d Sat Aug 6 11:50:23 2022 -0400 Fix data-corruption hazard in WAL-logged CREATE DATABASE.

I suspect those are innocent. They changed the exact WAL content, which I
expect somehow caused creation of segment 2.

Oddly, I find only one other report of this:
/messages/by-id/CAJ6DU3HiJ5FHbqPua19jAD=wLgiXBTjuHfbmv1jCOaNOpB3cCQ@mail.gmail.com

Thanks,
nm

Attachments:

010_zero.pl (application/x-perl)
#2 Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#1)
Re: pg_waldump vs. all-zeros WAL files; server creation of such files

On Sat, Aug 12, 2023 at 08:15:31PM -0700, Noah Misch wrote:

The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails
at today's master (c36b636) and v15 (1bc19df). It passes at v14 (5a32af3).
Command "pg_waldump --start 0/01000000 --end 0/01000100" fails as follows:

pg_waldump: error: WAL segment size must be a power of two between
1 MB and 1 GB, but the WAL file "000000010000000000000002" header
specifies 0 bytes

So this depends on the ordering of the entries retrieved by readdir()
as much as the segments produced by the backend.

Where it fails, the server has created an all-zeros WAL file under that name.
Where it succeeds, that file doesn't exist at all. Two decisions to make:

- Should a clean server shutdown ever leave an all-zeros WAL file? I think
yes, it's okay to let that happen.

It doesn't hurt to leave that around. On the contrary, it makes any
follow-up startup cheaper the bigger the segment size.

- Should "pg_waldump --start $X --end $Y" open files not needed for the
requested range? I think no.

So this is a case where identify_target_directory() is called with a
fname of NULL. Agreed that search_directory could be smarter with the
files it should scan, especially if we have start and/or end LSNs at
hand to filter out what we'd like to be in the data folder.
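
One possible direction, sketched here as a standalone script and not as a patch to pg_waldump: scan the directory in a deterministic order and skip candidates whose header does not carry a valid segment size, instead of stopping at whichever entry readdir() happens to return first. The function names are hypothetical. The byte offset of the segment-size field (32) and the native-endian 4-byte encoding are assumptions about the long page header layout and should be checked against the server headers before relying on them.

```python
import os
import re
import struct

# WAL segment file names: 24 upper-case hex digits.
WAL_NAME_RE = re.compile(r"^[0-9A-F]{24}$")

# ASSUMPTION: the segment-size field sits at byte offset 32 of the first
# page's long header, stored as a native-endian unsigned 32-bit integer.
SEG_SIZE_OFFSET = 32


def is_valid_wal_seg_size(size: int) -> bool:
    """Power of two between 1 MB and 1 GB, per the pg_waldump error text."""
    return size > 0 and (size & (size - 1)) == 0 and 2**20 <= size <= 2**30


def find_segment_size(directory: str):
    """Return (file name, segment size) from the first usable WAL file.

    Files that are too short or whose header yields an invalid size
    (for example an all-zeros file) are skipped instead of being fatal.
    Returns None when no file qualifies.
    """
    for name in sorted(os.listdir(directory)):
        if not WAL_NAME_RE.match(name):
            continue
        with open(os.path.join(directory, name), "rb") as f:
            header = f.read(SEG_SIZE_OFFSET + 4)
        if len(header) < SEG_SIZE_OFFSET + 4:
            continue
        (seg_size,) = struct.unpack_from("=I", header, SEG_SIZE_OFFSET)
        if is_valid_wal_seg_size(seg_size):
            return name, seg_size
    return None
```

Once a segment size is known this way, the start and end LSNs can be mapped to segment numbers and used to ignore files outside the requested range. Whether skipping an unreadable header silently is acceptable, or should at least produce a warning, is a design question this sketch does not settle.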
--
Michael