nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

Started by Michael Paquieralmost 7 years ago5 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,
(Adding in CC relevant committer and author, Teodor and Alexander)

I have been looking today at a crash of pg_waldump on one of the test
builds keeping running in some internal environment. Luckily, I have
been able to put my hands on a core file with 11.2 running. The
backtrace is not that interesting:
(gdb) bt
#0 btree_desc (buf=0x0, record=0x22ce590) at nbtdesc.c:103
#1 0x000000000040419f in XLogDumpDisplayRecord (config=0x7fff1ccfd360,
record=0x22ce590) at
/build/mts/release/bora-13598999/vpostgres/src/postgres/src/bin/pg_waldump/pg_waldump.c:558
#2 main (argc=<optimized out>, argv=<optimized out>) at
/build/mts/release/bora-13598999/vpostgres/src/postgres/src/bin/pg_waldump/pg_waldump.c:1170
(gdb) down 1
#0 btree_desc (buf=0x0, record=0x22ce590) at nbtdesc.c:103
103 nbtdesc.c: No such file or directory.
(gdb) p record
$1 = (XLogReaderState *) 0x22ce590
(gdb) p *record
$2 = {wal_segment_size = 16777216, read_page = 0x405220
<XLogDumpReadPage>, system_identifier = 0, private_data =
0x7fff1ccfd380, ReadRecPtr = 67109592, EndRecPtr = 67109672,
decoded_record = 0x22cf178, main_data = 0x0, main_data_len = 0,
main_data_bufsz = 0, record_origin = 0, blocks = {{in_use = true,
rnode = {spcNode = 16399, dbNode = 16386, relNode = 19907}, forknum =
MAIN_FORKNUM, blkno = 0, flags = 96 '`', has_image = false,
apply_image = false, bkp_image = 0x0, hole_offset = 0, hole_length =
0, bimg_len = 0, bimg_info = 0 '\000', has_data = true, data =
0x22db2c0 "\003", data_len = 32, data_bufsz = 8192}, {in_use = false,
rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forknum =
MAIN_FORKNUM, blkno = 0, flags = 0 '\000', has_image = false,
apply_image = false, bkp_image = 0x0, hole_offset = 0, hole_length =
0, bimg_len = 0, bimg_info = 0 '\000', has_data = false, data = 0x0,
data_len = 0, data_bufsz = 0} <repeats 32 times>}, max_block_id = 0,
readBuf = 0x22ceea0 "\230\320\a", readLen = 8192, readSegNo = 4,
readOff = 0, readPageTLI = 0, latestPagePtr = 67108864, latestPageTLI
= 1, currRecPtr = 67109592, currTLI = 0, currTLIValidUntil = 0,
nextTLI = 0, readRecordBuf = 0x22d12b0 "L", readRecordBufSize = 40960,
errormsg_buf = 0x22d0eb0 ""}
(gdb) p xlrec
$5 = (xl_btree_metadata *) 0x0

Anyway, after looking at the code relevant to XLOG_BTREE_META_CLEANUP,
I have noticed that the meta-data associated to the first buffer is
registered via XLogRegisterBufData() (this is correct because we want
to associate this data to the metadata buffer). However, nbtdesc.c
assumes that xl_btree_metadata is from the main record data, causing a
crash because we have nothing in this case.

I think that we could just patch nbtpage.c so as we fetch the
metadata using XLogRecGetBlockData(), and log its data. The error
comes from 3d92796, which is one year-old and has been visibly
committed untested. I am surprised that we have not seen that
complain yet. Attached is a patch, which looks right to me and should
be back-patched down to v11. I have not taken the time to actually
test it though.

Thoughts?
--
Michael

Attachments:

nbtdesc-meta-cleanup-v1.patchtext/x-diff; charset=us-asciiDownload+3-1
In reply to: Michael Paquier (#1)
Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

On Sun, Jun 16, 2019 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:

I think that we could just patch nbtpage.c so as we fetch the
metadata using XLogRecGetBlockData(), and log its data.

Don't you mean nbtdesc.c?

The error
comes from 3d92796, which is one year-old and has been visibly
committed untested. I am surprised that we have not seen that
complain yet.

Why is that surprising?

https://coverage.postgresql.org/src/backend/access/rmgrdesc/index.html

--
Peter Geoghegan

#3Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#2)
Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

On Sun, Jun 16, 2019 at 06:54:57PM -0700, Peter Geoghegan wrote:

On Sun, Jun 16, 2019 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:

I think that we could just patch nbtpage.c so as we fetch the
metadata using XLogRecGetBlockData(), and log its data.

Don't you mean nbtdesc.c?

Yeah I meant nbtdesc.c, sorry. This will have to wait after this
week's release for a fix by the way...

Why is that surprising?

https://coverage.postgresql.org/src/backend/access/rmgrdesc/index.html

I would have supposed that more people scan WAL records using the
description callbacks.
--
Michael

In reply to: Michael Paquier (#3)
Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

On Sun, Jun 16, 2019 at 7:05 PM Michael Paquier <michael@paquier.xyz> wrote:

I would have supposed that more people scan WAL records using the
description callbacks.

The WAL record in question, XLOG_BTREE_META_CLEANUP, is certainly one
of the less common record types used by nbtree. I agree that this
should have been tested when it went in, but I'm not surprised that
the bug remained undetected for a year. Not that many people use
pg_waldump.

--
Peter Geoghegan

#5Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#4)
Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

On Sun, Jun 16, 2019 at 07:14:05PM -0700, Peter Geoghegan wrote:

The WAL record in question, XLOG_BTREE_META_CLEANUP, is certainly one
of the less common record types used by nbtree. I agree that this
should have been tested when it went in, but I'm not surprised that
the bug remained undetected for a year. Not that many people use
pg_waldump.

Actually, a simple installcheck generates a handful of them. I have
not actually run into a crash, but this causes pg_waldump to describe
the record incorrectly. Committed down to 11 after cross-checking
that the data inserted in the WAL record and what gets described are
both consistent.

_bt_restore_meta() does the right thing by the way when restoring the
page.
--
Michael