B-tree crash recovery error in 8.3 beta 2
Hi,
I've found that B-tree crash recovery in 8.3 beta2 can make some
tuples invisible through the B-tree index. They are still visible if we
read them with a Seq-Scan. This happens in 8.3 beta2, but not in 8.2.4.
Here's how it happens.
1. Create a B-tree index on a text type column.
2. Make the B-tree three levels deep, that is, root-intermediate-leaf.
Insert enough tuples to construct such a B-tree.
3. No checkpoint should occur during step 2.
4. Kill the postmaster.
5. Restart the postmaster. Crash recovery will be done.
6. Tuples with column values less than HIKEY become invisible through
Idx-scan, but are still visible through Seq-scan.
From the dump of the B-tree, it seems that the HIKEY value is cleared
(only the tuple header is left). No problem was found in the case of
integer or numeric type columns.
Attached are the shell script and postgresql.conf (almost the default
one) to reproduce the problem, and the log of the problem reproduction.
--
Koichi Suzuki
Koichi Suzuki wrote:
I've found that B-tree crash recovery in 8.3 beta2 can make some
tuples invisible through the B-tree index. They are still visible if we
read them with a Seq-Scan. This happens in 8.3 beta2, but not in 8.2.4.
Here's how it happens.
1. Create a B-tree index on a text type column.
2. Make the B-tree three levels deep, that is, root-intermediate-leaf.
Insert enough tuples to construct such a B-tree.
3. No checkpoint should occur during step 2.
4. Kill the postmaster.
5. Restart the postmaster. Crash recovery will be done.
6. Tuples with column values less than HIKEY become invisible through
Idx-scan, but are still visible through Seq-scan.
From the dump of the B-tree, it seems that the HIKEY value is cleared
(only the tuple header is left). No problem was found in the case of
integer or numeric type columns.
Attached are the shell script and postgresql.conf (almost the default
one) to reproduce the problem, and the log of the problem reproduction.
Thanks for the excellent reproducer script!
There seems to be a bug in the B-tree split WAL reduction patch from
February. On split, we copy the HIKEY of the left page from the
leftmost item on the right page, but that doesn't work because the
leftmost key is not stored on intermediate levels.
Patch attached that stores the high key explicitly in the WAL record on
intermediate levels.
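To illustrate the failure mode described above, here is a minimal toy model in C. It is only a sketch and not the nbtree code: the ToyItem and ToyPage layouts and all toy_* function names are hypothetical, and an empty string stands in for "key not stored". It assumes, as stated above, that the leftmost item on a non-leaf page keeps only its header.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ITEMS 8
#define TOY_KEY_LEN   16

/* Hypothetical index item: a downlink standing in for the tuple header,
 * plus a key. An empty key models "only the tuple header is left". */
typedef struct {
    int  downlink;
    char key[TOY_KEY_LEN];
} ToyItem;

/* Hypothetical page: a few items plus a high key. */
typedef struct {
    int     is_leaf;
    int     nitems;
    ToyItem items[TOY_MAX_ITEMS];
    ToyItem hikey;
} ToyPage;

static void toy_set_key(ToyItem *item, const char *key)
{
    strncpy(item->key, key, TOY_KEY_LEN - 1);
    item->key[TOY_KEY_LEN - 1] = '\0';
}

/* On a non-leaf page the leftmost item is stored without its key. */
static void toy_add_item(ToyPage *page, int downlink, const char *key)
{
    ToyItem *item;

    assert(page->nitems < TOY_MAX_ITEMS);
    item = &page->items[page->nitems];
    item->downlink = downlink;
    toy_set_key(item, (!page->is_leaf && page->nitems == 0) ? "" : key);
    page->nitems++;
}

/* Replay that rebuilds the left page's high key from the leftmost item
 * of the right page. Fine on leaf pages, loses the key on non-leaf pages. */
static void toy_redo_hikey_from_right(ToyPage *left, const ToyPage *right)
{
    left->hikey = right->items[0];
}

/* Replay that takes the high key from the log record on non-leaf pages,
 * where the right page cannot supply it. */
static void toy_redo_hikey_explicit(ToyPage *left, const ToyPage *right,
                                    const char *logged_hikey)
{
    if (right->is_leaf)
        left->hikey = right->items[0];
    else {
        left->hikey.downlink = 0;
        toy_set_key(&left->hikey, logged_hikey);
    }
}
```

In this model, replaying a split of a non-leaf page with toy_redo_hikey_from_right() leaves left->hikey.key empty, while toy_redo_hikey_explicit() restores the logged value.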
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
btree-split-fix-1.patch (text/x-diff, +43 -12)
Heikki Linnakangas <heikki@enterprisedb.com> writes:
There seems to be a bug in the B-tree split WAL reduction patch from
February. On split, we copy the HIKEY of the left page from the
leftmost item on the right page, but that doesn't work because the
leftmost key is not stored on intermediate levels.
Patch attached that stores the high key explicitly in the WAL record on
intermediate levels.
Applied with revisions --- mostly, being safe about alignment issues.
You wouldn't notice that on an Intel CPU, but some machines are
pickier ...
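For readers unfamiliar with the alignment concern, the following C sketch shows the general technique of copying a struct out of a byte buffer instead of casting a pointer into it. It is not the committed change: ToyItemHeader and toy_read_header() are hypothetical names, and the buffer merely stands in for data packed at an arbitrary offset inside a log record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header embedded in a record buffer. */
typedef struct {
    uint16_t len;
    uint16_t flags;
    uint32_t block;
} ToyItemHeader;

/* Read a header that may start at any byte offset. Casting buf + offset
 * to (ToyItemHeader *) and dereferencing it can fault on CPUs that
 * require aligned loads, so copy into an aligned local instead. */
static ToyItemHeader toy_read_header(const char *buf, size_t offset)
{
    ToyItemHeader hdr;

    memcpy(&hdr, buf + offset, sizeof(hdr));
    return hdr;
}
```

The copy costs a few bytes of memcpy per read, but it behaves the same on strict-alignment machines as on ones that tolerate unaligned access.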
regards, tom lane