pg_filedump strangeness
Hi,
I'm chasing an apparent index corruption problem, and I came across
something I can't quite explain in pg_filedump. Say I dump a non-leaf
btree index page:
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.3.0
*
* File: 31141
* Options used: -fi -x -R 6246
*
* Dump created on: Tue Apr 6 17:40:28 2010
*******************************************************************
Block 6246 ********************************************************
<Header> -----
Block Offset: 0x030cc000 Offsets: Lower 36 (0x0024)
Block: Size 8192 Version 4 Upper 8120 (0x1fb8)
LSN: logid 1077 recoff 0x45c8b660 Special 8176 (0x1ff0)
Items: 3 Free Space: 8084
TLI: 0x0001 Prune XID: 0x00000000 Flags: 0x0000 ()
Length (including item array): 36
0000: 35040000 60b6c845 01000000 2400b81f 5...`..E....$...
0010: f01f0420 00000000 d89f3000 d09f1000 ... ......0.....
0020: b89f3000 ..0.
<Data> ------
Item 1 -- Length: 24 Offset: 8152 (0x1fd8) Flags: NORMAL
Block Id: 6232 linp Index: 1 Size: 24
Has Nulls: 32768 Has Varwidths: 0
1fd8: 00005818 01001880 01000000 00000000 ..X.............
1fe8: 80bcc57d 74230100 ...}t#..
Item 2 -- Length: 8 Offset: 8144 (0x1fd0) Flags: NORMAL
Block Id: 2756 linp Index: 1 Size: 8
Has Nulls: 0 Has Varwidths: 0
1fd0: 0000c40a 01000800 ........
Item 3 -- Length: 24 Offset: 8120 (0x1fb8) Flags: NORMAL
Block Id: 6231 linp Index: 1 Size: 24
Has Nulls: 32768 Has Varwidths: 0
1fb8: 00005718 01001880 01000000 00000000 ..W.............
1fc8: 4009cc7f 73230100 @...s#..
<Special Section> -----
BTree Index Section:
Flags: 0x0000 ()
Blocks: Previous (6109) Next (6305) Level (1) CycleId (0)
1ff0: dd170000 a1180000 01000000 00000000 ................
*** End of Requested Range Encountered. Last Block Read: 6246 ***
Notice how item 2 has size 8, but regular entries have size 24. I know
this is related to the high key of this page, but I can't quite figure
out why the short entry is 2 not 1. Is item 2 just assumed to be
greater than the previous' page high key?
Page's 6109 high key is:
Item 1 -- Length: 24 Offset: 8152 (0x1fd8) Flags: NORMAL
Block Id: 6101 linp Index: 1 Size: 24
Has Nulls: 32768 Has Varwidths: 0
1fd8: 0000d517 01001880 01000000 00000000 ................
1fe8: 8004f17d 6f230100 ...}o#..
Note that the data values are integer timestamp without time zone in
little endian byte order.
(The "Has Nulls" bit is somewhat bogus -- it displays 32768 when the
0x8000 bit is on, which is rather surprising. I'd expect it to display
1).
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
I'm chasing an apparent index corruption problem, and I came across
something I can't quite explain in pg_filedump. Say I dump a non-leaf
btree index page:
I think this is actually OK. Remember that in a non-rightmost page,
item 1 is the high key not a data entry. On the other hand, in a
non-leaf page, we don't bother to store the key for the first downlink
entry, since the associated key is really "minus infinity". Cf
nbtree/README:
On a non-leaf page, the data items are down-links to child pages with
bounding keys. The key in each data item is the *lower* bound for
keys on that child page, so logically the key is to the left of that
downlink. The high key (if present) is the upper bound for the last
downlink. The first data item on each such page has no lower bound
--- or lower bound of minus infinity, if you prefer. The comparison
routines must treat it accordingly. The actual key stored in the
item is irrelevant, and need not be stored at all. This arrangement
corresponds to the fact that an L&Y non-leaf page has one more pointer
than key.
So item 2 doesn't have a key in it. The other two items have null
keys, which means they need a null bitmap. I don't however understand
why there seems to be data as well as a null bitmap in there --- is
this perhaps a two-column index?
(The "Has Nulls" bit is somewhat bogus -- it displays 32768 when the
0x8000 bit is on, which is rather surprising. I'd expect it to display
1).
Yeah, I noticed that too. Made a note to myself to fix it in the next
revision of pg_filedump, which I suppose I'd better get on with
producing...
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
I'm chasing an apparent index corruption problem, and I came across
something I can't quite explain in pg_filedump. Say I dump a non-leaf
btree index page:I think this is actually OK. Remember that in a non-rightmost page,
item 1 is the high key not a data entry. On the other hand, in a
non-leaf page, we don't bother to store the key for the first downlink
entry, since the associated key is really "minus infinity". Cf
nbtree/README:On a non-leaf page, the data items are down-links to child pages with bounding keys. The key in each data item is the *lower* bound for keys on that child page, so logically the key is to the left of that downlink. The high key (if present) is the upper bound for the last downlink. The first data item on each such page has no lower bound --- or lower bound of minus infinity, if you prefer. The comparison routines must treat it accordingly. The actual key stored in the item is irrelevant, and need not be stored at all. This arrangement corresponds to the fact that an L&Y non-leaf page has one more pointer than key.
Ahh, I had forgotten that bit completely. Thanks.
So item 2 doesn't have a key in it. The other two items have null
keys, which means they need a null bitmap.
Correct.
I don't however understand
why there seems to be data as well as a null bitmap in there --- is
this perhaps a two-column index?
Eh, yeah, it's a two column index, so it's OK.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
I'm chasing an apparent index corruption problem, and I came across
something I can't quite explain in pg_filedump. Say I dump a non-leaf
btree index page:I think this is actually OK. Remember that in a non-rightmost page,
item 1 is the high key not a data entry.
Oh, BTW, this is not what's corrupted about this index -- I just had
trouble following what pg_filedump was reporting. The corruption is
more subtle: vacuum cannot find the parent page when trying to mark a
page for deletion.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.