Corrupted index

Started by Akash Gargalmost 21 years ago7 messagesgeneral

akash.garg@gmail.com

almost 21 years ago

During a vacuum, I ran into this error:
vacuumdb: vacuuming of database "friend" failed: ERROR: invalid page
header in block 41661 of relation "friend_pkey"

I've read the posts on this newsgroup and it's clear that I have to
REINDEX to fix this. The bigger question is -- why did this happen in
the first place? I'm using open_sync as my WAL_SYNC_METHOD. Is it
better to use fsync?

Running Suse 9.0 on a Quad Opteron, with direct attached storage. The
RAID card is set to write through all of the writes, so I don't think
this is a case where the RAID card is lying about a commit.

-Akash

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Akash Garg (#1)

Re: Corrupted index

Akash Garg <akash.garg@gmail.com> writes:

During a vacuum, I ran into this error:
vacuumdb: vacuuming of database "friend" failed: ERROR: invalid page
header in block 41661 of relation "friend_pkey"

I've read the posts on this newsgroup and it's clear that I have to
REINDEX to fix this. The bigger question is -- why did this happen in
the first place?

Good question. If you haven't REINDEXed yet, it would be worthwhile
to get a hex dump (use "od -x") of that page of the index first.
Speculation with no data to look at is a bit pointless ...

regards, tom lane

Akash Garg

akash.garg@gmail.com

almost 21 years ago

In reply to: Tom Lane (#2)

Re: Corrupted index

Tom,
I'll try running the od command -- I'm just a little confused on where
I run it. I tried running od -x 41661 but that doesn't yield any
results. I'm assuming I have to run this command on the actual index
file itself -- how do I do this?

Thanks,
Akash

Show quoted text

On 6/23/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Akash Garg <akash.garg@gmail.com> writes:

During a vacuum, I ran into this error:
vacuumdb: vacuuming of database "friend" failed: ERROR: invalid page
header in block 41661 of relation "friend_pkey"

I've read the posts on this newsgroup and it's clear that I have to
REINDEX to fix this. The bigger question is -- why did this happen in
the first place?

Good question. If you haven't REINDEXed yet, it would be worthwhile
to get a hex dump (use "od -x") of that page of the index first.
Speculation with no data to look at is a bit pointless ...

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Akash Garg (#3)

Re: Corrupted index

Akash Garg <akash.garg@gmail.com> writes:

I'll try running the od command -- I'm just a little confused on where
I run it. I tried running od -x 41661 but that doesn't yield any
results. I'm assuming I have to run this command on the actual index
file itself -- how do I do this?

See contrib/oid2name, or read the docs at
http://developer.postgresql.org/docs/postgres/storage.html
to learn how to figure out which file is the index. (In pre-8.0 PG,
ignore what those docs say about tablespaces, but the rest of the info
goes back a ways.) The short answer is you need the database's OID
from pg_database, and the index's relfilenode from pg_class.

regards, tom lane

Akash Garg

akash.garg@gmail.com

almost 21 years ago

In reply to: Tom Lane (#4)

Re: Corrupted index

I used oid2name to find the index files:
168807081
168807081.1
168807081.2
168807081.3
168807081.4

Now how do I run the od command to find the block in question?

Thanks,
Akash

Show quoted text

On 6/23/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Akash Garg <akash.garg@gmail.com> writes:

I'll try running the od command -- I'm just a little confused on where
I run it. I tried running od -x 41661 but that doesn't yield any
results. I'm assuming I have to run this command on the actual index
file itself -- how do I do this?

See contrib/oid2name, or read the docs at
http://developer.postgresql.org/docs/postgres/storage.html
to learn how to figure out which file is the index. (In pre-8.0 PG,
ignore what those docs say about tablespaces, but the rest of the info
goes back a ways.) The short answer is you need the database's OID
from pg_database, and the index's relfilenode from pg_class.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Akash Garg (#5)

Re: Corrupted index

Akash Garg <akash.garg@gmail.com> writes:

I used oid2name to find the index files:
168807081
168807081.1
168807081.2
168807081.3
168807081.4

Now how do I run the od command to find the block in question?

Rather than doing the math by hand, let dd do it:

dd bs=8k skip=41661 count=1 </path/to/indexfile | od -x

This assumes your PG was built with the default 8k block size, but
if it wasn't you'd presumably know that ...

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Akash Garg (#1)

Re: Corrupted index

Akash Garg <akash.garg@gmail.com> writes:

Ok, I ran that command on the index files -- they are attached.

I'm a bit confused --- you mean you extracted block 41661 from each of
the index's segments? If so, only the first one is actually relevant
here.

I notice that in file2, file2 and file3, I notice a pattern like this:
0002660 8980 0020 8970 0020 8960 0020 8950 0020
0002700 8940 0020 8930 0020 8920 0020 8910 0020
0002720 0900 0020 0000 0000 0000 0000 0000 0000
0002740 0000 0000 0000 0000 0000 0000 0000 0000
*
0004400 000f 611c 003e 0010 70d9 0b69 0000 0000
0004420 000f 5f97 0003 0010 70d8 0b69 0000 0000
0004440 000f 6118 0077 0010 70d7 0b69 0000 0000
0004460 000f 6118 0075 0010 70d6 0b69 0000 0000

That's what it should look like --- "*" is od's notation for "more of
the same", in this case lines containing all zeroes. Those pages look
exactly like what I'd expect a PG index page to look like.

The file1 extract, however, is pure text and not PG data of any kind.
The first part of it looks like

00000000: 732c 205c 725c 6e77 6861 7420 646f 6573 s, \r\nwhat does
00000010: 2061 2068 756d 616e 6974 6172 6961 6e20 a humanitarian
00000020: 6561 743f 205c 725c 6e5c 725c 6e53 6f6d eat? \r\n\r\nSom
00000030: 6574 696d 6573 2c20 4920 7468 696e 6b20 etimes, I think
00000040: 616c 6c20 7468 6520 666f 6c6b 7320 7768 all the folks wh
00000050: 6f20 6772 6577 2075 7020 7370 6561 6b69 o grew up speaki
00000060: 6e67 2045 6e67 6c69 7368 205c 725c 6e73 ng English \r\ns
00000070: 686f 756c 6420 6265 2063 6f6d 6d69 7474 hould be committ
00000080: 6564 2074 6f20 616e 2061 7379 6c75 6d20 ed to an asylum
00000090: 666f 7220 7468 6520 7665 7262 616c 6c79 for the verbally
000000a0: 2069 6e73 616e 652e 205c 725c 6e5c 725c insane. \r\n\r\
000000b0: 6e49 6e20 7768 6174 206f 7468 6572 206c nIn what other l

and it goes downhill from there (something out of a spam folder maybe?)

What it looks like to me is that a page of an entirely unrelated file
got dropped into the Postgres index. This suggests either a disk drive
error (writing someplace else than it was commanded to) or a kernel
bug (writing the wrong buffer). I'd suggest some disk-drive testing
as well as checking into bug fixes for your kernel. I'm pretty well
convinced that this wasn't Postgres' fault.

regards, tom lane

Import Notes

Reply to msg id not found: 89058be205062311266cd1f7c8@mail.gmail.com