right sibling is not next child
Apologies if this is a duplicate, but my original post stalled and I
noticed I had omitted the postgres version, which you will want.
I'm reporting this as a PostgreSQL bug because it involves an index
corruption. I can't see any other way our application should be able to
corrupt an index. I will attach the tail of the log when the corruption
was detected (and the postmaster shut itself down), as well as the
subsequent attempt to start. Fortunately we run our web site off of a
farm of four database servers, so we are taking one of the others out of
the mix, stopping postmaster, and copying its data directory over to
this machine for recovery, so we don't need advice on that aspect of
things; but, we'd like to do what we can to help track down the cause to
prevent a recurrence. We have renamed the data directory to make room
for recovery at the normal location, but otherwise the failing data
directory structure is unmodified.
For context, this is running on Windows 2003 Server. Eight Xeon box,
no HT, 6 GB RAM, 13 drive 15,000 RPM RAID5 array through battery backed
controller for everything. This database is about 180 GB with about 300
tables. We are running 8.1.3 modified with a patch we have submitted
(pending review last I saw) to implement the standard_conforming_strings
TODO. We have autovacuum running every ten seconds because of a few
very small tables with very high update rates, and we have a scheduled
VACUUM ANALYZE VERBOSE every night. It appears that last night's vacuum
found the problem, which the previous night's vacuum didn't. We had
some event which started at 14:25 yesterday which persisted until we
restarted the middle tier at 15:04. The symptom was that a fraction of
the queries which normally run in a few ms were timing out on a 20
second limit. pg_locks showed no blocking. We've been getting episodes
with these symptoms occassionally, but they have only lasted a minute or
two; this duration was unusual. We haven't identified a cause. One odd
thing is that with the number of queries per second that we run, the
number of timeouts during an episode is too small to support the notion
that _all_ similar queries are failing.
How best to proceed?
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
[2006-04-06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)> PANIC: right sibling is not next child in "Panel_pkey"
This should be repeatable by re-attempting a VACUUM, right? Please find
out which page exactly it's unhappy about (either gdb the crash or add a
printout of the "parent" variable to the elog call in nbtpage.c), then
pg_filedump the index and look to see what the index contains.
regards, tom lane
Right now the postmaster refuses to start. What is the best way to get
past that to try what you suggest?
[2006-04-06 07:22:50.347 ] 3984 LOG: database system was interrupted
while in recovery at 2006-04-06 02:19:59 Central Daylight Time
[2006-04-06 07:22:50.347 ] 3984 HINT: This probably means that some
data is corrupted and you will have to use the last backup for
recovery.
[2006-04-06 07:22:50.347 ] 3984 LOG: checkpoint record is at
F6/50052F10
[2006-04-06 07:22:50.347 ] 3984 LOG: redo record is at F6/50052F10;
undo record is at 0/0; shutdown FALSE
[2006-04-06 07:22:50.347 ] 3984 LOG: next transaction ID: 313381941;
next OID: 2321597
[2006-04-06 07:22:50.347 ] 3984 LOG: next MultiXactId: 1; next
MultiXactOffset: 0
[2006-04-06 07:22:50.347 ] 3984 LOG: database system was not properly
shut down; automatic recovery in progress
[2006-04-06 07:22:50.347 ] 3984 LOG: redo starts at F6/50052F58
[2006-04-06 07:22:50.347 ] 4076 <bigbird bigbird 127.0.0.1(2236)>
FATAL: the database system is starting up
[2006-04-06 07:22:50.347 ] 4064 <bigbird bigbird 127.0.0.1(2235)>
FATAL: the database system is starting up
[2006-04-06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block
-Kevin
On Thu, Apr 6, 2006 at 12:40 pm, in message
<25774.1144345220@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
[2006- 04- 06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)>
PANIC: right
sibling is not next child in "Panel_pkey"
This should be repeatable by re- attempting a VACUUM, right? Please
find
out which page exactly it's unhappy about (either gdb the crash or
add a
printout of the "parent" variable to the elog call in nbtpage.c),
then
Show quoted text
pg_filedump the index and look to see what the index contains.
regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Right now the postmaster refuses to start. What is the best way to get
past that to try what you suggest?
[2006-04-06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block
Hm, did this start happening immediately after the other problem?
That would suggest that you've got worse problems than just a corrupt
index. You weren't by any chance running with full_page_writes = off
were you?
You could get past the startup failure with pg_resetxlog, but it's not
clear whether you'd have a consistent database afterward. What I'd
suggest first is saving a copy of the entire $PGDATA tree for forensic
purposes (not to mention being able to go back to that state if you need
to).
Is there any chance of letting someone else have a look at the database
contents? Otherwise you're going to have to do your own sleuthing to
figure out what went wrong ...
regards, tom lane
On Thu, Apr 6, 2006 at 12:57 pm, in message
<25913.1144346246@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Right now the postmaster refuses to start. What is the best way to
get
past that to try what you suggest?
[2006- 04- 06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block
Hm, did this start happening immediately after the other problem?
This started happening on the first attempt to start the postmaster
after the other error, which left the postmaster down, apparently after
a failed restart attempt.
That would suggest that you've got worse problems than just a
corrupt
index. You weren't by any chance running with full_page_writes =
off
were you?
Yes we were. Apparently I have misunderstood the implications of this.
Somehow I had convinced myself that this setting was relatively safe in
our environment, due to our battery-backed controllers. I'd convinced
myself, after reading carefully through the documentation of this
setting, that I would be OK as long as that functioned correctly, and
have problems regardless of this setting if it didn't. If you show me
where I went wrong, maybe I can suggest a patch to the docs to prevent
others from going down the wrong path in this regard. (Of course, maybe
it's all there and I just had a bad day when I thought this through.)
You could get past the startup failure with pg_resetxlog, but it's
not
clear whether you'd have a consistent database afterward. What I'd
suggest first is saving a copy of the entire $PGDATA tree for
forensic
purposes
We already have this forensic copy and a replacement production copy on
this box. I think we'll need to copy to another box to get a second
forensic copy, to avoid risking an out-of-space condition. That can be
done, but it'll take a few hours.
(not to mention being able to go back to that state if you need
to).
That's not an issue for production purposes.
Is there any chance of letting someone else have a look at the
database
contents?
There is a lot of data in the database which is confidential by law.
I'd have to jump through a lot of hoops to get anyone to even consider
letting me ship it off site. If you're asking whether you could access
in to our site, that might be arranged.
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
You weren't by any chance running with full_page_writes = off
were you?
Yes we were. Apparently I have misunderstood the implications of this.
So had we all :-(. It just plain doesn't work in 8.1.*, and will be
disabled in 8.1.4 --- see discussion last week. It might or might not
get resurrected in 8.2, depending on how messy it seems to be to fix.
Anyway, that explains your "heap_clean_redo: no block" failure. I think
you're stuck risking a pg_resetxlog to try to get back into the
database. If that results in a hopelessly corrupt database, we can try
modifying the WAL replay code to not consider this a fatal error, and
see if that produces anything we can use for debugging. I'm glad this
isn't your only copy of the database ...
There is a lot of data in the database which is confidential by law.
I'd have to jump through a lot of hoops to get anyone to even consider
letting me ship it off site. If you're asking whether you could access
in to our site, that might be arranged.
It sounds like the DB is too big to consider shipping anywhere anyway.
As long as you're comfortable doing stuff like pg_filedump and modifying
the code to get more debug info, we can proceed without getting into the
question of remote access.
regards, tom lane
On Thu, Apr 6, 2006 at 1:26 pm, in message
<26154.1144348014@sss.pgh.pa.us>,
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
You weren't by any chance running with full_page_writes = off
were you?Yes we were. Apparently I have misunderstood the implications of
this.
So had we all :- (. It just plain doesn't work in 8.1.*, and will
be
disabled in 8.1.4 --- see discussion last week.
Dang! I've not been able to keep up with the lists, and I missed that
thread. I'll change that setting as soon as possible.
Anyway, that explains your "heap_clean_redo: no block" failure. I
think
you're stuck risking a pg_resetxlog to try to get back into the
database. If that results in a hopelessly corrupt database, we can
try
modifying the WAL replay code to not consider this a fatal error,
and
see if that produces anything we can use for debugging.
Will do. Before I do that, though, is it worth making a copy? (The
down side is primarily the delay of copying it.)
I'm glad this isn't your only copy of the database ...
We go beyond being a belt-and-suspenders shop. Think staples and
glue-gun, too. ;-)
This is one of four copies of what is redundant data to start with, and
we have backups. So recovery is no sweat, but we got emails about
timeouts from the public during this incident, so the underlying bug is
important to us.
As long as you're comfortable doing stuff like pg_filedump and
modifying
the code to get more debug info, we can proceed without getting into
the
question of remote access.
OK. If you later think we do need to go that direction, send me an
email off-list.
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyway, that explains your "heap_clean_redo: no block" failure. I think
you're stuck risking a pg_resetxlog to try to get back into the
database.
Will do. Before I do that, though, is it worth making a copy? (The
down side is primarily the delay of copying it.)
Yeah, I think you'd better --- pg_resetxlog is destructive, and if we
find the database unusable for debugging afterward, we'll need that
copy.
regards, tom lane
Sorry about the delay in responding. We had a bit of difficulty with
the test machine. Kevin is also on vacation this week.
The problem is repeatable with a VACUUM. I've found the offending
block. A (partial) pg_filedump of that block is pasted in below. I'm a
little lost as to what the next step is though. Any pointers would be
appreciated.
Pete
"Kevin Grittner" <Kevin ( dot ) Grittner ( at ) wicourts ( dot ) gov>
writes:
[2006-04-06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)>
PANIC: right sibling is not next child in "Panel_pkey"
This should be repeatable by re-attempting a VACUUM, right? Please
find
out which page exactly it's unhappy about (either gdb the crash or add
a
printout of the "parent" variable to the elog call in nbtpage.c), then
pg_filedump the index and look to see what the index contains.
regards, tom lane
PANIC: right sibling is not next child in "Panel_pkey", parent is 271
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: 180571
* Options used: -R 271 -i
*
* Dump created on: Tue Apr 11 13:38:44 2006
*******************************************************************
Block 271 ********************************************************
<Header> -----
Block Offset: 0x0021e000 Offsets: Lower 468 (0x01d4)
Block: Size 8192 Version 3 Upper 1952 (0x07a0)
LSN: logid 246 recoff 0x265d47c0 Special 8176 (0x1ff0)
Items: 112 Free Space: 1484
Length (including item array): 472
<Data> ------
Item 1 -- Length: 56 Offset: 8120 (0x1fb8) Flags: USED
Block Id: 132 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 2 -- Length: 8 Offset: 8112 (0x1fb0) Flags: USED
Block Id: 201 linp Index: 1 Size: 8
Has Nulls: 0 Has Varwidths: 0
Item 3 -- Length: 56 Offset: 8056 (0x1f78) Flags: USED
Block Id: 257 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
... snip ...
Item 109 -- Length: 56 Offset: 4920 (0x1338) Flags: USED
Block Id: 121 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 110 -- Length: 56 Offset: 4864 (0x1300) Flags: USED
Block Id: 133 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 111 -- Length: 56 Offset: 4808 (0x12c8) Flags: USED
Block Id: 134 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 112 -- Length: 56 Offset: 4752 (0x1290) Flags: USED
Block Id: 137 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
<Special Section> -----
BTree Index Section:
Flags: 0x0000 ()
Blocks: Previous (167) Next (455) Level (1)
*** End of Requested Range Encountered. Last Block Read: 271 ***
Import Notes
Resolved by subject fallback
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
PANIC: right sibling is not next child in "Panel_pkey", parent is 271
Hmm ... that's not actually enough info to tell us where to look,
is it :-(. Please add the following variables to the elog message, or
gdb for them if you can:
target
rightsib
nextoffset
regards, tom lane
Try as I might, I wasn't able to get a JIT debugger give me a memory
dump. It seems like postgres.exe is not really crashing in the
"unhandled exception" sense (see gdb log below)? Am I missing a
configure option?
(As an aside, what's the best way to get a core dump on Windows? Can
gdb read memory dumps produced by NTSD?)
I was able to attach to the running backend using gdb. That worked
(kind of, see log below).
I ended up modifying the elog again with the following results:
PANIC: right sibling is not next child in "Panel_pkey", parent is 271,
target is 635, rightsib is 629, nextoffset is 87
C:\WINDOWS\system32>C:\mingw\bin\gdb C:\pgsql\bin\postgres.exe
... snip ...
(gdb) attach 2084
Attaching to program `C:\pgsql\bin\postgres.exe', process 2084
[Switching to thread 2084.0x930]
(gdb) break nbtpage.c:983
Breakpoint 1 at 0x41f042: file nbtpage.c, line 983.
(gdb) continue
Continuing.
[Switching to thread 2084.0x81c]
Breakpoint 1, _bt_pagedel (rel=0x196bbb0, buf=1089, vacuum_full=0
'\0')
at nbtpage.c:983
983 nbtpage.c: No such file or directory.
in nbtpage.c
(gdb) print parent
$1 = 271
(gdb) print target
$2 = 635
(gdb) print rightsib
No symbol "rightsib" in current context.
(gdb) print nextoffset
$3 = 87
(gdb) print leftsib
$4 = 636
(gdb) print rightsib
No symbol "rightsib" in current context.
(gdb) continue
Continuing.
Program exited with code 03.
(gdb)
Pete
Tom Lane <tgl@sss.pgh.pa.us> 04/11/06 9:19 pm >>>
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
PANIC: right sibling is not next child in "Panel_pkey", parent is
271
Hmm ... that's not actually enough info to tell us where to look,
is it :-(. Please add the following variables to the elog message, or
gdb for them if you can:
target
rightsib
nextoffset
regards, tom lane
---------------------------(end of
broadcast)---------------------------
TIP 6: explain analyze is your friend
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
I ended up modifying the elog again with the following results:
PANIC: right sibling is not next child in "Panel_pkey", parent is 271,
target is 635, rightsib is 629, nextoffset is 87
OK, so the part of the pg_filedump info we need to see is items 86/87
in block 271, plus a few around them for good measure.
Are the values of the keys in this index sensitive data, or are they
just serial numbers of some kind? If we could see pg_dump -i -f rather
than just -i, it might help. I'd like to see the dumps of pages 635,
636, 629 as well (mostly their "special space" contents).
regards, tom lane
The index data isn't sensitive, but I should ask for permission
nonetheless. I'll send over the '-f' output tomorrow morning.
Pete
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: 180571
* Options used: -i -R 271
*
* Dump created on: Tue Apr 11 18:22:24 2006
*******************************************************************
Block 271 ********************************************************
<Header> -----
Block Offset: 0x0021e000 Offsets: Lower 468 (0x01d4)
Block: Size 8192 Version 3 Upper 1952 (0x07a0)
LSN: logid 246 recoff 0x265d47c0 Special 8176 (0x1ff0)
Items: 112 Free Space: 1484
Length (including item array): 472
<Data> ------
... snip ...
Item 80 -- Length: 56 Offset: 3352 (0x0d18) Flags: USED
Block Id: 581 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 81 -- Length: 56 Offset: 3128 (0x0c38) Flags: USED
Block Id: 580 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 82 -- Length: 56 Offset: 2848 (0x0b20) Flags: USED
Block Id: 606 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 83 -- Length: 56 Offset: 2736 (0x0ab0) Flags: USED
Block Id: 608 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 84 -- Length: 56 Offset: 2792 (0x0ae8) Flags: USED
Block Id: 601 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 85 -- Length: 56 Offset: 2120 (0x0848) Flags: USED
Block Id: 640 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 86 -- Length: 56 Offset: 2176 (0x0880) Flags: USED
Block Id: 635 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 87 -- Length: 56 Offset: 2232 (0x08b8) Flags: USED
Block Id: 636 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 88 -- Length: 56 Offset: 2288 (0x08f0) Flags: USED
Block Id: 635 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 89 -- Length: 56 Offset: 2400 (0x0960) Flags: USED
Block Id: 629 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 90 -- Length: 56 Offset: 5704 (0x1648) Flags: USED
Block Id: 166 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 91 -- Length: 56 Offset: 5648 (0x1610) Flags: USED
Block Id: 339 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 92 -- Length: 56 Offset: 5592 (0x15d8) Flags: USED
Block Id: 372 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 93 -- Length: 56 Offset: 4416 (0x1140) Flags: USED
Block Id: 480 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 94 -- Length: 56 Offset: 5536 (0x15a0) Flags: USED
Block Id: 208 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
... snip ...
<Special Section> -----
BTree Index Section:
Flags: 0x0000 ()
Blocks: Previous (167) Next (455) Level (1)
*** End of Requested Range Encountered. Last Block Read: 271 ***
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: 180571
* Options used: -i -R 635
*
* Dump created on: Tue Apr 11 18:22:43 2006
*******************************************************************
Block 635 ********************************************************
<Header> -----
Block Offset: 0x004f6000 Offsets: Lower 24 (0x0018)
Block: Size 8192 Version 3 Upper 8120 (0x1fb8)
LSN: logid 239 recoff 0xcabae788 Special 8176 (0x1ff0)
Items: 1 Free Space: 8096
Length (including item array): 28
<Data> ------
Item 1 -- Length: 56 Offset: 8120 (0x1fb8) Flags: USED
Block Id: 452 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
<Special Section> -----
BTree Index Section:
Flags: 0x0001 (LEAF)
Blocks: Previous (636) Next (629) Level (0)
*** End of Requested Range Encountered. Last Block Read: 635 ***
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: 180571
* Options used: -i -R 636
*
* Dump created on: Tue Apr 11 18:22:50 2006
*******************************************************************
Block 636 ********************************************************
<Header> -----
Block Offset: 0x004f8000 Offsets: Lower 24 (0x0018)
Block: Size 8192 Version 3 Upper 8120 (0x1fb8)
LSN: logid 245 recoff 0x9bfa3b60 Special 8176 (0x1ff0)
Items: 1 Free Space: 8096
Length (including item array): 28
<Data> ------
Item 1 -- Length: 56 Offset: 8120 (0x1fb8) Flags: USED
Block Id: 454 linp Index: 27 Size: 56
Has Nulls: 0 Has Varwidths: 16384
<Special Section> -----
BTree Index Section:
Flags: 0x0001 (LEAF)
Blocks: Previous (640) Next (635) Level (0)
*** End of Requested Range Encountered. Last Block Read: 636 ***
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: 180571
* Options used: -i -R 629
*
* Dump created on: Tue Apr 11 18:23:01 2006
*******************************************************************
Block 629 ********************************************************
<Header> -----
Block Offset: 0x004ea000 Offsets: Lower 244 (0x00f4)
Block: Size 8192 Version 3 Upper 5040 (0x13b0)
LSN: logid 239 recoff 0xcabae788 Special 8176 (0x1ff0)
Items: 56 Free Space: 4796
Length (including item array): 248
<Data> ------
... snip ...
<Special Section> -----
BTree Index Section:
Flags: 0x0001 (LEAF)
Blocks: Previous (635) Next (166) Level (0)
*** End of Requested Range Encountered. Last Block Read: 629 ***
Tom Lane <tgl@sss.pgh.pa.us> 04/12/06 1:06 am >>>
OK, so the part of the pg_filedump info we need to see is items 86/87
in block 271, plus a few around them for good measure.
Are the values of the keys in this index sensitive data, or are they
just serial numbers of some kind? If we could see pg_dump -i -f
rather
than just -i, it might help. I'd like to see the dumps of pages 635,
636, 629 as well (mostly their "special space" contents).
regards, tom lane
Also, when I tried to run a database-wide VACUUM ANALYZE VERBOSE it
actually doesn't even get to Panel and errors out with:
INFO: analyzing "public.MaintCode"
INFO: "MaintCode": scanned 1 of 1 pages, containing 19 live rows and 0
dead rows; 19 rows in sample, 19 estimated total rows
ERROR: duplicate key violates unique constraint
"pg_statistic_relid_att_index"
MaintCode is a tiny, static table. Does that indicate further
corruption or is there a more benign explanation? Could these errors
not be Postgres' fault? (e.g. hardware-related)
Pete
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
Also, when I tried to run a database-wide VACUUM ANALYZE VERBOSE it
actually doesn't even get to Panel and errors out with:
ERROR: duplicate key violates unique constraint
"pg_statistic_relid_att_index"
Hm, my eyebrows just disappeared over the back of my head somewhere.
Is that repeatable? Does a REINDEX on pg_statistic make it go away?
regards, tom lane
It is repeatable. A reindex doesn't work.
Pete
bigbird=# vacuum analyze "MaintCode";
ERROR: duplicate key violates unique constraint
"pg_statistic_relid_att_index"
bigbird=# vacuum analyze verbose "MaintCode";
INFO: vacuuming "public.MaintCode"
INFO: index "MaintCode_pkey" now contains 19 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "MaintCode": found 0 removable, 19 nonremovable row versions in
1 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_90472"
INFO: index "pg_toast_90472_index" now contains 0 row versions in 1
pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_90472": found 0 removable, 0 nonremovable row versions
in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.MaintCode"
INFO: "MaintCode": scanned 1 of 1 pages, containing 19 live rows and 0
dead rows; 19 rows in sample, 19 estimated total rows
ERROR: duplicate key violates unique constraint
"pg_statistic_relid_att_index"
bigbird=# reindex table pg_statistic;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.
bigbird=# select * from "MaintCode";
maintCode | colorAttrb | descr | isActive |
maintPriority | stopsTheClock | videoAttrb | wcisSpecStatus
-----------+------------+---------------------------+----------+---------------+---------------+------------+----------------
AA | R | Annual accounts | t |
75 | t | N |
AP | R | Appeal | t |
30 | f | N |
AW | R | Awaiting Papers | t |
80 | f | N |
BA | R | Bankruptcy | t |
20 | t | N | S05
CI | R | Case Inactive | t |
60 | f | N | S03
CS | R | Consent decree | t |
90 | t | N |
DP | R | Deferred judgt./prosecute | t |
40 | t | N | S07
EF | R | Electronic filing | t |
33 | f | N |
ES | R | Extension | t |
55 | t | N |
EX | R | Expungement | t |
5 | f | N |
FX | R | Future expungement | t |
93 | f | N |
IN | R | Interpreter | t |
53 | f | N |
JR | R | Judicial review | t |
75 | t | N |
RO | R | Reopen | t |
35 | f | N |
SL | R | Sealed Record | t |
85 | f | N |
SM | R | Sentence modification | t |
43 | f | N |
SU | R | Case suspended | t |
45 | t | N |
UA | R | Under Advisement | t |
50 | f | N | S02
WA | R | Warrant issued | t |
10 | t | F | S04
(19 rows)
bigbird=#
Tom Lane <tgl@sss.pgh.pa.us> 04/12/06 5:00 am >>>
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
Also, when I tried to run a database-wide VACUUM ANALYZE VERBOSE it
actually doesn't even get to Panel and errors out with:
ERROR: duplicate key violates unique constraint
"pg_statistic_relid_att_index"
Hm, my eyebrows just disappeared over the back of my head somewhere.
Is that repeatable? Does a REINDEX on pg_statistic make it go away?
regards, tom lane
---------------------------(end of
broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
Import Notes
Reply to msg id not found: 443BF8
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
bigbird=# vacuum analyze "MaintCode";
ERROR: duplicate key violates unique constraint
"pg_statistic_relid_att_index"
Hm, can you see any rows in pg_statistic with duplicate values of
(starelid, staattnum)? If so it'd be useful to look at their
ctid/xmin/xmax/cmin/cmax values.
regards, tom lane
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
Item 85 -- Length: 56 Offset: 2120 (0x0848) Flags: USED
Block Id: 640 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 86 -- Length: 56 Offset: 2176 (0x0880) Flags: USED
Block Id: 635 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 87 -- Length: 56 Offset: 2232 (0x08b8) Flags: USED
Block Id: 636 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 88 -- Length: 56 Offset: 2288 (0x08f0) Flags: USED
Block Id: 635 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 89 -- Length: 56 Offset: 2400 (0x0960) Flags: USED
Block Id: 629 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Item 90 -- Length: 56 Offset: 5704 (0x1648) Flags: USED
Block Id: 166 linp Index: 1 Size: 56
Has Nulls: 0 Has Varwidths: 16384
Well, that's pretty dang interesting. How did block 635 get to be
listed twice? The sibling links say that the correct sequence is
640, 636, 635, 629, 166 ... so something screwed up the parent level's
keys.
What would be most useful at this point is to look at the keys in
these entries, and compare them to the "high keys" (item 1) of the
individual leaf pages. I'm wondering what key is in that extra
entry for 635 ... Did you get permission to show us the keys?
regards, tom lane
I can't find any duplicates?!?
The query
select starelid, staattnum, ctid, xmin, xmax, cmin, cmax
from pg_statistic p1
where (select count(*) from pg_statistic p2 where
p1.starelid = p2.starelid and p1.staattnum = p2.staattnum) > 1
doesn't turn up anything. Nor does dumping
select starelid, staattnum from pg_statistic
to a file and using sort | uniq -c
Pete
Tom Lane <tgl@sss.pgh.pa.us> 04/12/06 4:37 pm >>>
Hm, can you see any rows in pg_statistic with duplicate values of
(starelid, staattnum)? If so it'd be useful to look at their
ctid/xmin/xmax/cmin/cmax values.
regards, tom lane
Import Notes
Reply to msg id not found: 443BF8
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
I can't find any duplicates?!?
Weirder and weirder. Maybe the table is OK but the index is corrupt?
Could it be another symptom of the same problem we're seeing in the
Panel_pkey index? I'm currently theorizing that that index might've
been corrupted during WAL replay (specifically, if the replayer somehow
failed to recognize a split completion, btree_xlog_cleanup would've
inserted an extra entry) and one could certainly imagine multiple
indexes getting corrupted in the same fashion at the same time. You
did say that this problem showed up shortly after a database crash,
right?
Could you send me a copy of pg_statistic_relid_att_index, off-list
(the actual disk file, not a pg_filedump)? There's nothing but table
OIDs and attribute numbers in it, so I can't see any reason anyone
would consider it sensitive data. I've got some really crude code
laying about for scanning an index and looking for inconsistencies ...
it's too ugly to give out, but I'd like to see if it can find anything
wrong with that index. (I'll probably ask for a copy of Panel_pkey
for the same purpose, if you get permission to show us its keys.)
regards, tom lane