3rd time is a charm.....right sibling is not next child crash.

Started by Jeff Amiel over 15 years ago · 13 messages
#1Jeff Amiel
becauseimjeff@yahoo.com

Not looking for help...just putting some data out there.

2 previous crashes caused by corrupt slony indexes

http://archives.postgresql.org/pgsql-general/2010-02/msg00022.php

http://archives.postgresql.org/pgsql-general/2009-12/msg01172.php

New one yesterday.

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334 PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

We are on the eve of switching off our SAN to some direct attached storage, and upgrading postgres and slony in the process, this weekend....so any thoughts that it might be hardware, a driver, or even postgres/slony should be moot soon, given that everything is changing.

That being said, I find it 'odd' that each time this has happened, it has been a slony index that was corrupt. While I can't imagine a bug in slony corrupting postgres indexes...and I can't imagine a bug in postgres corrupting only slony indexes, I don't really know what to think. Just putting this out there in case anyone has similar issues or can use this data in some meaningful way.

Stack trace looks similar to last time.

Program terminated with signal 6, Aborted.
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
#1 0xfecb598f in thr_kill () from /lib/libc.so.1
#2 0xfec61ed3 in raise () from /lib/libc.so.1
#3 0xfec41d0d in abort () from /lib/libc.so.1
#4 0x0821b8a6 in errfinish (dummy=0) at elog.c:471
#5 0x0821c74b in elog_finish (elevel=22, fmt=0x82b7780 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6 0x0809e1a0 in _bt_pagedel (rel=0x867bcd8, buf=139905, stack=0x86b3768, vacuum_full=0 '\0') at nbtpage.c:1141
#7 0x0809f835 in btvacuumscan (info=0x8043f70, stats=0x86b5c30, callback=0, callback_state=0x0, cycleid=29488) at nbtree.c:936
#8 0x0809fc65 in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9 0x0821f424 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a89f in index_bulk_delete (info=0x8043f70, stats=0x0, callback=0x812ffc8 <lazy_tid_reaped>, callback_state=0x86b5818) at indexam.c:573
#11 0x0812ff54 in lazy_vacuum_index (indrel=0x867bcd8, stats=0x86b5b70, vacrelstats=0x86b5818) at vacuumlazy.c:660
#12 0x0813055a in lazy_vacuum_rel (onerel=0x867b7f8, vacstmt=0x86659b8) at vacuumlazy.c:487
#13 0x0812e910 in vacuum_rel (relid=140925368, vacstmt=0x86659b8, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f95a in vacuum (vacstmt=0x86659b8, relids=0x8665bc0) at vacuum.c:400
#15 0x08186e16 in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187278 in autovac_start () at autovacuum.c:178
#17 0x0818bfed in ServerLoop () at postmaster.c:1252
#18 0x0818d16d in PostmasterMain (argc=3, argv=0x833adc8) at postmaster.c:966
#19 0x08152cce in main (argc=3, argv=0x833adc8) at main.c:188
(gdb)

#2Vick Khera
vivek@khera.org
In reply to: Jeff Amiel (#1)
Re: 3rd time is a charm.....right sibling is not next child crash.

On Tue, Jun 8, 2010 at 9:26 AM, Jeff Amiel <becauseimjeff@yahoo.com> wrote:

That being said, I find it 'odd' that each time this has happened, it has been a slony index that was corrupt.  While I can't imagine a bug in slony corrupting postgres indexes...and I can't imagine a bug in postgres corrupting only slony indexes, I don't really know what to think.  Just putting this out there in case anyone has similar issues or can use this data in some meaningful way.

Slony is just a client to postgres, nothing more, nothing less. Any
index corruption is going to be the fault of Postgres server or
hardware.

We don't know what version of postgres you're running, so nobody can
say anything for sure.

I will say that we observed an index corruption (on a nominally zero-row
"status" table whose contents nonetheless churn heavily during the course of a
day) about a week ago. It was not a slony table. A quick reindex on
that table and we were off and running. We noticed it because slony
was trying to insert into the table and getting a read error. This
has happened to us maybe one other time in the 10+ years this
application has been running.

I'm on Pg 8.3.7.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Amiel (#1)
Re: 3rd time is a charm.....right sibling is not next child crash.

Jeff Amiel <becauseimjeff@yahoo.com> writes:

New one yesterday.

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334 PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

In your original report you mentioned that the next autovacuum attempt
on the same table succeeded without incident. Has that been true each
time? I wonder whether this is some transient state, rather than actual
corruption that you need to REINDEX to recover from.

regards, tom lane
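[For reference, the non-transient recovery path being contrasted here is an index rebuild. A minimal sketch; the schema qualification is illustrative, since Slony keeps its tables in the replication cluster's own schema:]

```sql
-- Rebuild only the index named in the PANIC message.  On 8.2 this
-- takes an exclusive lock on the underlying table while it runs.
REINDEX INDEX sl_seqlog_idx;  -- may need schema-qualifying, e.g. "_cluster".sl_seqlog_idx
```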

#4Jeff Amiel
jamiel@istreamimaging.com
In reply to: Tom Lane (#3)
Re: 3rd time is a charm.....right sibling is not next child crash.

On 6/8/10 11:23 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

In your original report you mentioned that the next autovacuum attempt
on the same table succeeded without incident. Has that been true each
time? I wonder whether this is some transient state, rather than actual
corruption that you need to REINDEX to recover from.

I didn't waste time during this event and reindexed as quickly as I could.

I will note, however, that these indexes (specifically the one that
generated the error) were in use (successfully) AFTER the event BEFORE the
reindex by the slony triggers (by virtue of inserts into the log tables and
such) even though I stopped autovacuum and slon daemon. Not reads,
obviously...just inserts/updates.

If nobody else has seen/is seeing this, I will chalk this whole scenario up
to oddball SAN issues (we've logged many a write/read error over the year(s),
causing corruption on these heavily manipulated indexes). I'll be glad to
move to my direct attached storage as quickly as possible.

On a side note, I am 100% sure that autovacuum was disabled when I brought
the database back up after the core dump(s). However, minutes after
restarting, some of my larger tables started getting vacuumed by pgsql user.
Any way that a vacuum would kick off for a particular table (or series of
tables) even when autovacuum was off in the postgresql.conf? My only manual
vacuum process is kicked off late at night, so this was not it.

Also....before I had a chance to disable the slon daemon I also noticed
other slony tables being vacuum analyzed (even though autovacuum was off)
....Does Slony manage its own vacuuming separate from postgres' autovacuum?

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Amiel (#4)
Re: 3rd time is a charm.....right sibling is not next child crash.

Jeff Amiel <jamiel@istreamimaging.com> writes:

On a side note, I am 100% sure that autovacuum was disabled when I brought
the database back up after the core dump(s). However, minutes after
restarting, some of my larger tables started getting vacuumed by pgsql user.
Any way that a vacuum would kick off for a particular table (or series of
tables) even when autovacuum was off in the postgresql.conf?

Anti-transaction-wraparound vacuums, perhaps?

regards, tom lane

#6Jeff Amiel
jamiel@istreamimaging.com
In reply to: Tom Lane (#5)
Re: 3rd time is a charm.....right sibling is not next child crash.

On 6/8/10 12:56 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Jeff Amiel <jamiel@istreamimaging.com> writes:

On a side note, I am 100% sure that autovacuum was disabled when I brought
the database back up after the core dump(s). However, minutes after
restarting, some of my larger tables started getting vacuumed by pgsql user.
Any way that a vacuum would kick off for a particular table (or series of
tables) even when autovacuum was off in the postgresql.conf?

Anti-transaction-wraparound vacuums, perhaps?

I would hope not. :)
This is postgres 8.2.X. Autovacuum has been enabled forever (seemingly with
no errors).
Anything I can look for? (I searched the logs for references to "must be
vacuumed within" but found nothing....)

SELECT datname, age(datfrozenxid) FROM pg_database;

  datname  |    age
-----------+-----------
 postgres  | 178649703
 prod      | 204588751
 template1 | 178653277
 template0 | 178653131
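[A per-table view of the same information shows which relations autovacuum would pick for anti-wraparound work first. A sketch; note that 8.2's default autovacuum_freeze_max_age is 200 million, so prod at ~204 million is a plausible trigger for the unexpected vacuums:]

```sql
-- Tables with the oldest relfrozenxid are vacuumed for wraparound
-- protection once age(relfrozenxid) passes autovacuum_freeze_max_age,
-- even with autovacuum = off.
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```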

#7Jaime Casanova
jaime@2ndquadrant.com
In reply to: Jeff Amiel (#4)
Re: 3rd time is a charm.....right sibling is not next child crash.

On Tue, Jun 8, 2010 at 12:49 PM, Jeff Amiel <jamiel@istreamimaging.com> wrote:

....Does Slony manage its own vacuuming separate from postgres' autovacuum?

Yes it does: http://www.slony.info/documentation/maintenance.html

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

#8Jeff Amiel
jamiel@istreamimaging.com
In reply to: Jaime Casanova (#7)
Re: 3rd time is a charm.....right sibling is not next child crash.

On 6/8/10 1:15 PM, "Jaime Casanova" <jaime@2ndquadrant.com> wrote:

On Tue, Jun 8, 2010 at 12:49 PM, Jeff Amiel <jamiel@istreamimaging.com> wrote:

....Does Slony manage its own vacuuming separate from postgres' autovacuum?

Yes it does: http://www.slony.info/documentation/maintenance.html

" It seems preferable to configure autovacuum to avoid vacuum
Slony-I-managed configuration tables. "

Hmmm....I don't do this.
Surely this is not related to my corrupt indexes....2 attempted vacuums on the
same indexes?
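[For completeness: on 8.2, per-table autovacuum exclusions are made through the pg_autovacuum system catalog rather than reloptions. A sketch only; the cluster schema name is hypothetical and the 8.2 column list is an assumption (-1 means "use the global setting"):]

```sql
-- Disable autovacuum for one Slony-managed table (8.2-era catalog;
-- superseded by per-table reloptions in 8.4).
INSERT INTO pg_autovacuum
  (vacrelid, enabled,
   vac_base_thresh, vac_scale_factor,
   anl_base_thresh, anl_scale_factor,
   vac_cost_delay, vac_cost_limit,
   freeze_min_age, freeze_max_age)
VALUES
  ('_cluster.sl_log_1'::regclass, false,   -- schema/table names illustrative
   -1, -1, -1, -1, -1, -1, -1, -1);
```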

#9Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jeff Amiel (#8)
Re: 3rd time is a charm.....right sibling is not next child crash.

Excerpts from Jeff Amiel's message of Tue Jun 08 14:19:02 -0400 2010:

" It seems preferable to configure autovacuum to avoid vacuum
Slony-I-managed configuration tables. "

Hmmm....I don't do this.
Surely this is not related to my corrupt indexes....2 attempted vacuums on the
same indexes?

Pretty unlikely.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jeff Amiel (#1)
Re: 3rd time is a charm.....right sibling is not next child crash.

Excerpts from Jeff Amiel's message of Tue Jun 08 09:26:25 -0400 2010:

Not looking for help...just putting some data out there.

2 previous crashes caused by corrupt slony indexes

http://archives.postgresql.org/pgsql-general/2010-02/msg00022.php

http://archives.postgresql.org/pgsql-general/2009-12/msg01172.php

New one yesterday.

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334 PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

I've seen this problem (and others) in a high-load environment. Not
Slony related though.

I wrote a small tool to check btree index files for consistency problems
such as this one, by parsing pg_filedump output. I've seen strange
things such as index pointers pointing to pages that shouldn't have been
pointed to; mismatching sibling pointers; and others.

Do you have a copy of the broken index file?
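[Locating the on-disk file in question can be done from the catalogs. A sketch; pg_relation_filepath() only appeared in 9.0, so on 8.2 the path has to be assembled by hand as $PGDATA/base/&lt;database_oid&gt;/&lt;relfilenode&gt; before feeding it to pg_filedump:]

```sql
-- Find the database OID and relfilenode that name the index's file.
SELECT d.oid AS database_oid, c.relfilenode
FROM pg_class c, pg_database d
WHERE c.relname = 'sl_seqlog_idx'
  AND d.datname = current_database();
```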

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11Jeff Amiel
jamiel@istreamimaging.com
In reply to: Alvaro Herrera (#10)
Re: 3rd time is a charm.....right sibling is not next child crash.

On 6/8/10 2:03 PM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:

I've seen this problem (and others) in a high-load environment. Not
Slony related though.

I wrote a small tool to check btree index files for consistency problems
such as this one, by parsing pg_filedump output. I've seen strange
things such as index pointers pointing to pages that shouldn't have been
pointed to; mismatching sibling pointers; and others.

Do you have a copy of the broken index file?

Alas, no. But I have another 5 days to reproduce the problem before
changing hardware/versions (which will probably bring a whole new set of
'opportunities' to deal with). If it happens again, I will snag the index
file for analysis.

Any way you can share your tool source so I can proactively look for issues
in my indexes?

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#10)
Re: 3rd time is a charm.....right sibling is not next child crash.

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Jeff Amiel's message of Tue Jun 08 09:26:25 -0400 2010:

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334 PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

I've seen this problem (and others) in a high-load environment. Not
Slony related though.

I wrote a small tool to check btree index files for consistency problems
such as this one, by parsing pg_filedump output. I've seen strange
things such as index pointers pointing to pages that shouldn't have been
pointed to; mismatching sibling pointers; and others.

I spent some more time today looking for possible causes of this (within
our code that is; the obvious elephant in the room is the possibility of
the disk storage system losing an update). I didn't find anything, but
it did occur to me that there is a simple way to ameliorate this
problem: we could rearrange the code in _bt_pagedel() so it checks for
this case before entering its critical section. Then, corruption of
this kind is at least only an ERROR not a PANIC.

Any objections to doing that?

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.

I wrote:

it did occur to me that there is a simple way to ameliorate this
problem: we could rearrange the code in _bt_pagedel() so it checks for
this case before entering its critical section. Then, corruption of
this kind is at least only an ERROR not a PANIC.

In particular, I propose the attached patch, which gets rid of
unnecessary PANIC cases in _bt_split as well as _bt_pagedel;
all of these get reported from the field every now and then.
In passing this also makes the error messages out of _bt_split
a bit more detailed.

FWIW, I think the original rationale for PANIC here was so we could
capture a core dump for study; but since no one has ever yet cooperated
by providing such a dump, it seems like not panicking is a better plan.

Barring objections, I plan to back-patch this as far as it will
conveniently go (looks like 8.2 or 8.3 or so).

regards, tom lane