panic: index siblings mismatch

Started by Gerd Koenigabout 17 years ago3 messagesgeneral
Jump to latest
#1Gerd Koenig
koenig@transporeon.com

Hello,

today I recognized an error while playing with pgpool-II 2.1 and
postgresql 8.3.5 (on 64bit linux).
I really don't know if the error is caused by pgpool or if it's
postgresql-related...?!?

There are up to 200 active connections in parallel and thereby sometimes
high load (with high write:read ratio, peak level ~60% write).
Can this be a problem in such a usage scenario, are there some
limitations in pgpool-II <-> number of connections / load ???

any hints are much appreciated, since we don't know what happened...

thanks in advance...::GERD::...

LOG DETAILS:

a) postgresql log of second db node where the error occured
b) pgpool log

a)
==================================================================
2009-03-10 11:16:33 CET [11008]: [1-1] PANIC: right sibling's left-link
doesn't match: block 695 links to 785 instead of expected 694 in index
"PartyAddress_index"
2009-03-10 11:16:33 CET [11008]: [2-1] STATEMENT: INSERT INTO
"PartyAddress" ("Name1", "Name2", "Street1", "Street2", "
City", "Postcode", "Region", "Country", "Timezone", "Qualifier",
"Compressed", "PositionId") VALUES (E'Testing', E'', E'
Public street', E'', E'somewhere', E'12345', E'-', E'DE', E'UTC+1',
E'address.loading', FALSE, NULL); select lastval()
2009-03-10 11:16:33 CET [29026]: [3-1] LOG: server process (PID 11008)
was terminated by signal 6: Aborted
2009-03-10 11:16:33 CET [29026]: [4-1] LOG: terminating any other
active server processes
2009-03-10 11:16:33 CET [10978]: [5-1] WARNING: terminating connection
because of crash of another server process
2009-03-10 11:16:33 CET [10978]: [6-1] DETAIL: The postmaster has
commanded this server process to roll back the curren
t transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
2009-03-10 11:16:33 CET [10978]: [7-1] HINT: In a moment you should be
able to reconnect to the database and repeat you
r command.
2009-03-10 11:16:33 CET [10947]: [1-1] WARNING: terminating connection
because of crash of another server process
.....
.....
2009-03-10 11:16:33 CET [29026]: [6-1] LOG: all server processes
terminated; reinitializing
2009-03-10 11:16:33 CET [11015]: [1-1] LOG: database system was
interrupted; last known up at 2009-03-10 11:09:10 CET
2009-03-10 11:16:33 CET [11015]: [2-1] LOG: starting archive recovery
2009-03-10 11:16:33 CET [11015]: [3-1] LOG: restore_command = 'cp
/exchange/wal_from_iweblin/%f %p'
cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
file or directory
cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
No such file or directory
2009-03-10 11:16:33 CET [11015]: [4-1] LOG: automatic recovery in progress
2009-03-10 11:16:33 CET [11015]: [5-1] LOG: redo starts at 1/AABEBFA8
2009-03-10 11:16:33 CET [11015]: [6-1] LOG: unexpected pageaddr
0/36BEE000 in log file 1, segment 170, offset 12509184
2009-03-10 11:16:33 CET [11015]: [7-1] LOG: redo done at 1/AABED938
2009-03-10 11:16:33 CET [11015]: [8-1] LOG: last completed transaction
was at log time 2009-03-10 11:15:17.533913+01
cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
No such file or directory
cp: cannot stat `/exchange/wal_from_iweblin/00000004.history': No such
file or directory
2009-03-10 11:16:33 CET [11015]: [9-1] LOG: selected new timeline ID: 4
cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
file or directory
2009-03-10 11:16:33 CET [11015]: [10-1] LOG: archive recovery complete
2009-03-10 11:16:33 CET [11015]: [11-1] LOG: checkpoint starting:
shutdown immediate
2009-03-10 11:16:34 CET [11015]: [12-1] LOG: checkpoint complete: wrote
4 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=0.005 s, sync=0.086 s, total=0.242 s
2009-03-10 11:16:34 CET [11023]: [1-1] LOG: autovacuum launcher started
2009-03-10 11:16:34 CET [29026]: [7-1] LOG: database system is ready to
accept connections
=======================================

b)
==========================================
2009-03-10 11:16:23 DEBUG: pid 19059: read kind from frontend X(58)
2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend Q(51)
2009-03-10 11:16:31 LOG: pid 27107: statement: SELECT d.oid, datname
as name, pg_get_userbyid(datdba) as owner, ds.des
cription, pg_encoding_to_char(encoding) as dbencoding, datistemplate,
datlastsysoid, datconfig, s.spcname as tblspace, d
atconnlimit AS conlimit FROM pg_database d LEFT OUTER JOIN
pg_description ds ON d.oid = ds.objoid LEFT OUTER JOIN pg_tab
lespace s ON d.dattablespace = s.oid WHERE datname = 'test'

2009-03-10 11:16:31 DEBUG: pid 27107: waiting for backend 0 completing
the query
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend T NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: T
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend D NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: D
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend C NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: C
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend Z NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: Z
2009-03-10 11:16:31 DEBUG: pid 27107: pool_read_message_length: slot: 0
length: 5
2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: message length: 5
2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: transaction state: I
2009-03-10 11:16:31 DEBUG: pid 27107: end_load_balance: end load balance
mode
2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend X(58)
2009-03-10 11:16:36 DEBUG: pid 27107: I am 27107 accept fd 5
2009-03-10 11:16:36 LOG: pid 27107: connection received:
host=172.28.1.126 port=1946
2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 1234 Minor: 5679
database: user:
2009-03-10 11:16:36 DEBUG: pid 27107: SSLRequest: sent N; retry startup
2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 3 Minor: 0
database: test user: test
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 0 backend
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 1 backend
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: skipping slot 1
because backend_status = 3

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Gerd Koenig (#1)
Re: panic: index siblings mismatch

On Tue, Mar 10, 2009 at 3:35 PM, Gerd König <koenig@transporeon.com> wrote:

Hello,

today I recognized an error while playing with pgpool-II 2.1 and
postgresql 8.3.5 (on 64bit linux).
I really don't know if the error is caused by pgpool or if it's
postgresql-related...?!?
any hints are much appreciated, since we don't know what happened...

thanks in advance...::GERD::...

a) postgresql log of second db node where the error occured
b) pgpool log

a)
==================================================================
2009-03-10 11:16:33 CET [11008]: [1-1] PANIC:  right sibling's left-link
doesn't match: block 695 links to 785 instead of expected 694 in index
"PartyAddress_index"

This is a sign of index corruption. Have you had any unexpected power
outages lately? you could try reindexing the table that's affected
and see if the problem goes away, but I'd be far more worried about
what caused the original corruption to the index.

#3Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Gerd Koenig (#1)
Re: [Pgpool-general] panic: index siblings mismatch

Have you turned off autovacuum on the first db node before starting
online recovery?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Show quoted text

Hello,

today I recognized an error while playing with pgpool-II 2.1 and
postgresql 8.3.5 (on 64bit linux).
I really don't know if the error is caused by pgpool or if it's
postgresql-related...?!?

There are up to 200 active connections in parallel and thereby sometimes
high load (with high write:read ratio, peak level ~60% write).
Can this be a problem in such a usage scenario, are there some
limitations in pgpool-II <-> number of connections / load ???

any hints are much appreciated, since we don't know what happened...

thanks in advance...::GERD::...

LOG DETAILS:

a) postgresql log of second db node where the error occured
b) pgpool log

a)
==================================================================
2009-03-10 11:16:33 CET [11008]: [1-1] PANIC: right sibling's left-link
doesn't match: block 695 links to 785 instead of expected 694 in index
"PartyAddress_index"
2009-03-10 11:16:33 CET [11008]: [2-1] STATEMENT: INSERT INTO
"PartyAddress" ("Name1", "Name2", "Street1", "Street2", "
City", "Postcode", "Region", "Country", "Timezone", "Qualifier",
"Compressed", "PositionId") VALUES (E'Testing', E'', E'
Public street', E'', E'somewhere', E'12345', E'-', E'DE', E'UTC+1',
E'address.loading', FALSE, NULL); select lastval()
2009-03-10 11:16:33 CET [29026]: [3-1] LOG: server process (PID 11008)
was terminated by signal 6: Aborted
2009-03-10 11:16:33 CET [29026]: [4-1] LOG: terminating any other
active server processes
2009-03-10 11:16:33 CET [10978]: [5-1] WARNING: terminating connection
because of crash of another server process
2009-03-10 11:16:33 CET [10978]: [6-1] DETAIL: The postmaster has
commanded this server process to roll back the curren
t transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
2009-03-10 11:16:33 CET [10978]: [7-1] HINT: In a moment you should be
able to reconnect to the database and repeat you
r command.
2009-03-10 11:16:33 CET [10947]: [1-1] WARNING: terminating connection
because of crash of another server process
.....
.....
2009-03-10 11:16:33 CET [29026]: [6-1] LOG: all server processes
terminated; reinitializing
2009-03-10 11:16:33 CET [11015]: [1-1] LOG: database system was
interrupted; last known up at 2009-03-10 11:09:10 CET
2009-03-10 11:16:33 CET [11015]: [2-1] LOG: starting archive recovery
2009-03-10 11:16:33 CET [11015]: [3-1] LOG: restore_command = 'cp
/exchange/wal_from_iweblin/%f %p'
cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
file or directory
cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
No such file or directory
2009-03-10 11:16:33 CET [11015]: [4-1] LOG: automatic recovery in progress
2009-03-10 11:16:33 CET [11015]: [5-1] LOG: redo starts at 1/AABEBFA8
2009-03-10 11:16:33 CET [11015]: [6-1] LOG: unexpected pageaddr
0/36BEE000 in log file 1, segment 170, offset 12509184
2009-03-10 11:16:33 CET [11015]: [7-1] LOG: redo done at 1/AABED938
2009-03-10 11:16:33 CET [11015]: [8-1] LOG: last completed transaction
was at log time 2009-03-10 11:15:17.533913+01
cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
No such file or directory
cp: cannot stat `/exchange/wal_from_iweblin/00000004.history': No such
file or directory
2009-03-10 11:16:33 CET [11015]: [9-1] LOG: selected new timeline ID: 4
cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
file or directory
2009-03-10 11:16:33 CET [11015]: [10-1] LOG: archive recovery complete
2009-03-10 11:16:33 CET [11015]: [11-1] LOG: checkpoint starting:
shutdown immediate
2009-03-10 11:16:34 CET [11015]: [12-1] LOG: checkpoint complete: wrote
4 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=0.005 s, sync=0.086 s, total=0.242 s
2009-03-10 11:16:34 CET [11023]: [1-1] LOG: autovacuum launcher started
2009-03-10 11:16:34 CET [29026]: [7-1] LOG: database system is ready to
accept connections
=======================================

b)
==========================================
2009-03-10 11:16:23 DEBUG: pid 19059: read kind from frontend X(58)
2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend Q(51)
2009-03-10 11:16:31 LOG: pid 27107: statement: SELECT d.oid, datname
as name, pg_get_userbyid(datdba) as owner, ds.des
cription, pg_encoding_to_char(encoding) as dbencoding, datistemplate,
datlastsysoid, datconfig, s.spcname as tblspace, d
atconnlimit AS conlimit FROM pg_database d LEFT OUTER JOIN
pg_description ds ON d.oid = ds.objoid LEFT OUTER JOIN pg_tab
lespace s ON d.dattablespace = s.oid WHERE datname = 'test'

2009-03-10 11:16:31 DEBUG: pid 27107: waiting for backend 0 completing
the query
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend T NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: T
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend D NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: D
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend C NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: C
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend Z NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: Z
2009-03-10 11:16:31 DEBUG: pid 27107: pool_read_message_length: slot: 0
length: 5
2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: message length: 5
2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: transaction state: I
2009-03-10 11:16:31 DEBUG: pid 27107: end_load_balance: end load balance
mode
2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend X(58)
2009-03-10 11:16:36 DEBUG: pid 27107: I am 27107 accept fd 5
2009-03-10 11:16:36 LOG: pid 27107: connection received:
host=172.28.1.126 port=1946
2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 1234 Minor: 5679
database: user:
2009-03-10 11:16:36 DEBUG: pid 27107: SSLRequest: sent N; retry startup
2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 3 Minor: 0
database: test user: test
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 0 backend
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 1 backend
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: skipping slot 1
because backend_status = 3
_______________________________________________
Pgpool-general mailing list
Pgpool-general@pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general