Re: btree: BTP_CHAIN flag was expected (revisited)

Started by The Hermit Hacker over 27 years ago, 3 messages
#1 The Hermit Hacker
scrappy@hub.org

On Mon, 22 Jun 1998, David Schanen wrote:

Hi Marc & Mike,

I wanted to check with you to see if you had seen my latest post to
pgsql-questions?

pgsql-questions has been a discontinued mailing list for over a
month now, actually...and, from the topic, this should be discussed on
pgsql-hackers anyway :)

Looking at the backtrace from the debug.core, it seems to me like we are
still getting the BTP_CHAIN errors we saw in previous versions.

You are using v6.3.2+patches currently?

The cause seems to be corruption in a single record of a btree index
on a very large table. If we simply restart the postgres backends and
query the same record where it crashed before, we cause the crash
again. If we query any other record, there is no problem. If we
reindex, the problem goes away. Unfortunately, this is a high-volume
realtime telephony application, and taking the system out of service
for twenty minutes to reindex could lose too much data for thousands of
calls and deny service to thousands more.

I think the bug must be in the writing of the index record or (more likely)
an adjacent index record, but I don't know how to find it.

Marc, I have been reluctant to include Vadim in these emails so far. Do
you think it is okay to bring him in on this one? I haven't had any
response from the post to the list.

Mike, have you tried compiling with debug?

Below is the backtrace output from my debug.core. I see the BTP_CHAIN
error, am I missing something else that you can see?

Thanks for your help!

Best regards,

-Dave

PS. Marc, I haven't heard back from Mike since my earliest email. Have you heard
anything from him?

David Schanen wrote:

Maybe we are still having btree problems, but we no longer see the BTP_CHAIN
error. Now we get:

IpcMemoryCreate: memKey=5432101 , size=2361552 , permission=384
IpcMemoryCreate: shmget(..., create, ...) failed: Cannot allocate memory
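
For anyone reading the log line above: the numbers are easier to interpret once decoded. The following is a minimal Python sketch, assuming the three fields are plain decimal integers as they appear in the message. The helper name `decode_ipc_log` is hypothetical and is not part of PostgreSQL.

```python
import re

# Log line as quoted in the message above.
LOG = "IpcMemoryCreate: memKey=5432101 , size=2361552 , permission=384"


def decode_ipc_log(line):
    """Pull the name=value integers out of an IpcMemoryCreate log line."""
    fields = {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}
    return {
        "key": fields["memKey"],
        "size_bytes": fields["size"],
        # Requested segment size in MiB, rounded for readability.
        "size_mib": round(fields["size"] / (1024 * 1024), 2),
        # The permission is logged in decimal; octal is the usual notation.
        "mode_octal": oct(fields["permission"]),
    }


if __name__ == "__main__":
    print(decode_ipc_log(LOG))
```

Decoded this way, the request is for a segment of roughly 2.25 MiB with mode 0600 (owner read/write only), which shmget could not satisfy.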

Here is the backtrace output. Let me know if you need the core file.

Thanks,

-Dave
----------------
Postgres 6.3.2
Pentium II / 200 - 128M
FreeBSD 3.0-981007-SNAP

# gdb postgres postgres.core.save
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd),
Copyright 1996 Free Software Foundation, Inc...
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
Cannot access memory at address 0x40ff080.
#0 0x41a256a in ?? ()
(gdb) bt
#0 0x41a256a in ?? ()
#1 0x41b7060 in ?? ()
#2 0x415b5e5 in ?? ()
#3 0xc7578 in elog (lev=1,
fmt=0x1381e "btree: BTP_CHAIN flag was expected in %s (access = %s)")
at elog.c:121
#4 0x1397f in _bt_moveright (rel=0x225290, buf=153, keysz=1,
scankey=0x21b3d0, access=0) at nbtsearch.c:222
#5 0x137e9 in _bt_searchr (rel=0x225290, keysz=1, scankey=0x21b3d0,
bufP=0xefbfb664, stack_in=0x2405f0) at nbtsearch.c:127
#6 0x136e7 in _bt_search (rel=0x225290, keysz=1, scankey=0x21b3d0,
bufP=0xefbfb664) at nbtsearch.c:55
#7 0x1014e in _bt_doinsert (rel=0x225290, btitem=0x21b390, index_is_unique=0,
heapRel=0x21dd90) at nbtinsert.c:63
#8 0x12f84 in btinsert (rel=0x225290, datum=0x2405b0, nulls=0x2405d0 " \002",
ht_ctid=0x1dd228, heapRel=0x21dd90) at nbtree.c:377
#9 0xc8445 in fmgr_c (finfo=0xefbfb6f4, values=0xefbfb704,
isNull=0xefbfb6f3 "") at fmgr.c:119
#10 0xc8834 in fmgr (procedureId=331) at fmgr.c:290
#11 0xc6d5 in index_insert (relation=0x225290, datum=0x2405b0,
nulls=0x2405d0 " \002", heap_t_ctid=0x1dd228, heapRel=0x21dd90)
at indexam.c:180
#12 0x3a178 in ExecInsertIndexTuples (slot=0x1cbc10, tupleid=0x1dd228,
estate=0x1d8310, is_update=0) at execUtils.c:1156
#13 0x36fa9 in ExecAppend (slot=0x1cbc10, tupleid=0x0, estate=0x1d8310)
at execMain.c:1010
#14 0x36dfe in ExecutePlan (estate=0x1d8310, plan=0x1d8210,
parseTree=0x225910, operation=CMD_INSERT, numberTuples=0,
direction=ForwardScanDirection, printfunc=0x3520 <printtup>)
at execMain.c:814
#15 0x36751 in ExecutorRun (queryDesc=0x230f50, estate=0x1d8310, feature=3,
count=0) at execMain.c:236
#16 0xa01db in ProcessQueryDesc (queryDesc=0x230f50) at pquery.c:332
#17 0xa0246 in ProcessQuery (parsetree=0x225910, plan=0x1d8210, argv=0x0,
typev=0x0, nargs=0, dest=Remote) at pquery.c:378
#18 0x9e3dd in pg_exec_query_dest (
query_string=0xefbfb934 "insert into acct_history (acct_no, activity_date,
origination, destination, duration, amount, balance, changed_by, changed_on)
VALUES ( '126587291393', 'Wed Jun 17 18:38:06 1998', '0213906996', '79028"...,
argv=0x0, typev=0x0, nargs=0, dest=Remote) at postgres.c:699
#19 0x9e290 in pg_exec_query (
query_string=0xefbfb934 "insert into acct_history (acct_no, activity_date,
origination, destination, duration, amount, balance, changed_by, changed_on)
VALUES ( '126587291393', 'Wed Jun 17 18:38:06 1998', '0213906996', '79028"...,
argv=0x0, typev=0x0, nargs=0) at postgres.c:601
#20 0x9fa31 in PostgresMain (argc=9, argv=0xefbfd978) at postgres.c:1382
#21 0x49bfa in main (argc=9, argv=0xefbfd978) at main.c:106
(gdb)
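
The call path in the backtrace above can be summarized mechanically. This is a minimal Python sketch, assuming gdb's `#N 0xADDR in function (` frame format as shown. The function name `call_chain` is hypothetical, and the excerpt includes only the first line of a few frames.

```python
import re

# First line of selected frames, copied from the backtrace above.
BACKTRACE = r"""
#0 0x41a256a in ?? ()
#3 0xc7578 in elog (lev=1,
#4 0x1397f in _bt_moveright (rel=0x225290, buf=153, keysz=1,
#5 0x137e9 in _bt_searchr (rel=0x225290, keysz=1, scankey=0x21b3d0,
#6 0x136e7 in _bt_search (rel=0x225290, keysz=1, scankey=0x21b3d0,
#7 0x1014e in _bt_doinsert (rel=0x225290, btitem=0x21b390, index_is_unique=0,
#8 0x12f84 in btinsert (rel=0x225290, datum=0x2405b0, nulls=0x2405d0 " \002",
"""


def call_chain(backtrace):
    """Return resolved function names, outermost caller first.

    Frames that gdb could not resolve (shown as '??') are skipped.
    """
    frames = re.findall(r"^#(\d+)\s+0x[0-9a-f]+ in (\w+) \(", backtrace, flags=re.M)
    ordered = sorted(frames, key=lambda frame: int(frame[0]), reverse=True)
    return [name for _, name in ordered]


if __name__ == "__main__":
    print(" -> ".join(call_chain(BACKTRACE)))
```

Read this way, the error was raised from `_bt_moveright` (nbtsearch.c:222) while descending the index for an insert (`btinsert`), not during a read query. The unresolved frames #0 to #2 sit above `elog`, which is where the segmentation fault occurred.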

The Hermit Hacker wrote:

On Mon, 8 Jun 1998, David Schanen wrote:

a) I compiled 6.3.2 with CASSERT as recommended by Vadim in one of
his posts. What does this do for me exactly? Could this be the reason
we aren't seeing the error report any longer?

CASSERT shouldn't be used in production, only in development...can
you send in a trace of what the core shows?

b) Can someone explain what causes the BTP_CHAIN error above?

All I know is that it's an index corruption only fixed by dropping
and recreating the index. v6.3.2 tells you which table is generating the
BTP_CHAIN error as part of its error message...

c) How dangerous do you think it is to continue to run the database
in this condition?

My experience: the index is useless when the condition is
triggered...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#2 David Schanen
mtv@ibm.net
In reply to: The Hermit Hacker (#1)

The Hermit Hacker wrote:

On Mon, 22 Jun 1998, David Schanen wrote:

Hi Marc & Mike,

Looking at the backtrace from the debug.core, it seems to me like we are
still getting the BTP_CHAIN errors we saw in previous versions.

You are using v6.3.2+patches currently?

Actually, we are using the basic 6.3.2, no patches applied yet. I'll look into that.
Is this a known bug fix in the current patch release?

-Dave

#3 The Hermit Hacker
scrappy@hub.org
In reply to: David Schanen (#2)

On Tue, 23 Jun 1998, David Schanen wrote:

The Hermit Hacker wrote:

On Mon, 22 Jun 1998, David Schanen wrote:

Hi Marc & Mike,

Looking at the backtrace from the debug.core, it seems to me like we are
still getting the BTP_CHAIN errors we saw in previous versions.

You are using v6.3.2+patches currently?

Actually, we are using the basic 6.3.2, no patches applied yet. I'll look into that.
Is this a known bug fix in the current patch release?

I haven't seen it since upgrading my server(s) to v6.3.2+patches,
where I saw it relatively often beforehand...but I won't guarantee that
that just hasn't been luck either...