7.0.3 _bt_restscan: my bits moved right off the end of the world!

Started by Matthewalmost 25 years ago13 messages

matt@ctlno.com

almost 25 years ago

Postgre 7.0.3, on RedHat Linux 6.2 stock 2.2.16 kernel. Nothing special I
can think of, this server has been up and in use for the last 128 days with
no problem. Last night while cron was performing the nightly vacuuming of
all databases on one of our servers, I got this from cron.

Vacuuming cms
FATAL 1: _bt_restscan: my bits moved right off the end of the world!
Recreate index history_id_key.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
connection to server was lost

So this morning I did the following:

cms=# drop index history_id_key;
DROP
cms=# create unique index history_id_key on history(id);
CREATE
cms=# vacuum;
NOTICE: Index locks_operator_id_ndx: pointer to EmptyPage (blk 44 off 2) -
fixing
NOTICE: Index locks_operator_id_ndx: pointer to EmptyPage (blk 44 off 1) -
fixing
NOTICE: Index locks_operator_id_ndx: pointer to EmptyPage (blk 44 off 4) -
fixing
NOTICE: Index locks_operator_id_ndx: pointer to EmptyPage (blk 44 off 3) -
fixing
NOTICE: Index locks_operator_id_ndx: pointer to EmptyPage (blk 44 off 5) -
fixing
NOTICE: Index locks_id_key: pointer to EmptyPage (blk 44 off 2) - fixing
NOTICE: Index locks_id_key: pointer to EmptyPage (blk 44 off 1) - fixing
NOTICE: Index locks_id_key: pointer to EmptyPage (blk 44 off 4) - fixing
NOTICE: Index locks_id_key: pointer to EmptyPage (blk 44 off 3) - fixing
NOTICE: Index locks_id_key: pointer to EmptyPage (blk 44 off 5) - fixing
NOTICE: Index locks_case_id_ndx: pointer to EmptyPage (blk 44 off 5) -
fixing
NOTICE: Index locks_case_id_ndx: pointer to EmptyPage (blk 44 off 2) -
fixing
NOTICE: Index locks_case_id_ndx: pointer to EmptyPage (blk 44 off 1) -
fixing
NOTICE: Index locks_case_id_ndx: pointer to EmptyPage (blk 44 off 4) -
fixing
NOTICE: Index locks_case_id_ndx: pointer to EmptyPage (blk 44 off 3) -
fixing
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died
abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate
your database system connection and exit.
Please reconnect to the database system and repeat your query.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I have since stopped the database server and all my users are dead in the
water at the moment. I took postgres down to single user mode and I'm doing
a vacuum and was considering doing an iccpclean. Any other suggestions?
dump & restore? Any Idea what happened?

Thank you,

Matthew O'Connor

Matthew

matt@ctlno.com

almost 25 years ago

In reply to: Matthew (#1)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

Postgre 7.0.3, on RedHat Linux 6.2 stock 2.2.16 kernel. Nothing special I
can think of, this server has been up and in use for the last 128 days
with
no problem. Last night while cron was performing the nightly vacuuming of
all databases on one of our servers, I got this from cron.

Vacuuming cms
FATAL 1: _bt_restscan: my bits moved right off the end of the world!
Recreate index history_id_key.
pqReadData() -- backend closed the channel unexpectedly.

[snip ]

I have since stopped the database server and all my users are dead in the
water at the moment. I took postgres down to single user mode and I'm
doing
a vacuum and was considering doing an iccpclean. Any other suggestions?
dump & restore? Any Idea what happened?

The vacuum I tried in single user mode failed (froze on a table, for over 20
minutes) so I killed it. I cleaned up shared memory (there were some things
left over). I started in single user mode and reindex database cms force,
also reindexed a few tables, then tried the vacuum again with the same
result. Do I need to dump resore? We should have a valid backup from the
night before.

Import Notes

Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Matthew (#1)

Re: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

Matthew <matt@ctlno.com> writes:

[ a tale of woe ]

It looks like dropping and rebuilding *all* the indexes on your history
table would be a good move (possibly with a vacuum of the table while
the indexes are removed). You might want to do a COPY out to try to
save the table data before the vacuum, in case there is corruption in
the table as well as the indexes.

Before you do all that, though, how big is the database? Would you be
able/willing to tar up the whole $PGDATA tree and let some of us analyze
it?

regards, tom lane

Mikheev, Vadim

vmikheev@SECTORBASE.COM

almost 25 years ago

In reply to: Tom Lane (#3)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

I have since stopped the database server and all my users are
dead in the water at the moment. I took postgres down to single
user mode and I'm doing a vacuum and was considering doing an
iccpclean. Any other suggestions? dump & restore?
Any Idea what happened?

Drop indices; vacuum; create indices.

Vadim

Import Notes

Resolved by subject fallback

Matthew

matt@ctlno.com

almost 25 years ago

In reply to: Mikheev, Vadim (#4)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

Matthew <matt@ctlno.com> writes:

[ a tale of woe ]

It looks like dropping and rebuilding *all* the indexes on your history
table would be a good move (possibly with a vacuum of the table while
the indexes are removed). You might want to do a COPY out to try to
save the table data before the vacuum, in case there is corruption in
the table as well as the indexes.

Before you do all that, though, how big is the database? Would you be
able/willing to tar up the whole $PGDATA tree and let some of us analyze
it?

regards, tom lane

I am going to tar up the $PGDATA directory so I have a backup of it
in case of bigger problems. I will send you a copy if you like but I have
already done some of the things you suggested but not all of them. The
database in question is (cms) 450 Meg, but we have a lot of databases in the
PGDATA directory, so the whole PGDATA directory totals to 3.1G. I don't
know if you want the whole thing or not. Let me know.

I dropped all the indexes on the history table did a vacuum then
recreated the index and vacuumed that table. That went fine, when I tried
to vacuum the entire database it hung on the cases table. I tried
reindexing that to no avail, and then tried dropping the index it appeared
to be hanging on then vacuuming and I got a different error, something about
memory being exhausted, which should not be the case, I don't have the exact
error in front of me any more :-( .

I'm still trying to get things back up and running. I believe I
have a successful pg_dump of the data in the cms database so I am going to
try to drop the cms database and restore from the dump. Unless you think
this is a bad idea.

Thank you very much for you help.

Import Notes

Resolved by subject fallback

Mikheev, Vadim

vmikheev@SECTORBASE.COM

almost 25 years ago

In reply to: Matthew (#5)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

The vacuum I tried in single user mode failed (froze on a
table, for over 20 minutes) so I killed it.

Did you destroy indices before vacuum?

Vadim

Import Notes

Resolved by subject fallback

Matthew

matt@ctlno.com

almost 25 years ago

In reply to: Mikheev, Vadim (#6)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

The vacuum I tried in single user mode failed (froze on a
table, for over 20 minutes) so I killed it.

Did you destroy indices before vacuum?

Not all of them, it's a large database and I am trying to get it up
and running asap.

FYI now when I try to use psql to connect to the database I get this
error:

bash$ psql cms
psql: FATAL 1: cannot find attribute 1 of relation pg_trigger

Import Notes

Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Matthew (#7)

Re: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

Matthew <matt@ctlno.com> writes:

FYI now when I try to use psql to connect to the database I get this
error:
bash$ psql cms
psql: FATAL 1: cannot find attribute 1 of relation pg_trigger

So the indexes on pg_attribute are hosed too. I wonder whether that was
the original source of the problem, and the rest of this is
side-effects?

I am starting to think that you'd best initdb and reload, but there is
one more thing to try: run REINDEX on the whole database in standalone
mode. See the documentation for the procedure; I'm not too clear on it
since I've never had to do it myself.

regards, tom lane

Matthew

matt@ctlno.com

almost 25 years ago

In reply to: Tom Lane (#8)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

Matthew <matt@ctlno.com> writes:

FYI now when I try to use psql to connect to the database I get this
error:
bash$ psql cms
psql: FATAL 1: cannot find attribute 1 of relation pg_trigger

So the indexes on pg_attribute are hosed too. I wonder whether that was
the original source of the problem, and the rest of this is
side-effects?

I am starting to think that you'd best initdb and reload, but there is
one more thing to try: run REINDEX on the whole database in standalone
mode. See the documentation for the procedure; I'm not too clear on it
since I've never had to do it myself.

regards, tom lane

What do you mean by the whole database? I have already executed:

reindex database cms force
reindex table cases force
reindex table cases force
reindex table hits force
reindex table history force (and a few more)

How do I get it do reindex the system tables? One table at a time?

Import Notes

Resolved by subject fallback

#10

Mikheev, Vadim

vmikheev@SECTORBASE.COM

almost 25 years ago

In reply to: Matthew (#9)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

The vacuum I tried in single user mode failed (froze on a
table, for over 20 minutes) so I killed it.

Did you destroy indices before vacuum?

Not all of them, it's a large database and I am trying
to get it up and running asap.

Did you destroy *all* indices of table vacuum hung on?

Vadim

Import Notes

Resolved by subject fallback

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Matthew (#9)

Re: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

Matthew <matt@ctlno.com> writes:

What do you mean by the whole database? I have already executed:

reindex database cms force

(checks manual...) That appears to be the right syntax. If you did
that in a standalone backend with the appropriate command line options
(-O and -P) then I think you've done all you can. Time for a reload :-(

regards, tom lane

#12

Mikheev, Vadim

vmikheev@SECTORBASE.COM

almost 25 years ago

In reply to: Tom Lane (#11)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

What do you mean by the whole database? I have already
executed:

reindex database cms force
reindex table cases force
reindex table cases force
reindex table hits force
reindex table history force (and a few more)

How do I get it do reindex the system tables? One
table at a time?

"reindex database cms force" is supposed to reindex all system
tables (and *system ones only*), but from my recollection it
didn't help sometime, so it's better reindex pg_attributes
& pg_class with separate command. Maybe after vacuuming them.

Vadim

Import Notes

Resolved by subject fallback

#13

Hiroshi Inoue

Inoue@tpf.co.jp

almost 25 years ago

In reply to: Matthew (#7)

RE: 7.0.3 _bt_restscan: my bits moved right off the end of the world!

-----Original Message-----
From: Matthew

The vacuum I tried in single user mode failed (froze on a
table, for over 20 minutes) so I killed it.

Did you destroy indices before vacuum?

Not all of them, it's a large database and I am trying to get it up
and running asap.

FYI now when I try to use psql to connect to the database I get this
error:

bash$ psql cms
psql: FATAL 1: cannot find attribute 1 of relation pg_trigger

Please try the following query under standalone postgres(with -P and -O
options).

select * from pg_attribute where attrelid=1219 and attnum=1;

If you would get no result, your pg_attribute is corrupted unfortunately.

regards,
Hiroshi Inoue