can't start postgresql

Started by Igor Shevchenkoover 19 years ago13 messagesgeneral
Jump to latest
#1Igor Shevchenko
igor@carcass.ath.cx

Hi All,

I've got a problem w/ one of pgsql installations. It can't start:

[postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
PANIC: failed to re-find parent key in "23724"
Aborted

This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. Originally it
was 8.1.2, and the bug appeared on v8.1.2; I've compiled and installed v8.1.5
over it. Autovacuum is enabled.

How can I fix this ?

--
Best Regards,
Igor Shevchenko

#2Enrico
scotty@linuxtime.it
In reply to: Igor Shevchenko (#1)
Re: can't start postgresql

On Tue, 14 Nov 2006 02:23:39 +0200
Igor Shevchenko <igor@carcass.ath.cx> wrote:

Try with:

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data

Bye Enrico
--
If Bill Gates had a penny for everytime Windows crashed,he'd be a multi-billionaire by now .......oh look, he already is !!!!
scotty@linuxtime.it - Skype:sscotty71
http://www.linuxtime.it/enricopirozzi

#3Igor Shevchenko
igor@carcass.ath.cx
In reply to: Enrico (#2)
Re: can't start postgresql

On Tuesday 14 November 2006 10:52, Enrico wrote:

Try with:

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data

Thanks, but this is not working -

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.

initdb: directory "/usr/pgsql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/usr/pgsql/data" or run initdb
with an argument other than "/usr/pgsql/data".

--
Best Regards,
Igor Shevchenko

#4Enrico
scotty@linuxtime.it
In reply to: Igor Shevchenko (#3)
Re: can't start postgresql

Thanks, but this is not working -

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.

initdb: directory "/usr/pgsql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/usr/pgsql/data" or run initdb
with an argument other than "/usr/pgsql/data".

of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data
directory /usr/pgsql/data must be empty and user postgres must be the owner,
but I don't understand, what is your problem?

Enrico

--
If Bill Gates had a penny for everytime Windows crashed,he'd be a multi-billionaire by now .......oh look, he already is !!!!
scotty@linuxtime.it - Skype:sscotty71
http://www.linuxtime.it/enricopirozzi

#5Noname
Matthias.Pitzl@izb.de
In reply to: Enrico (#4)
Re: can't start postgresql

He's just trying to do what you told him.
But the use of initdb won't correct the problem Igor got. There's something
wrong with his database so that he can't start the postmaster process:

PANIC: failed to re-find parent key in "23724"

Unfortunately i don't have any idea how to solve this issue :(

-- Matthias

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Enrico
Sent: Tuesday, November 14, 2006 4:22 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] can't start postgresql

Thanks, but this is not working -

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
The files belonging to this database system will be owned

by user "postgres".

This user must also own the server process.

The database cluster will be initialized with locale C.

initdb: directory "/usr/pgsql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/usr/pgsql/data" or run initdb
with an argument other than "/usr/pgsql/data".

of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data
directory /usr/pgsql/data must be empty and user postgres
must be the owner,
but I don't understand, what is your problem?

Enrico

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Noname (#5)
Re: can't start postgresql

Matthias.Pitzl@izb.de wrote:

He's just trying to do what you told him.
But the use of initdb won't correct the problem Igor got. There's something
wrong with his database so that he can't start the postmaster process:

PANIC: failed to re-find parent key in "23724"

Unfortunately i don't have any idea how to solve this issue :(

I'd think starting a standalone backend and issuing a "reindex database"
should be enough to get him started. Now, the problem is figuring _how_
the index got in that state; or even _what_ index is the problematic
one. (I think it would be possible to find out by setting the
"log_error_verbosity" parameter to "verbose").

If it's a hardware problem, one would think it deserves some diagnosis.
It could be the btree bug Tom fixed last week, but I'm not sure if this
is really a consequence of it.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Merlin Moncure
mmoncure@gmail.com
In reply to: Igor Shevchenko (#1)
Re: can't start postgresql

On 11/13/06, Igor Shevchenko <igor@carcass.ath.cx> wrote:

Hi All,

I've got a problem w/ one of pgsql installations. It can't start:

[postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
PANIC: failed to re-find parent key in "23724"
Aborted

This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. Originally it
was 8.1.2, and the bug appeared on v8.1.2; I've compiled and installed v8.1.5
over it. Autovacuum is enabled.

is this a fresh installation? or do you have an existing database here
that crashed or had some other problem?

merlin

#8Igor Shevchenko
igor@carcass.ath.cx
In reply to: Merlin Moncure (#7)
Re: can't start postgresql

Hi Merlin,

On Tuesday 14 November 2006 17:36, you wrote:

On 11/13/06, Igor Shevchenko <igor@carcass.ath.cx> wrote:

I've got a problem w/ one of pgsql installations. It can't start:

[postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
PANIC: failed to re-find parent key in "23724"
Aborted

This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4.
Originally it was 8.1.2, and the bug appeared on v8.1.2; I've compiled
and installed v8.1.5 over it. Autovacuum is enabled.

is this a fresh installation? or do you have an existing database here
that crashed or had some other problem?

Yes it was a working installation, 32g of data, on v8.1.2. Pgsql wouldn't come
up, so I've upgraded it to v8.1.5 (from sources), and tried to start it in a
single-process mode, with no luck - it was still failing with the same,
abovementioned error message.

We had a backup and had already restored from it, but I still have the crashed
db saved elsewhere, for testing/debugging.

--
Best Regards,
Igor Shevchenko

#9Igor Shevchenko
igor@listmarketer.com
In reply to: Enrico (#4)
Re: can't start postgresql

On Tuesday 14 November 2006 17:21, Enrico wrote:

Thanks, but this is not working -

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
The files belonging to this database system will be owned by user
"postgres". This user must also own the server process.

The database cluster will be initialized with locale C.

initdb: directory "/usr/pgsql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/usr/pgsql/data" or run initdb
with an argument other than "/usr/pgsql/data".

of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data
directory /usr/pgsql/data must be empty and user postgres must be the
owner, but I don't understand, what is your problem?

The problem is that the db was not empty (32g of data).

--
Best Regards,
Igor Shevchenko

#10Igor Shevchenko
igor@carcass.ath.cx
In reply to: Alvaro Herrera (#6)
Re: can't start postgresql

On Tuesday 14 November 2006 17:34, Alvaro Herrera wrote:

Matthias.Pitzl@izb.de wrote:

He's just trying to do what you told him.
But the use of initdb won't correct the problem Igor got. There's
something

wrong with his database so that he can't start the postmaster process:

PANIC: failed to re-find parent key in "23724"

Unfortunately i don't have any idea how to solve this issue :(

I'd think starting a standalone backend and issuing a "reindex database"
should be enough to get him started. Now, the problem is figuring _how_
the index got in that state; or even _what_ index is the problematic
one. (I think it would be possible to find out by setting the
"log_error_verbosity" parameter to "verbose").

The problem is, it won't start even in the standalone mode. It prints the same
error message and dies. I was trying to start it connected to my main db,
template0, template1 and postgres, no change. The index corruption (assuming
this IS an index corruption) must've happened with system table(s). Is there
any way to diagose that ? Maybe reindex a specific table without starting
postmaster, or remove certain index files physically (e.g. for pg_class,
etc) ?

If it's a hardware problem, one would think it deserves some diagnosis.
It could be the btree bug Tom fixed last week, but I'm not sure if this
is really a consequence of it.

Can't rule out an HW problem. I'd think this IS an HW problem, but we've
started using an updated version of our software on this server a week ago,
and it creates/drops alot of tables, temporary and not (kind of in-house
table partitioning), so this could be an (exceptionally rare) pgsql bug with
pg_class / pg_index / etc handling. A side question: is it safe to
create/drop tables from pl/pgsql functions, called directly or via a
trigger ?

Our previous version of the software was using a light version of what we're
trying to do now, and there was / maybe still a problem:

We're processing data in chunks, and each chunk is completely separate from
others, so using a special table per chunk seems to be a natural thing to do.
We keep these in a separate tablespace "w". There's one non-unique
(integer,varchar) index per each such table; both fields are non-null. Both
table and index are created via a pl/pgsql function. An index is created
after the data import, during the same transaction:

a) begin
b) select create_w_table(jobid,false) -- false means don't create index
c) insert into w.tableX select *....
d) select create_w_table(jobid,true) -- true means create index
e) commit

Both table and index are dropped at the same time, using another pl/pgsql
function.

create_w_table is rarely called from yet another pl/pgsql function.

Everything's working fine, except that sometimes an index can, um, "go off the
rails":

* not listed by "psql"'s \d w.tableX
* queries are not using it
* it's still listed in the pg_class (seen by "select * from pg_class where
relname='tblX_idx'")
* a corresponding "pg_index" row references a non-existing pg_class entry via
pg_index.indrelid

I'm seeing this several times a week, for over 5 months now. Here's my fix
procedure:

* fetch a list of probematic indexes:

select pc.relname,pc.oid,pc_idx.oid from pg_class pc left join pg_class pc_idx
on (pc_idx.relname=pc.relname || '_idx') where pc.relname ~ '^table\\d+$' and
(pc_idx.oid is NULL OR not exists (select 1 from pg_index where
pg_index.indexrelid=pc_idx.oid and indrelid=pc.oid))

* fix each with:

update pg_index set indrelid=${right one from pg_class} where
indexrelid=${index oid from pg_class};
update pg_class set relhasindex='t' where oid=${table oid}; -- AFAICR this is
redundant, as relhasindex is still true

now I can drop this index (usually it's out of sync), vacuum the table and
recreate index:

drop index w.tableX_idx;
vacuum full verbose w.tableX;
create index tableX_idx on w.tableX(...);
analyze w.tableX;

I've seen this on pgsql versions from v8.1.2 till v8.1.4, can't confirm with
v8.1.5 yet. Autovacuum's on; daily "vacuum verbose analyze" and "reindex" are
ran over pg_class, pg_index, a bunch of our own small tables, and recently
over pg_depend, pg_type and pg_statistic.

--
Best Regards,
Igor Shevchenko

#11John Gray
jgray@azuli.co.uk
In reply to: Noname (#5)
Re: can't start postgresql

On Tue, 14 Nov 2006 21:26:02 +0200, Igor Shevchenko wrote:

I'd think starting a standalone backend and issuing a "reindex database"
should be enough to get him started. Now, the problem is figuring _how_
the index got in that state; or even _what_ index is the problematic
one. (I think it would be possible to find out by setting the
"log_error_verbosity" parameter to "verbose").

The problem is, it won't start even in the standalone mode. It prints the same
error message and dies. I was trying to start it connected to my main db,
template0, template1 and postgres, no change. The index corruption (assuming
this IS an index corruption) must've happened with system table(s). Is there
any way to diagose that ? Maybe reindex a specific table without starting
postmaster, or remove certain index files physically (e.g. for pg_class,
etc) ?

I think you want to use -P to ignore the indexes on startup:

postgres -D /usr/local/pgsql/data -P my_database

Does that help - then you might get the chance to do a REINDEX?

Regards

John

#12Igor Shevchenko
igor@carcass.ath.cx
In reply to: John Gray (#11)
Re: can't start postgresql

On Wednesday 15 November 2006 01:41, you wrote:

On Tue, 14 Nov 2006 21:26:02 +0200, Igor Shevchenko wrote:

I'd think starting a standalone backend and issuing a "reindex database"
should be enough to get him started. Now, the problem is figuring _how_
the index got in that state; or even _what_ index is the problematic
one. (I think it would be possible to find out by setting the
"log_error_verbosity" parameter to "verbose").

The problem is, it won't start even in the standalone mode. It prints the
same error message and dies. I was trying to start it connected to my
main db, template0, template1 and postgres, no change. The index
corruption (assuming this IS an index corruption) must've happened with
system table(s). Is there any way to diagose that ? Maybe reindex a
specific table without starting postmaster, or remove certain index files
physically (e.g. for pg_class, etc) ?

I think you want to use -P to ignore the indexes on startup:

postgres -D /usr/local/pgsql/data -P my_database

Does that help - then you might get the chance to do a REINDEX?

Thanks for the pointer!

I thought it would help, but to my surprise, it didn't. Im still seeing this :

[postgres@db ~]$ /backup/pgsql/bin/postgres -D /backup/pgsql/data -P template1
PANIC: failed to re-find parent key in "23724"
Aborted

--
Best Regards,
Igor Shevchenko

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Igor Shevchenko (#1)
Re: can't start postgresql

Igor Shevchenko <igor@carcass.ath.cx> writes:

I've got a problem w/ one of pgsql installations. It can't start:

[postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
PANIC: failed to re-find parent key in "23724"
Aborted

Try applying this patch:
http://archives.postgresql.org/pgsql-committers/2006-11/msg00004.php

regards, tom lane