database corruption

Started by Jeff Brentonabout 17 years ago10 messagesgeneral

jbrenton@sandvine.com

about 17 years ago

I've encountered some db corruption after restarting postgres on my
database server running 8.2.4. I think that postgres did not shut down
cleanly. Postgres started appropriately but crashed 45 minutes later.
I used pg_resetxlog after the crash to get the db to start again but it
appears that the database is not running properly now. When users try
to access some of the tables in the db they get the error below;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no diagnostic
record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}

I've attempted to re-index the pkey listed but after an hour it fails
with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed on.
Does anyone know what I can do to get the db up and running again?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

Chris Spotts

rfusca@gmail.com

about 17 years ago

In reply to: Jeff Brenton (#1)

Re: database corruption

I would imagine you would have better luck dropping the index and
recreating. But considering you're 98% full on that drive, it looks like
you're about to have other problems...

On Wed, Apr 8, 2009 at 8:32 PM, Jeff Brenton <jbrenton@sandvine.com> wrote:

I’ve encountered some db corruption after restarting postgres on my
database server running 8.2.4. I think that postgres did not shut down
cleanly. Postgres started appropriately but crashed 45 minutes later. I
used pg_resetxlog after the crash to get the db to start again but it
appears that the database is not running properly now. When users try to
access some of the tables in the db they get the error below;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no diagnostic
record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}

I’ve attempted to re-index the pkey listed but after an hour it fails with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left on
device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed on.
Does anyone know what I can do to get the db up and running again?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

--
Chris Spotts
rfusca@gmail.com

Adrian Klaver

adrian.klaver@aklaver.com

about 17 years ago

In reply to: Jeff Brenton (#1)

Re: database corruption

On Wednesday 08 April 2009 6:32:06 pm Jeff Brenton wrote:

I've encountered some db corruption after restarting postgres on my
database server running 8.2.4. I think that postgres did not shut down
cleanly. Postgres started appropriately but crashed 45 minutes later.
I used pg_resetxlog after the crash to get the db to start again but it
appears that the database is not running properly now. When users try
to access some of the tables in the db they get the error below;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no diagnostic
record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}

I've attempted to re-index the pkey listed but after an hour it fails
with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed on.
Does anyone know what I can do to get the db up and running again?

I guess the first question is, does the db have permissions(access) to all that
space?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

--
Adrian Klaver
aklaver@comcast.net

Jeff Brenton

jbrenton@sandvine.com

about 17 years ago

In reply to: Adrian Klaver (#3)

Re: database corruption

There are no filesystem level content size restrictions that I am aware
of on this system. The user pgsql should have full access to the
filesystems indicated except for the root filesystem.

Where is the temporary location? I am searching around to see if I can
specify it anywhere in the config files but can't seem to find anything
which leads me to believe that its part of the postgres data directory.

-----Original Message-----
From: Adrian Klaver [mailto:aklaver@comcast.net]
Sent: Wednesday, April 08, 2009 10:10 PM
To: pgsql-general@postgresql.org
Cc: Jeff Brenton
Subject: Re: [GENERAL] database corruption

On Wednesday 08 April 2009 6:32:06 pm Jeff Brenton wrote:

I've encountered some db corruption after restarting postgres on my
database server running 8.2.4. I think that postgres did not shut

down

cleanly. Postgres started appropriately but crashed 45 minutes later.
I used pg_resetxlog after the crash to get the db to start again but

appears that the database is not running properly now. When users try
to access some of the tables in the db they get the error below;

ERROR: index "testrun_log_pkey" contains unexpected zero page at

block

3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no diagnostic
record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at

block

3155408

HINT: Please REINDEX it.}

I've attempted to re-index the pkey listed but after an hour it fails
with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed

on.

Does anyone know what I can do to get the db up and running again?

I guess the first question is, does the db have permissions(access) to
all that
space?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

--
Adrian Klaver
aklaver@comcast.net

Jeff Brenton

jbrenton@sandvine.com

about 17 years ago

In reply to: Chris Spotts (#2)

Re: database corruption

I was looking at dropping the index and recreating. Part of the reason
that I restarted postgres was to enable WAL archiving so that I can
migrate to a larger filesystem next week. I've got a system with a 1.3T
array that I will be migrating to. This DB has been neglected and I am
trying to fix it but things are arguing with me.

Will dropping the index have any negative consequences if the indexed
table has a primary key associated with it? I think not but want to be
certain.

________________________________

From: Chris [mailto:rfusca@gmail.com]
Sent: Wednesday, April 08, 2009 10:08 PM
To: Jeff Brenton
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] database corruption

I would imagine you would have better luck dropping the index and
recreating. But considering you're 98% full on that drive, it looks
like you're about to have other problems...

On Wed, Apr 8, 2009 at 8:32 PM, Jeff Brenton <jbrenton@sandvine.com>
wrote:

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no diagnostic
record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at block
3155408

HINT: Please REINDEX it.}

I've attempted to re-index the pkey listed but after an hour it fails
with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed on.
Does anyone know what I can do to get the db up and running again?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

--
Chris Spotts
rfusca@gmail.com

Joshua D. Drake

jd@commandprompt.com

about 17 years ago

In reply to: Jeff Brenton (#4)

Re: database corruption

On Wed, 2009-04-08 at 22:14 -0400, Jeff Brenton wrote:

There are no filesystem level content size restrictions that I am aware
of on this system. The user pgsql should have full access to the
filesystems indicated except for the root filesystem.

Inodes?

Where is the temporary location? I am searching around to see if I can
specify it anywhere in the config files but can't seem to find anything
which leads me to believe that its part of the postgres data directory.

-----Original Message-----
From: Adrian Klaver [mailto:aklaver@comcast.net]
Sent: Wednesday, April 08, 2009 10:10 PM
To: pgsql-general@postgresql.org
Cc: Jeff Brenton
Subject: Re: [GENERAL] database corruption

On Wednesday 08 April 2009 6:32:06 pm Jeff Brenton wrote:

I've encountered some db corruption after restarting postgres on my
database server running 8.2.4. I think that postgres did not shut

down

cleanly. Postgres started appropriately but crashed 45 minutes later.
I used pg_resetxlog after the crash to get the db to start again but

it

appears that the database is not running properly now. When users try
to access some of the tables in the db they get the error below;

ERROR: index "testrun_log_pkey" contains unexpected zero page at

block

3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no diagnostic
record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at

block

3155408

HINT: Please REINDEX it.}

I've attempted to re-index the pkey listed but after an hour it fails
with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed

on.

Does anyone know what I can do to get the db up and running again?

I guess the first question is, does the db have permissions(access) to
all that
space?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

--
Adrian Klaver
aklaver@comcast.net

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

Jeff Brenton

jbrenton@sandvine.com

about 17 years ago

In reply to: Joshua D. Drake (#6)

Re: database corruption

There are 9 miilion inodes free on /db. All other partitions have at
least 1/2 million free.

-----Original Message-----
From: Joshua D. Drake [mailto:jd@commandprompt.com]
Sent: Wednesday, April 08, 2009 10:26 PM
To: Jeff Brenton
Cc: Adrian Klaver; pgsql-general@postgresql.org
Subject: Re: [GENERAL] database corruption

On Wed, 2009-04-08 at 22:14 -0400, Jeff Brenton wrote:

There are no filesystem level content size restrictions that I am

aware

of on this system. The user pgsql should have full access to the
filesystems indicated except for the root filesystem.

Inodes?

Where is the temporary location? I am searching around to see if I

can

specify it anywhere in the config files but can't seem to find

anything

which leads me to believe that its part of the postgres data

directory.

-----Original Message-----
From: Adrian Klaver [mailto:aklaver@comcast.net]
Sent: Wednesday, April 08, 2009 10:10 PM
To: pgsql-general@postgresql.org
Cc: Jeff Brenton
Subject: Re: [GENERAL] database corruption

On Wednesday 08 April 2009 6:32:06 pm Jeff Brenton wrote:

I've encountered some db corruption after restarting postgres on my
database server running 8.2.4. I think that postgres did not shut

down

cleanly. Postgres started appropriately but crashed 45 minutes

later.

I used pg_resetxlog after the crash to get the db to start again but

it

appears that the database is not running properly now. When users

try

to access some of the tables in the db they get the error below;

ERROR: index "testrun_log_pkey" contains unexpected zero page at

block

3155408

HINT: Please REINDEX it.}> <SQL environment diagnostic: no

diagnostic

record

SQL connection is null

SQL statement diagnostic: XX002 7 {Error while executing the query;

ERROR: index "testrun_log_pkey" contains unexpected zero page at

block

3155408

HINT: Please REINDEX it.}

I've attempted to re-index the pkey listed but after an hour it

fails

with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space

left

on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed

on.

Does anyone know what I can do to get the db up and running again?

I guess the first question is, does the db have permissions(access) to
all that
space?

/dev/amrd0s1a 3.9G 2.7G 898M 75% /

/dev/amrd0s1e 115G 43G 63G 40% /backup

/dev/amrd1s1d 133G 748M 121G 1% /wal

/dev/amrd2s1d 663G 596G 14G 98% /db

/dev/amrd0s1d 3.9G 184M 3.4G 5% /var

--
Adrian Klaver
aklaver@comcast.net

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

Craig Ringer

craig@2ndquadrant.com

about 17 years ago

In reply to: Jeff Brenton (#1)

Re: database corruption

Jeff Brenton wrote:

I've attempted to re-index the pkey listed but after an hour it fails
with

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device

HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed on.
Does anyone know what I can do to get the db up and running again?

Is there 14GB free at the time PostgreSQL claims to run out of space? Try:

watch -n 60 "df -m /db"

and see if the free space falls close to zero during your reindex attempt.

Personally, I'd just try to give Pg some room to breathe.

--
Craig Ringer

Laurenz Albe

laurenz.albe@cybertec.at

about 17 years ago

In reply to: Jeff Brenton (#7)

Re: database corruption

This thread is a top posting mess. I'll try to rearrange:

Jeff Brenton wrote:

REINDEX INDEX testrun_log_pkey;

ERROR: could not write block 1832079 of temporary file: No space left
on device
HINT: Perhaps out of disk space?

There is currently 14GB free on the disk that postgres is installed on.
Does anyone know what I can do to get the db up and running again?

[...]

/dev/amrd2s1d 663G 596G 14G 98% /db

I guess the first question is, does the db have permissions(access) to
all that space?

There are no filesystem level content size restrictions that I am aware
of on this system. The user pgsql should have full access to the
filesystems indicated except for the root filesystem.

Inodes?

There are 9 miilion inodes free on /db. All other partitions have at
least 1/2 million free.

Assuming that this is ext3 on Linux, it could be space reserved for root.

What do you get if you run the following as root:

dumpe2fs /dev/amrd2s1d | grep 'Reserved block count'

Yours,
Laurenz Albe

#10

Ivan Sergio Borgonovo

mail@webthatworks.it

about 17 years ago

In reply to: Jeff Brenton (#4)

Re: database corruption

On Wed, 8 Apr 2009 22:14:38 -0400
"Jeff Brenton" <jbrenton@sandvine.com> wrote:

There are no filesystem level content size restrictions that I am
aware of on this system. The user pgsql should have full access
to the filesystems indicated except for the root filesystem.

finished inodes?
A lot of small files (even empty) somewhere?
It happened to me when I was running a spider using curl and the
spiders where not exiting properly when another process was killing
them...

--
Ivan Sergio Borgonovo
http://www.webthatworks.it