invalid OID warning after disk failure

Started by Gabriel Sánchez Martínezover 11 years ago5 messagesgeneral
Jump to latest
#1Gabriel Sánchez Martínez
gabrielesanchez@gmail.com

I'm running PostgreSQL 9.3.5 on Ubuntu 14.04 on x86_64. The database
directory is on linux mdadm RAID10, using 4 4TB disks and a far=2
layout. While the RAID tolerates 1 drive failure nicely, I had the
misfortune of 2 drives failing consecutively, one of which had many
sectors reallocated and began failing SMART criteria. That one is out
now. As a result of this some files were corrupted.

I was getting the following errors on some tables:

ERROR: could not read block 0 in file "base/27810/3995569":
Input/output error

but after dropping those tables the errors are gone.

The situation appears to be stable now, but upon running REINDEX and
VACUUM on one of the databases, I get the following:

WARNING: relation "pg_attrdef" TID 1/1: OID is invalid
WARNING: relation "pg_attrdef" TID 1/2: OID is invalid
WARNING: relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup? My most recent
backup is from late September, so I would lose some data. I also backed
up what I could as soon as the disks started giving errors, but I don't
know if I can trust that.

Should I drop the entire cluster?

Regarding hardware, I'm going to add hot standby drives to prevent this
from happening in the future.

Thanks in advance for your advice.

Regards,
Gabriel

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2raghu ram
raghuchennuru@gmail.com
In reply to: Gabriel Sánchez Martínez (#1)
Re: invalid OID warning after disk failure

On Mon, Nov 17, 2014 at 10:20 AM, Gabriel Sánchez Martínez <
gabrielesanchez@gmail.com> wrote:

The situation appears to be stable now, but upon running REINDEX and
VACUUM on one of the databases, I get the following:

WARNING: relation "pg_attrdef" TID 1/1: OID is invalid
WARNING: relation "pg_attrdef" TID 1/2: OID is invalid
WARNING: relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup? My most recent
backup is from late September, so I would lose some data. I also backed up
what I could as soon as the disks started giving errors, but I don't know
if I can trust that.

Should I drop the entire cluster?

are you receiving any kind of error messages,while taking database dump
with "pg_dump" utility.

If you are not receiving any kind of error message,try to take a database
dump and restore database dump file in another database server and perform
all sanity checks. If all sanity checks are working fine,you can create a
new cluster and restore all database dump's.

Thanks & Regards
Raghu Ram

#3Gabriel Sánchez Martínez
gabrielesanchez@gmail.com
In reply to: raghu ram (#2)
Re: invalid OID warning after disk failure

On Nov 17, 2014, at 3:28, Raghu Ram <raghuchennuru@gmail.com> wrote:

On Mon, Nov 17, 2014 at 10:20 AM, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:

The situation appears to be stable now, but upon running REINDEX and VACUUM on one of the databases, I get the following:

WARNING: relation "pg_attrdef" TID 1/1: OID is invalid
WARNING: relation "pg_attrdef" TID 1/2: OID is invalid
WARNING: relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup? My most recent backup is from late September, so I would lose some data. I also backed up what I could as soon as the disks started giving errors, but I don't know if I can trust that.

Should I drop the entire cluster?

are you receiving any kind of error messages,while taking database dump with "pg_dump" utility.

I am getting these when running vacuum on the live database. Earlier I ran pg_dump. It was not working for the whole database due to I/O errors, so i did a series of more selective dumps, schema by schema and excluding the offending tables, so that I didn't get any errors or warnings. In the live database I dropped the few tables that were responsible for the pg_dump errors.

Show quoted text

If you are not receiving any kind of error message,try to take a database dump and restore database dump file in another database server and perform all sanity checks. If all sanity checks are working fine,you can create a new cluster and restore all database dump's.

Thanks & Regards
Raghu Ram

#4Gabriel Sánchez Martínez
gabrielesanchez@gmail.com
In reply to: Gabriel Sánchez Martínez (#3)
Re: invalid OID warning after disk failure

On Nov 17, 2014, at 8:10, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:

On Nov 17, 2014, at 3:28, Raghu Ram <raghuchennuru@gmail.com> wrote:

On Mon, Nov 17, 2014 at 10:20 AM, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:

The situation appears to be stable now, but upon running REINDEX and VACUUM on one of the databases, I get the following:

WARNING: relation "pg_attrdef" TID 1/1: OID is invalid
WARNING: relation "pg_attrdef" TID 1/2: OID is invalid
WARNING: relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup? My most recent backup is from late September, so I would lose some data. I also backed up what I could as soon as the disks started giving errors, but I don't know if I can trust that.

Should I drop the entire cluster?

are you receiving any kind of error messages,while taking database dump with "pg_dump" utility.

I am getting these when running vacuum on the live database. Earlier I ran pg_dump. It was not working for the whole database due to I/O errors, so i did a series of more selective dumps, schema by schema and excluding the offending tables, so that I didn't get any errors or warnings. In the live database I dropped the few tables that were responsible for the pg_dump errors.

If you are not receiving any kind of error message,try to take a database dump and restore database dump file in another database server and perform all sanity checks. If all sanity checks are working fine,you can create a new cluster and restore all database dump's.

Any suggestions on what sanity checks I should run? For now I am doing reindex and vacuum. I think that forces everything to be read. I am assuming that if vacuum completes without error then the table is fine.

Show quoted text

Thanks & Regards
Raghu Ram

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gabriel Sánchez Martínez (#4)
Re: invalid OID warning after disk failure

Gabriel S�nchez Mart�nez wrote:

Any suggestions on what sanity checks I should run? For now I am
doing reindex and vacuum. I think that forces everything to be read. I
am assuming that if vacuum completes without error then the table is
fine.

The main sanity check is to make sure the disks are now in the recycling
center and you have rebuilt the array with new ones. Then restore the
whole database in a freshly initdb'd system, and praise yourself that
you were able to get them before it all blew to pieces.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general