FATAL: catalog is missing 1 attribute(s) for relid 16396

Started by The Hermit Hackerover 21 years ago26 messageshackers
Jump to latest
#1The Hermit Hacker
scrappy@hub.org

Did some searching, and figured/concluded that things are pretty much
hosed ... the last backup is from June (clients machine, not ours), but
before I suggest going back to that backup, I want to make sure that I
haven't overlooked anything ...

What the client did was a 'delete from pg_attribute where ... ' ...

The database is a 7.4.2 one ... my first thought was one of the older
standbys ... rebuild the schema and move the data files into place over
top of that ... but of course, 7.3 and beyond are OID based vs name based
files, so that doesn't work, unless there is some way of figuring out
which file in the old directory corresponds to while oid-file, and without
beign able to get into the database to read the system files, thats a wee
bit difficult ...

Next, figured to try and get in via STAND-ALONE, since I've previously
dump'd data that way, to reload it ... course, that doesn't work, since I
still need to be able to read the system tables, which is where this
problem stems from ...

I'm still searching the 'net to see if there is somethign that I've
overlooked ... but everything so far is drawing a deadend ... can someone
suggest a web page I should read, a tool I could use, or something, to get
the data out of this, that I'm not finding? Or some way of 'fixing' relid
16396? :)

Thanks ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#2Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: The Hermit Hacker (#1)
Re: FATAL: catalog is missing 1 attribute(s) for relid

Marc G. Fournier wrote:

What the client did was a 'delete from pg_attribute where ... ' ...

The database is a 7.4.2 one ... my first thought was one of the older
standbys ... rebuild the schema and move the data files into place over
top of that ... but of course, 7.3 and beyond are OID based vs name
based files, so that doesn't work, unless there is some way of figuring
out which file in the old directory corresponds to while oid-file, and
without beign able to get into the database to read the system files,
thats a wee bit difficult ...

This is probably worth a shot, as I think the catalog oid's are always
the same (can't find the right place in the code to check....), but oid
16396 is pg_am for all the systems here:

# select relname,oid,relfilenode from pg_class where oid like '16396';
relname | oid | relfilenode
---------+-------+-------------
pg_am | 16396 | 16396

However, I think it is pg_attribute that you want to rescue - as the
system cannot lookup the attributes for pg_am due to the pg_attribute
deletion:

# select relname,oid from pg_class where relname like 'pg_attribute';
relname | oid
--------------+------
pg_attribute | 1249

You could probably copy 1249 from one of your standbys to your broken
system's PGDATA...(better backup the broken system first, or try the
rescue on another box).

good luck

Mark

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#1)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

What the client did was a 'delete from pg_attribute where ... ' ...

[ blink... ] Well, that sort of thing is definitely a candidate for the
Darwin Award, but what exactly was the WHERE clause?

The database is a 7.4.2 one ... my first thought was one of the older
standbys ... rebuild the schema and move the data files into place over
top of that ... but of course, 7.3 and beyond are OID based vs name based
files, so that doesn't work, unless there is some way of figuring out
which file in the old directory corresponds to while oid-file, and without
beign able to get into the database to read the system files, thats a wee
bit difficult ...

I think that people have once or twice gotten themselves out of that
but it ain't easy. Give us full details.

I'm still searching the 'net to see if there is somethign that I've
overlooked ... but everything so far is drawing a deadend ... can someone
suggest a web page I should read, a tool I could use, or something, to get
the data out of this, that I'm not finding? Or some way of 'fixing' relid
16396? :)

[ select 16396::regclass... ] pg_am? You may be in luck, because that
is the one solitary system catalog that no one ever changes. If that's
all that got hit you might have a chance. What I'm wondering is just
exactly what the extent of the damage was.

regards, tom lane

#4The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#3)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Thu, 13 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

What the client did was a 'delete from pg_attribute where ... ' ...

[ blink... ] Well, that sort of thing is definitely a candidate for the
Darwin Award, but what exactly was the WHERE clause?

We were working on removing a 'botched' erserver install, to put slony
into place, and he didn't realize the ramifications of modifying system
files directly :(

I'm still searching the 'net to see if there is somethign that I've
overlooked ... but everything so far is drawing a deadend ... can someone
suggest a web page I should read, a tool I could use, or something, to get
the data out of this, that I'm not finding? Or some way of 'fixing' relid
16396? :)

[ select 16396::regclass... ] pg_am? You may be in luck, because that
is the one solitary system catalog that no one ever changes. If that's
all that got hit you might have a chance. What I'm wondering is just
exactly what the extent of the damage was.

'k, I can try that in the morning and see ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#5Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#2)
Re: FATAL: catalog is missing 1 attribute(s) for relid

Marc,

In case you don't work out a better way to sort this, I can reproduce
and fix the error 'catalog is missing n attribute(s) for relid 16396'
caused by directly deleting (part of) pg_attribute:

Setup :

$ initdb
$ pg_ctl start
$ createdb test

Backup :

$ pg_ctl stop
$ tar -czvf pgdata.tar.gz pgdata

Break :

$ pg_ctl start
$ psql -d test -c "delete from pg_attribute where attrelid=16396"
$ psql test [gets FATAL catalog is missing 20 attribute(s) for relid 16396]

Fix (restore pg_attribute to database test):

$ pg_ctl stop
$ tar -zxvf pgdata.tar.gz pgdata/base/17142/1249
$ pg_ctl start
$ psql test [now works]

The caveat is that any relations created or modified between the backup
and breakage will not be properly restored.

regards

Mark

#6The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#3)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Thu, 13 Jan 2005, Tom Lane wrote:

I'm still searching the 'net to see if there is somethign that I've
overlooked ... but everything so far is drawing a deadend ... can someone
suggest a web page I should read, a tool I could use, or something, to get
the data out of this, that I'm not finding? Or some way of 'fixing' relid
16396? :)

[ select 16396::regclass... ] pg_am? You may be in luck, because that
is the one solitary system catalog that no one ever changes. If that's
all that got hit you might have a chance. What I'm wondering is just
exactly what the extent of the damage was.

Nope, that doesn't work ... in fact, the 16396/pg_am table hadn't even
been modified since Jun of last year ;(

If I rebuild pg_attribute based on the schema itself, and copy that into
place, should that work? Guess at this point in time, it can't hurt to
try :)

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#7The Hermit Hacker
scrappy@hub.org
In reply to: The Hermit Hacker (#6)
Re: FATAL: catalog is missing 1 attribute(s) for relid

course that won't work, since its link'd to the oid of the table name :(
whose idea was this "let's name the files by the OID" again? :(

On Fri, 14 Jan 2005, Marc G. Fournier wrote:

On Thu, 13 Jan 2005, Tom Lane wrote:

I'm still searching the 'net to see if there is somethign that I've
overlooked ... but everything so far is drawing a deadend ... can someone
suggest a web page I should read, a tool I could use, or something, to get
the data out of this, that I'm not finding? Or some way of 'fixing' relid
16396? :)

[ select 16396::regclass... ] pg_am? You may be in luck, because that
is the one solitary system catalog that no one ever changes. If that's
all that got hit you might have a chance. What I'm wondering is just
exactly what the extent of the damage was.

Nope, that doesn't work ... in fact, the 16396/pg_am table hadn't even been
modified since Jun of last year ;(

If I rebuild pg_attribute based on the schema itself, and copy that into
place, should that work? Guess at this point in time, it can't hurt to try
:)

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#7)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

course that won't work, since its link'd to the oid of the table name :(

Not to mention all the other system catalogs. You could maybe make this
idea work by regenerating the entire catalog set, but not by
regenerating just pg_attribute. But if you don't know the table-to-OID
mapping in the old database you're screwed anyway :-(

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#7)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

course that won't work, since its link'd to the oid of the table name :(
whose idea was this "let's name the files by the OID" again? :(

Actually, I think you can make this work, if you are sure of the schema
of the old database. Try something like this:

* Continue to work in the same installation; don't initdb. If you did
initdb then old transaction numbers would be wrong. Just create a new
database beside the old one (or maybe better, physically copy the old
one someplace and then drop and re-createdb it).

* Rebuild the schema. Now you have a lot of empty tables and you just
have to get the old data into them. That means you have to find out the
mapping from old table filenode numbers to new ones.

* To find out the old numbers, make a user table that has the identical
schema to pg_class (probably easiest to do this with the LIKE clause of
CREATE TABLE). Check its relfilenode number in pg_class, then copy the
old database's pg_class file over that relfilenode. Now you can query
this table to see the contents of the old pg_class.

* Join the new and old pg_class together to get corresponding
relfilenode numbers.

* Copy old table files into new database per the above. (I'd make a
script to do this instead of doing it by hand...) Also you'll need to
copy corresponding TOAST tables. Don't copy indexes though.

* REINDEX all the indexes, and I think you're there.

It may take a couple tries to get this right, but as long as you made a
copy of the old database to start with, you can start over...

regards, tom lane

#10The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#8)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Fri, 14 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

course that won't work, since its link'd to the oid of the table name :(

Not to mention all the other system catalogs. You could maybe make this
idea work by regenerating the entire catalog set, but not by
regenerating just pg_attribute. But if you don't know the table-to-OID
mapping in the old database you're screwed anyway :-(

Exactly ... and I take it there are no 'headers' in the files themselves
to use for an association? some way that they are tag'd?

Is there any way of 'raw dumping' pg_attribute itself, to find out the
mappings?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#11The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#9)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Fri, 14 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

course that won't work, since its link'd to the oid of the table name :(
whose idea was this "let's name the files by the OID" again? :(

Actually, I think you can make this work, if you are sure of the schema
of the old database. Try something like this:

* Continue to work in the same installation; don't initdb. If you did
initdb then old transaction numbers would be wrong. Just create a new
database beside the old one (or maybe better, physically copy the old
one someplace and then drop and re-createdb it).

* Rebuild the schema. Now you have a lot of empty tables and you just
have to get the old data into them. That means you have to find out the
mapping from old table filenode numbers to new ones.

* To find out the old numbers, make a user table that has the identical
schema to pg_class (probably easiest to do this with the LIKE clause of
CREATE TABLE). Check its relfilenode number in pg_class, then copy the
old database's pg_class file over that relfilenode. Now you can query
this table to see the contents of the old pg_class.

* Join the new and old pg_class together to get corresponding
relfilenode numbers.

* Copy old table files into new database per the above. (I'd make a
script to do this instead of doing it by hand...) Also you'll need to
copy corresponding TOAST tables. Don't copy indexes though.

'k, this is looking promising ... but I'm a bit confused on the TOAST
tables ... I can't match on 'relname', since they aren't the same ... the
old has, for instance:

pg_toast_5773565

while the new has:

pg_toast_8709712

is there some sort of 'linkage' in pg_class that I'm not seeing? since new
is finding 21 rows, and old is only finding 20, I can't imagine its safe
to assume that the 'order of creation' will be safe to match on ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#11)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

'k, this is looking promising ... but I'm a bit confused on the TOAST
tables ... I can't match on 'relname', since they aren't the same ... the
old has, for instance:

pg_toast_5773565

while the new has:

pg_toast_8709712

is there some sort of 'linkage' in pg_class that I'm not seeing?

Yeah. A toast table's OID appears in the reltoastrelid field of its
owning table. So you match new and old pg_class entries by name,
take their reltoastrelid fields, look up those rows by OID, and their
relfilenode fields give the names of the TOAST files.

regards, tom lane

#13The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#12)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Fri, 14 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

'k, this is looking promising ... but I'm a bit confused on the TOAST
tables ... I can't match on 'relname', since they aren't the same ... the
old has, for instance:

pg_toast_5773565

while the new has:

pg_toast_8709712

is there some sort of 'linkage' in pg_class that I'm not seeing?

Yeah. A toast table's OID appears in the reltoastrelid field of its
owning table. So you match new and old pg_class entries by name,
take their reltoastrelid fields, look up those rows by OID, and their
relfilenode fields give the names of the TOAST files.

'k, there is somewhere else that is 'holding' the information ... the
table in question is 'email' ... when I do a 'select, it tells me:

restore=# select * from email;
ERROR: could not open relation with OID 5773277

which is the treltoastrelid from the old:

restore=# select reltoastrelid from npg_class where relname = 'email';
reltoastrelid
---------------
5773277
(1 row)

but, for the new, it should be:

restore=# select reltoastrelid from pg_class where relname = 'email';
reltoastrelid
---------------
8709051
(1 row)

Is there something in the 'table file' itself that is holding that relid?
the only thing from the old, as far as system tables, taht I've copied
over is the pg_class file into the npg_class relid, so I don't think
there is anything else at the system catalog level that could
inadvertantly contain it ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#13)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

Is there something in the 'table file' itself that is holding that relid?

Oh, drat, you're right --- TOAST pointer datums contain the OID of
the toast file they are pointing to. So any value wide enough to be
pushed out-of-line is going to have a problem.

Ugh. Not sure there is any good way out of this one. Adjusting the
pointer datums in-place doesn't seem very feasible.

regards, tom lane

#15The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#14)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Sat, 15 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

Is there something in the 'table file' itself that is holding that relid?

Oh, drat, you're right --- TOAST pointer datums contain the OID of
the toast file they are pointing to. So any value wide enough to be
pushed out-of-line is going to have a problem.

Ugh. Not sure there is any good way out of this one. Adjusting the
pointer datums in-place doesn't seem very feasible.

can I force the reuse of an OID? for instance, if I were to get the
map'ngs for those toast files, and then 'drop' the old database (I've got
several backups of it already), can I insert those records into pg_class,
with the proper OID? Then again, OIDs don't have to be unique across
databsaes, do they? they are only serial ... so I should be able to just
add those records into the new database, no?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#15)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

can I force the reuse of an OID? for instance, if I were to get the
map'ngs for those toast files, and then 'drop' the old database (I've got
several backups of it already), can I insert those records into pg_class,
with the proper OID?

I was wondering about that myself. If you could relabel the pg_class
rows for the new TOAST tables with the OIDs of the old TOAST tables,
you'd be set. (You'd need to change the relfilenode fields of their
owning tables too, probably, although if you just want to dump out the
data you might not have to bother with that.) The problem is that there
is no way to do that within Postgres. The only way I can think of is to
shut down the postmaster and change pg_class with a hex editor, which
seems mighty tedious and error-prone. However, with any luck you'd only
need to fix half a dozen or so entries ... how many tables in this
database have nonempty TOAST tables?

If you try this, keep in mind that you'd have to REINDEX pg_class
afterwards.

regards, tom lane

#17The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#16)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Sat, 15 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

can I force the reuse of an OID? for instance, if I were to get the
map'ngs for those toast files, and then 'drop' the old database (I've got
several backups of it already), can I insert those records into pg_class,
with the proper OID?

I was wondering about that myself. If you could relabel the pg_class
rows for the new TOAST tables with the OIDs of the old TOAST tables,
you'd be set. (You'd need to change the relfilenode fields of their
owning tables too, probably, although if you just want to dump out the
data you might not have to bother with that.) The problem is that there
is no way to do that within Postgres. The only way I can think of is to
shut down the postmaster and change pg_class with a hex editor, which
seems mighty tedious and error-prone. However, with any luck you'd only
need to fix half a dozen or so entries ... how many tables in this
database have nonempty TOAST tables?

Hrmmm ... how about if I pg_dump --oids pg_class, make the modes and then
reload it as opg_class, shut down the database server and move opg_class
over pg_class, after making the required modifications to opg_class?

restore=# select oid from pg_class where relname = 'email';
oid
---------
8709044
(1 row)

restore=# select oid from opg_class where relname = 'email';
oid
---------
8709044
(1 row)

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#17)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

Hrmmm ... how about if I pg_dump --oids pg_class, make the modes and then
reload it as opg_class, shut down the database server and move opg_class
over pg_class, after making the required modifications to opg_class?

Good idea. Give it a shot. Don't forget you'll need to REINDEX pg_class.

regards, tom lane

#19The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#18)
Re: FATAL: catalog is missing 1 attribute(s) for relid

On Sat, 15 Jan 2005, Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

Hrmmm ... how about if I pg_dump --oids pg_class, make the modes and then
reload it as opg_class, shut down the database server and move opg_class
over pg_class, after making the required modifications to opg_class?

Good idea. Give it a shot. Don't forget you'll need to REINDEX pg_class.

Doesn't seem to like that ...

backend> reindex pg_class;
ERROR: syntax error at or near "pg_class" at character 9

funny thing is, I can do a 'select * from pg_class' ... and I can do a
'select * from email' now too ... but this is all in single user mode ...
do it from 'multi user', and I get:

restore=# select * from email;
ERROR: relation "email" does not exist

and:

restore=# select * from pg_class;
ERROR: "category_id_remap_key" is an index

seems *damn* close though ... thoughts?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#19)
Re: FATAL: catalog is missing 1 attribute(s) for relid 16396

"Marc G. Fournier" <scrappy@postgresql.org> writes:

backend> reindex pg_class;
ERROR: syntax error at or near "pg_class" at character 9

"reindex table pg_class", I think. And you'll probably need to be doing
this in a backend started with -P command line option.

regards, tom lane

#21The Hermit Hacker
scrappy@hub.org
In reply to: The Hermit Hacker (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#21)
#23The Hermit Hacker
scrappy@hub.org
In reply to: The Hermit Hacker (#21)
#24The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#22)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#23)
#26The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#25)