Unable to determine what has a particular OID

Started by Andy Dalealmost 16 years ago15 messagesgeneral
Jump to latest
#1Andy Dale
andy.dale@gmail.com

Hi,

We currently have a DB (8.3.7) in production that seem to be taking up more
space on the HDD than was anticipated.

After looking into the the data directory a particular file/oid is around 21
GB (the oid has 21 files), this OID can then be found in the pg_class table,
the explanation for the size being that the table has around 12.5 Million
rows. During the inspection of the data directory I observed another OID
that is around 8.5GB, but I cannot locate the corresponding entry in the
pg_class table (even after dumping the table into a file and grep-ing for
the OID in question).

Any other ideas for finding what the mystery OID is ?

Thanks in advance,

Andy

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Dale (#1)
Re: Unable to determine what has a particular OID

Andy Dale <andy.dale@gmail.com> writes:

After looking into the the data directory a particular file/oid is around 21
GB (the oid has 21 files), this OID can then be found in the pg_class table,
the explanation for the size being that the table has around 12.5 Million
rows. During the inspection of the data directory I observed another OID
that is around 8.5GB, but I cannot locate the corresponding entry in the
pg_class table (even after dumping the table into a file and grep-ing for
the OID in question).

You should be looking at pg_class.relfilenode, not OID. See
http://www.postgresql.org/docs/8.3/static/storage.html

regards, tom lane

#3Andy Dale
andy.dale@gmail.com
In reply to: Tom Lane (#2)
Re: Unable to determine what has a particular OID

On 18 June 2010 16:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Dale <andy.dale@gmail.com> writes:

After looking into the the data directory a particular file/oid is around

21

GB (the oid has 21 files), this OID can then be found in the pg_class

table,

the explanation for the size being that the table has around 12.5 Million
rows. During the inspection of the data directory I observed another OID
that is around 8.5GB, but I cannot locate the corresponding entry in the
pg_class table (even after dumping the table into a file and grep-ing for
the OID in question).

You should be looking at pg_class.relfilenode, not OID. See
http://www.postgresql.org/docs/8.3/static/storage.html

regards, tom lane

Ok, but when I have dumped the pg_class table into a file like so:

-[ RECORD 1
]--+----------------------------------------------------------------
relname | pg_type
relnamespace | 11
reltype | 71
relowner | 10
relam | 0
relfilenode | 1247
reltablespace | 0
relpages | 19
reltuples | 796
reltoastrelid | 0
reltoastidxid | 0
relhasindex | t
relisshared | f
relkind | r
relnatts | 26
relchecks | 0
reltriggers | 0
relukeys | 0
relfkeys | 0
relrefs | 0
relhasoids | t
relhaspkey | f
relhasrules | f
relhassubclass | f
relfrozenxid | 379
relacl | {=r/postgres}
reloptions |
-[ RECORD 2
]--+----------------------------------------------------------------
relname | triggered_update_columns
relnamespace | 11313
reltype | 11477
relowner | 10
relam | 0
relfilenode | 11476
reltablespace | 0
relpages | 0
reltuples | 0
reltoastrelid | 0

Grep-ing for the correct oid/filenode (110660 in my case), nothing is
returned.

Cheers,

Andy

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Andy Dale (#3)
Re: Unable to determine what has a particular OID

On Friday 18 June 2010 7:15:48 am Andy Dale wrote:

On 18 June 2010 16:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Dale <andy.dale@gmail.com> writes:

After looking into the the data directory a particular file/oid is
around

21

GB (the oid has 21 files), this OID can then be found in the pg_class

table,

the explanation for the size being that the table has around 12.5
Million rows. During the inspection of the data directory I observed
another OID that is around 8.5GB, but I cannot locate the corresponding
entry in the pg_class table (even after dumping the table into a file
and grep-ing for the OID in question).

You should be looking at pg_class.relfilenode, not OID. See
http://www.postgresql.org/docs/8.3/static/storage.html

regards, tom lane

Ok, but when I have dumped the pg_class table into a file like so:

-[ RECORD 1
]--+----------------------------------------------------------------
relname | pg_type
relnamespace | 11
reltype | 71
relowner | 10
relam | 0
relfilenode | 1247
reltablespace | 0
relpages | 19
reltuples | 796
reltoastrelid | 0
reltoastidxid | 0
relhasindex | t
relisshared | f
relkind | r
relnatts | 26
relchecks | 0
reltriggers | 0
relukeys | 0
relfkeys | 0
relrefs | 0
relhasoids | t
relhaspkey | f
relhasrules | f
relhassubclass | f
relfrozenxid | 379
relacl | {=r/postgres}
reloptions |
-[ RECORD 2
]--+----------------------------------------------------------------
relname | triggered_update_columns
relnamespace | 11313
reltype | 11477
relowner | 10
relam | 0
relfilenode | 11476
reltablespace | 0
relpages | 0
reltuples | 0
reltoastrelid | 0

Grep-ing for the correct oid/filenode (110660 in my case), nothing is
returned.

Cheers,

Andy

Try:
SELECT oid,* from pg_database ;

I suspect the relfilnode you are looking at is another database in the cluster.

--
Adrian Klaver
adrian.klaver@gmail.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Dale (#3)
Re: Unable to determine what has a particular OID

Andy Dale <andy.dale@gmail.com> writes:

On 18 June 2010 16:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You should be looking at pg_class.relfilenode, not OID. See
http://www.postgresql.org/docs/8.3/static/storage.html

Ok, but when I have dumped the pg_class table into a file like so:
...
Grep-ing for the correct oid/filenode (110660 in my case), nothing is
returned.

Hm. You're sure you're looking at pg_class in the right database?

If there really isn't a pg_class entry matching that relfilenode,
then the files are orphans and can be removed. There are some scenarios
where orphan files can be left behind, but they generally involve
database crashes ... have you had any of those?

regards, tom lane

#6Andy Dale
andy.dale@gmail.com
In reply to: Adrian Klaver (#4)
Re: Unable to determine what has a particular OID

Try:
SELECT oid,* from pg_database ;

I suspect the relfilnode you are looking at is another database in the
cluster.

--
Adrian Klaver
adrian.klaver@gmail.com

Maybe I am possibly doing that. I did a du -h --max-depth=1 in the base/
within the data directory. This showed a single folder that contained ~34
GB. On performing a ls in this, it showed 2 oids/filenodes that had more
than 1 GB (multiple files with .<number> at the end). The large table I
mentioned in the original post was in this folder, so I assumed that the
rest of the files in the same directory belonged to the same DB.

Have I made an incorrect assumption ?

#7Andy Dale
andy.dale@gmail.com
In reply to: Tom Lane (#5)
Re: Unable to determine what has a particular OID

On 18 June 2010 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Dale <andy.dale@gmail.com> writes:

On 18 June 2010 16:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You should be looking at pg_class.relfilenode, not OID. See
http://www.postgresql.org/docs/8.3/static/storage.html

Ok, but when I have dumped the pg_class table into a file like so:
...
Grep-ing for the correct oid/filenode (110660 in my case), nothing is
returned.

Hm. You're sure you're looking at pg_class in the right database?

If there really isn't a pg_class entry matching that relfilenode,
then the files are orphans and can be removed. There are some scenarios
where orphan files can be left behind, but they generally involve
database crashes ... have you had any of those?

regards, tom lane

To the best of my knowledge we have not had any DB crashes as yet. I am
also replicating the DB using slony, could this be the cause of these weird
orphans ? (it is neither of the sl_log_ tables)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Dale (#6)
Re: Unable to determine what has a particular OID

Andy Dale <andy.dale@gmail.com> writes:

Try:
SELECT oid,* from pg_database ;

I suspect the relfilnode you are looking at is another database in the
cluster.

Maybe I am possibly doing that.

There isn't any "maybe" involved here. Follow Adrian's advice and
determine for sure exactly which database that subdirectory represents.
The subdirectory name will match the OID in the pg_database row.

It's possible for the same table OID to be in use in more than one
database, so the fact that you got a match to the other table doesn't
in itself prove you're looking at the right database.

regards, tom lane

#9Andy Dale
andy.dale@gmail.com
In reply to: Andy Dale (#7)
Re: Unable to determine what has a particular OID

To the best of my knowledge we have not had any DB crashes as yet. I am
also replicating the DB using slony, could this be the cause of these weird
orphans ? (it is neither of the sl_log_ tables)

Also looking at the timestamps of the files, they all seem to have been
created within a very short period of time on the 6th of April

-rw------- 1 postgres postgres 57344 2010-04-06 03:10 110653
-rw------- 1 postgres postgres 0 2010-04-06 03:08 110654
-rw------- 1 postgres postgres 40960 2010-04-06 03:08 110657
-rw------- 1 postgres postgres 8192 2010-04-06 03:08 110658
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:10 110660
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:12 110660.1
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:13 110660.2
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:15 110660.3
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:16 110660.4
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:18 110660.5
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:20 110660.6
-rw------- 1 postgres postgres 1073741824 2010-04-06 03:21 110660.7
-rw------- 1 postgres postgres 26566656 2010-04-06 03:21 110660.8
-rw------- 1 postgres postgres 180224 2010-04-06 03:21 110661
-rw------- 1 postgres postgres 8192 2010-04-06 03:08 110662
-rw------- 1 postgres postgres 8192 2010-04-06 03:08 110663
-rw------- 1 postgres postgres 16384 2010-04-06 03:21 110664
-rw------- 1 postgres postgres 113254400 2010-04-06 03:25 110763
-rw------- 1 postgres postgres 0 2010-04-06 03:25 110765
-rw------- 1 postgres postgres 1073741824 2010-06-03 22:17 112806

#10Andy Dale
andy.dale@gmail.com
In reply to: Tom Lane (#8)
Re: Unable to determine what has a particular OID

On 18 June 2010 17:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Dale <andy.dale@gmail.com> writes:

Try:
SELECT oid,* from pg_database ;

I suspect the relfilnode you are looking at is another database in the
cluster.

Maybe I am possibly doing that.

There isn't any "maybe" involved here. Follow Adrian's advice and
determine for sure exactly which database that subdirectory represents.
The subdirectory name will match the OID in the pg_database row.

It's possible for the same table OID to be in use in more than one
database, so the fact that you got a match to the other table doesn't
in itself prove you're looking at the right database.

regards, tom lane

OK, I have just run the query, and both are in the same database.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Dale (#10)
Re: Unable to determine what has a particular OID

Andy Dale <andy.dale@gmail.com> writes:

OK, I have just run the query, and both are in the same database.

OK. Given the narrow range of timestamps on the files, the most
likely bet here is that you're looking at the leftovers from a
"CREATE TABLE AS" or similar command that failed partway through and
for some reason didn't clean up the files it'd created. As I mentioned,
this'd be unsurprising if there'd been a database crash or kill -9
or something similar involved. Anyway I think you've done enough to
establish that "rm'ing" those files will be safe. (But if you're
paranoid you might want to save them someplace first.)

regards, tom lane

#12Andy Dale
andy.dale@gmail.com
In reply to: Tom Lane (#11)
Re: Unable to determine what has a particular OID

On 18 June 2010 17:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Dale <andy.dale@gmail.com> writes:

OK, I have just run the query, and both are in the same database.

OK. Given the narrow range of timestamps on the files, the most
likely bet here is that you're looking at the leftovers from a
"CREATE TABLE AS" or similar command that failed partway through and
for some reason didn't clean up the files it'd created. As I mentioned,
this'd be unsurprising if there'd been a database crash or kill -9
or something similar involved. Anyway I think you've done enough to
establish that "rm'ing" those files will be safe. (But if you're
paranoid you might want to save them someplace first.)

regards, tom lane

Thanks for confirming this, I will however save/backup the files before
removing them (just in case anything goes wrong).

#13Greg Smith
gsmith@gregsmith.com
In reply to: Andy Dale (#12)
Re: Unable to determine what has a particular OID

Andy Dale wrote:

I will however save/backup the files before removing them (just in
case anything goes wrong).

If you do go through with this, I would recommend doing a complete
database dump afterwards using pg_dump/pg_dumpall. That should make it
immediately obvious if you broke something with the change.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#13)
Re: Unable to determine what has a particular OID

Greg Smith <greg@2ndquadrant.com> writes:

Andy Dale wrote:

I will however save/backup the files before removing them (just in
case anything goes wrong).

If you do go through with this, I would recommend doing a complete
database dump afterwards using pg_dump/pg_dumpall. That should make it
immediately obvious if you broke something with the change.

Doing pg_dump would expose the mistake if you'd removed an actual
table's files. But I'm not sure it would expose it if you removed
an index ...

regards, tom lane

#15Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#14)
Re: Unable to determine what has a particular OID

Tom Lane wrote:

Doing pg_dump would expose the mistake if you'd removed an actual
table's files. But I'm not sure it would expose it if you removed
an index ...

Right, but in theory if you screwed up and accidentally deleted a file
holding an index, you could recover from that in the possibly distant
future by rebuilding it, with some pain but no expected loss. Whereas
if you deleted some data by removing a file, you really want to know
that's what you did immediately, so you can put it back before you
forget where it all was at.

Sometimes people who have fully setup replication for backup purposes
ask me if they should continue saving pg_dump output somewhere. I think
it's reasonable to generate a dump using it periodically whether or not
you intend to save the result permanently, just as a paranoid sanity
check that you can still read everything. I don't trust disks and
filesystems that much.

(If you're reading this and feel the need to write a pro-ZFS essay at
this point, consider yourself trolled)

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us