Really odd corruption problem: cannot open pg_aggregate: No such file or directory

Started by Adam Haberlach over 22 years ago, 5 messages
#1 Adam Haberlach
adam@newsnipple.com

So, one of the many machines that I support seems to have developed
an incredibly odd and specific corruption that I've never seen before.

Whenever a query requiring an aggregate is attempted, it spits out:
cannot open pg_aggregate: No such file or directory
and fails.

If I do:
select * from pg_class where relname='pg_aggregate';
I see that the relation exists.

If I check the relfilenode in the data directory, that exists, and
seems to be an object file containing what should be the basic
aggregate functions.

version: PostgreSQL 7.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)

The system ran for a few weeks before anything odd happened, and
then suddenly this. Does anyone have any ideas? Now that I look at
the above string, I realize that the system /is/ an Athlon processor.
Does anyone know if there could be an issue between the i686 and
athlon optimizations?

--
Adam Haberlach | "When your product is stolen by thieves, you
adam@mediariffic.com | have a police problem. When it is stolen by
http://mediariffic.com | millions of honest customers, you have a
| marketing problem." - George Gilder

#2 scott.marlowe
scott.marlowe@ihs.com
In reply to: Adam Haberlach (#1)
Re: Really odd corruption problem: cannot open pg_aggregate:

On Thu, 24 Jul 2003, Adam Haberlach wrote:

> So, one of the many machines that I support seems to have developed
> an incredibly odd and specific corruption that I've never seen before.
>
> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.
>
> If I do:
> select * from pg_class where relname='pg_aggregate';
> I see that the relation exists.
>
> If I check the relfilenode in the data directory, that exists, and
> seems to be an object file containing what should be the basic
> aggregate functions.
>
> version: PostgreSQL 7.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
>
> The system ran for a few weeks before anything odd happened, and
> then suddenly this. Does anyone have any ideas? Now that I look at
> the above string, I realize that the system /is/ an Athlon processor.
> Does anyone know if there could be an issue between the i686 and
> athlon optimizations?

Test your memory and drive subsystem first. memtest86.com has a nice
tester for free, and on Linux badblocks can do a decent job (not great,
just decent) of finding bad blocks.

PostgreSQL is good, but it can't make up for bad hardware.

#3 Doug McNaught
doug@mcnaught.org
In reply to: Adam Haberlach (#1)
Re: Really odd corruption problem: cannot open pg_aggregate: No such file or directory

Adam Haberlach <adam@newsnipple.com> writes:

> So, one of the many machines that I support seems to have developed
> an incredibly odd and specific corruption that I've never seen before.
>
> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.

Why not use 'strace' to see what file the backend is actually trying
to open?

-Doug

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Haberlach (#1)
Re: Really odd corruption problem: cannot open pg_aggregate: No such file or directory

Adam Haberlach <adam@newsnipple.com> writes:

> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.

Weird. It would be useful to find out exactly what pathname it's trying
to open. strace'ing the backend might be the easiest way.

> Does anyone know if there could be an issue between the i686 and
> athlon optimizations?

Seems unlikely that it would manifest this way, if so. The error is
coming from a low-level routine that would also be used for opening
any other table ...

regards, tom lane

#5 Adam Haberlach
adam@newsnipple.com
In reply to: Adam Haberlach (#1)
Re: Really odd corruption problem: cannot open pg_aggregate: No such file or directory

On Thu, Jul 24, 2003 at 10:17:06AM -0700, Adam Haberlach wrote:

> So, one of the many machines that I support seems to have developed
> an incredibly odd and specific corruption that I've never seen before.
>
> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.
>
> If I do:
> select * from pg_class where relname='pg_aggregate';
> I see that the relation exists.
>
> If I check the relfilenode in the data directory, that exists, and
> seems to be an object file containing what should be the basic
> aggregate functions.
>
> version: PostgreSQL 7.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
>
> The system ran for a few weeks before anything odd happened, and
> then suddenly this. Does anyone have any ideas? Now that I look at
> the above string, I realize that the system /is/ an Athlon processor.
> Does anyone know if there could be an issue between the i686 and

I'd like to thank everyone for the quick responses and the suggestion
to strace the postmaster.

open("/var/lib/pgsql/data/base/16556/16406", O_RDWR) = -1 ENOENT (No such file or directory)
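For anyone repeating this on a longer trace, here is a rough Python sketch that picks the failed open calls out of saved strace output and splits a path of the shape above into its two numeric components. The function names and the regular expression are illustrative assumptions, not part of PostgreSQL or strace, and the pattern only covers lines formatted like the one shown (plus the openat variant).

```python
import re

# Matches failed open()/openat() calls in strace output, for example:
#   open("/some/path", O_RDWR) = -1 ENOENT (No such file or directory)
# The pattern is an assumption based on the line format shown in this thread.
FAILED_OPEN = re.compile(
    r'open(?:at)?\((?:[^,"]+,\s*)?"(?P<path>[^"]+)".*\)\s*=\s*-1\s+(?P<errno>E[A-Z]+)'
)


def failed_opens(lines):
    """Yield (path, errno_name) for every failed open call found in lines."""
    for line in lines:
        match = FAILED_OPEN.search(line)
        if match:
            yield match.group("path"), match.group("errno")


def split_relation_path(path):
    """Split .../base/<database OID>/<relfilenode> into two integers.

    Returns None when the path does not end in that shape.
    """
    match = re.search(r"/base/(\d+)/(\d+)$", path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    trace = [
        'open("/var/lib/pgsql/data/base/16556/16406", O_RDWR) '
        "= -1 ENOENT (No such file or directory)",
    ]
    for path, err in failed_opens(trace):
        print(err, path, split_relation_path(path))
```

The first number is the directory of the database the backend was connected to, which is the directory that has to be checked for the missing file.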

It looks like a file /was/ missing, and I had been looking in the
wrong place to verify that it was there (the template database). I'm
going to chalk this one up to bad hardware and hope it doesn't happen
again. Thanks again...
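To avoid looking in the wrong database directory next time, a small sketch along these lines may help. It assumes the on-disk layout seen in the strace output, base/<database OID>/<relfilenode> under the data directory, with the database OID taken from pg_database and the relfilenode from pg_class in the affected database. The helper names are hypothetical, and only the first segment file of a relation is considered.

```python
from pathlib import Path


def relation_file(data_dir, database_oid, relfilenode):
    """Build the expected path of a relation's file inside one database.

    Assumes the layout <data_dir>/base/<database OID>/<relfilenode>.
    """
    return Path(data_dir) / "base" / str(database_oid) / str(relfilenode)


def databases_containing(data_dir, relfilenode):
    """Return the database OIDs whose directory holds a file named relfilenode."""
    base = Path(data_dir) / "base"
    return sorted(
        int(entry.name)
        for entry in base.iterdir()
        if entry.is_dir()
        and entry.name.isdigit()
        and (entry / str(relfilenode)).is_file()
    )


if __name__ == "__main__":
    # Values taken from the strace line in this message.
    data_dir = "/var/lib/pgsql/data"
    target = relation_file(data_dir, 16556, 16406)
    print(target, "exists" if target.exists() else "is missing")
    if (Path(data_dir) / "base").is_dir():
        print("present in databases:", databases_containing(data_dir, 16406))
```

Because each database has its own copy of the system catalogs, a file with the same name can exist under another database's directory (such as the template database) while being absent from the one that is failing.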

--
Adam Haberlach | "When your product is stolen by thieves, you
adam@mediariffic.com | have a police problem. When it is stolen by
http://mediariffic.com | millions of honest customers, you have a
| marketing problem." - George Gilder