Cache lookup failed?

Started by Jeff Boesover 22 years ago7 messagesgeneral
Jump to latest
#1Jeff Boes
jboes@nexcerpt.com

What might be the source of this error?

Cache lookup failed for relation 188485009

We've been getting these at odd intervals, and they are not reproducible.

Our setup:

PostgreSQL 7.3.3
Red Hat 7.3

kernel.shmall = 1352914698
kernel.shmmax = 1352914698

shared_buffers = 131072
max_fsm_pages = 350000
max_fsm_relations = 200
wal_buffers = 32
sort_mem = 65536
vacuum_mem = 65536
effective_cache_size = 196608

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Jeff Boes (#1)
Re: Cache lookup failed?

It usually refers to some cached plan referring to a table or object that
does not exist anymore. Do you have stored procedures that refer to tables
that are deleted? This includes temporary tables.

So maybe it's only happening when a certain stored procedure is executed
twice in the same session?

Hope this helps,

On Mon, Aug 18, 2003 at 01:01:53PM +0000, Jeff Boes wrote:

What might be the source of this error?

Cache lookup failed for relation 188485009

We've been getting these at odd intervals, and they are not reproducible.

Our setup:

PostgreSQL 7.3.3
Red Hat 7.3

kernel.shmall = 1352914698
kernel.shmmax = 1352914698

shared_buffers = 131072
max_fsm_pages = 350000
max_fsm_relations = 200
wal_buffers = 32
sort_mem = 65536
vacuum_mem = 65536
effective_cache_size = 196608

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

"All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmond Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Boes (#1)
Re: Cache lookup failed?

"Jeff Boes" <jboes@nexcerpt.com> writes:

What might be the source of this error?
Cache lookup failed for relation 188485009
We've been getting these at odd intervals, and they are not reproducible.

Always the same OID, or different ones? Does that OID actually exist in
pg_class? Can you tell us exactly what SQL command(s) are producing the
error? (If not, better turn on query logging so you can find out.)

Our setup:
PostgreSQL 7.3.3

BTW, I'd urge updating to 7.3.4 ASAP. Better to do it in a controlled
fashion than to find yourself looking at a forced update if 7.3.3 fails
to restart after a crash...

regards, tom lane

#4Jeff Boes
jboes@nexcerpt.com
In reply to: Martijn van Oosterhout (#2)
Re: Cache lookup failed?

On Mon, 2003-08-18 at 09:44, Martijn van Oosterhout wrote:

It usually refers to some cached plan referring to a table or object that
does not exist anymore. Do you have stored procedures that refer to tables
that are deleted? This includes temporary tables.

So maybe it's only happening when a certain stored procedure is executed
twice in the same session?

No, I don't think so. It seems to be pretty consistently happening in
our homegrown database connection class, as we are executing a query
against the PG metadata tables. Something like --

select a.attname, t.typname, a.attlen, a.atttypmod, a.attnotnull,
a.atthasdef, a.attnum
from pg_attribute a,
pg_class c,
pg_type t
where c.relname = ?
and a.attrelid = c.oid
and a.attnum >= 0
and t.oid = a.atttypid
order by 1

This is happening inside DBD::Pg, the "table_attributes" method. The
relname being selected is NOT a temp table. We do make extensive use of
temp tables in the code, however.

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise

#5Jeff Boes
jboes@nexcerpt.com
In reply to: Tom Lane (#3)
Re: Cache lookup failed?

On Mon, 2003-08-18 at 09:53, Tom Lane wrote:

Always the same OID, or different ones? Does that OID actually exist in
pg_class? Can you tell us exactly what SQL command(s) are producing the
error? (If not, better turn on query logging so you can find out.)

Different OIDs, and they do not exist in pg_class (it's the OID of that
table's row, right? So for

Cache lookup failed for relation 172465102

I would do

select * from pg_class where oid = 172465102

right? I'm not 100% familiar yet with the ins and outs of pg_class. Too
many OID-type fields in there, I can't keep them straight ... 8-}

I'd turn on query logging, but since we're getting these about every 3-7
days, I'm not sure that would be the most effective use of all that disk
... maybe I can find a way to localize it to the point where the
pg_class query is happening.

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Boes (#5)
Re: Cache lookup failed?

Jeff Boes <jboes@nexcerpt.com> writes:

On Mon, 2003-08-18 at 09:53, Tom Lane wrote:

Always the same OID, or different ones? Does that OID actually exist in
pg_class? Can you tell us exactly what SQL command(s) are producing the
error? (If not, better turn on query logging so you can find out.)

Different OIDs, and they do not exist in pg_class (it's the OID of that
table's row, right?

Right. My best guess is that you are seeing some weird failure in temp
table creation ... do you use lots of temp tables?

I'd turn on query logging, but since we're getting these about every 3-7
days, I'm not sure that would be the most effective use of all that disk

Perhaps you can recycle the logs every few hours?

BTW, the symptom sounds the same as the one that led up to the discovery
of this bug:

2003-07-29 18:18 tgl

* src/backend/access/nbtree/: nbtsearch.c (REL7_3_STABLE),
nbtsearch.c (REL7_2_STABLE), nbtsearch.c: Fix longstanding error in
_bt_search(): should moveright at top of loop not bottom.
Otherwise we fail to moveright when the root page was split while
we were "in flight" to it. This is not a significant problem when
the root is above the leaf level, but if the root was also a leaf
(ie, a single-page index just got split) we may return the wrong
leaf page to the caller, resulting in failure to find a key that is
in fact present. Bug has existed at least since 7.1, probably
forever.

However, I doubt that that is your problem. The moveright bug could
only lead to a pg_class lookup failure if a lookup occurred while
pg_class' OID index was being split from one page to two, which is an
event that happens at most once in the lifetime of an index (before
7.4 anyway). Unless you frequently create new databases, or frequently
reindex pg_class, I don't see how you would see that bug with any
reproducibility. (We were only able to track down the bug because
the regression tests evolved to a point where they caused it with
nontrivial probability.)

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Boes (#4)
Re: Cache lookup failed?

Jeff Boes <jboes@nexcerpt.com> writes:

No, I don't think so. It seems to be pretty consistently happening in
our homegrown database connection class, as we are executing a query
against the PG metadata tables. Something like --

select a.attname, t.typname, a.attlen, a.atttypmod, a.attnotnull,
a.atthasdef, a.attnum
from pg_attribute a,
pg_class c,
pg_type t
where c.relname = ?
and a.attrelid = c.oid
and a.attnum >= 0
and t.oid = a.atttypid
order by 1

I really do not believe that this query could cause "cache lookup failed
for relation-with-a-large-OID".  The only relations being directly
touched in this query are pg_attribute, pg_class, pg_type, and their
indexes, which will all have small OIDs (less than 20000).  The fact
that you are pulling data that refers to other relations is not relevant
--- it's just data.  The "cache lookup" message could only occur when
the system is trying to access a relation as such, and I see nothing
here that would make it do that for any user relations.

It's too bad you're not running 7.4 ... it'd be really useful to know
exactly which "cache lookup" is failing, but there's no way to get that
info out of 7.3 without a debugger stack trace ...

regards, tom lane