Getting "cache lookup failed for aggregate" error

Started by Patrick Krecker almost 12 years ago · 4 messages · general
#1 Patrick Krecker
patrick@judicata.com

Hello everyone -- We received a strange error today on our production write
master. During a routine maintenance script, we got the following error:

"ERROR: cache lookup failed for aggregate 5953992"

I think I found the culprit. We have a cron script that (among other
things) recreates the aggregate function array_cat_aggregate() once every
minute. My guess is that a) transactions do not guarantee a snapshot of
custom functions and b) we got unlucky and the aggregate was deleted during
the execution of the UPDATE.
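
If that guess is right, each DROP/CREATE cycle assigns the aggregate a
new OID, so the OID the running UPDATE had already looked up no longer
exists. A catalog lookup like this one (using the OID from the error
message) should now come back empty:

-- Look up the OID from the error message; an empty result means the
-- aggregate it referred to has been dropped and recreated under a
-- new OID.
SELECT oid, proname
FROM pg_catalog.pg_proc
WHERE oid = 5953992;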

I was just wondering if my analysis is correct, so I can make the necessary
changes to production to prevent this from happening again.

Here is the query that produced the error:

UPDATE marbury_case SET
    components_vector = (
        SELECT array_cat_aggregate(component) FROM (
            SELECT ARRAY[
                id,
                type_id,
                "offset",
                length,
                internal_id,
                parent_id,
                right_sibling_id] AS component
            FROM marbury_component
            WHERE case_id = marbury_case.id
            ORDER BY id) AS foo),
    attributes_json = (
        SELECT array_to_json(array_agg(attributes || hstore('_ind', ind::text)))
        FROM (
            SELECT (rank() OVER (ORDER BY id)) - 1 AS ind, attributes
            FROM marbury_component
            WHERE case_id = marbury_case.id
            ORDER BY id) AS foo
        WHERE attributes IS NOT NULL
          AND array_length(akeys(attributes), 1) > 0),
    vectors_updated = timeofday()::timestamp
WHERE id = 71865

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Patrick Krecker (#1)
Re: Getting "cache lookup failed for aggregate" error

Patrick Krecker <patrick@judicata.com> writes:

Hello everyone -- We received a strange error today on our production write
master. During a routine maintenance script, we got the following error:

"ERROR: cache lookup failed for aggregate 5953992"

I think I found the culprit. We have a cron script that (among other
things) recreates the aggregate function array_cat_aggregate() once every
minute.

Um. Why's it do that?

My guess is that a) transactions do not guarantee a snapshot of
custom functions and b) we got unlucky and the aggregate was deleted during
the execution of the UPDATE.

Yeah, there's no guarantee that dropping the aggregate wouldn't leave a
window for this type of failure.

9.4 might be a little better about this because it doesn't use SnapshotNow
for catalog fetches anymore, but I think you'd still be at some risk.

Here is the query that produced the error:

It'd be more interesting to see what the cron script was doing to the
aggregate definition.

regards, tom lane

#3 Patrick Krecker
patrick@judicata.com
In reply to: Tom Lane (#2)
Re: Getting "cache lookup failed for aggregate" error

On Tue, Jun 24, 2014 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Patrick Krecker <patrick@judicata.com> writes:

Hello everyone -- We received a strange error today on our production write
master. During a routine maintenance script, we got the following error:

"ERROR: cache lookup failed for aggregate 5953992"

I think I found the culprit. We have a cron script that (among other
things) recreates the aggregate function array_cat_aggregate() once every
minute.

Um. Why's it do that?

Well, it's admittedly a hack. The cron script uses the function just after
creating it, so recreating it every run was an easy way of guaranteeing that
the database would have the function when needed. However, I suppose that
assumption was wrong, since you've confirmed that aggregate functions are not
covered by the snapshot taken at the start of a transaction.

My guess is that a) transactions do not guarantee a snapshot of
custom functions and b) we got unlucky and the aggregate was deleted
during the execution of the UPDATE.

Yeah, there's no guarantee that dropping the aggregate wouldn't leave a
window for this type of failure.

9.4 might be a little better about this because it doesn't use SnapshotNow
for catalog fetches anymore, but I think you'd still be at some risk.

Here is the query that produced the error:

It'd be more interesting to see what the cron script was doing to the
aggregate definition.

FWIW the SQL is

DROP AGGREGATE IF EXISTS array_cat_aggregate(anyarray);
CREATE AGGREGATE array_cat_aggregate(anyarray) (
SFUNC = array_cat,
STYPE = anyarray,
INITCOND = '{}'
);
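
For reference, here's what the aggregate computes (the VALUES rows are
just illustrative data):

-- Concatenates the input arrays in order; with these rows the result
-- is {1,2,3,4}.
SELECT array_cat_aggregate(v)
FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) AS t(v);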

That DROP/CREATE pair is followed by the other statement given in my previous
email. But I think you've thoroughly answered my question. Thanks!

#4 Igor Neyman
ineyman@perceptron.com
In reply to: Patrick Krecker (#3)
Re: Getting "cache lookup failed for aggregate" error

FWIW the SQL is

DROP AGGREGATE IF EXISTS array_cat_aggregate(anyarray);
CREATE AGGREGATE array_cat_aggregate(anyarray) (
SFUNC = array_cat,
STYPE = anyarray,
INITCOND = '{}'
);

Followed by the other statement given in my previous email. But I think you've thoroughly answered my question. Thanks!

So, instead of dropping the aggregate “if exists”, why not check pg_catalog for the aggregate's existence, and create it only if it does NOT exist?
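
Something along these lines, as a sketch (it assumes the aggregate lives in the public schema):

DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1
        FROM pg_catalog.pg_proc p
        JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
        WHERE p.proname = 'array_cat_aggregate'
          AND p.proisagg            -- marks aggregates; prokind = 'a' on 11+
          AND n.nspname = 'public'
    ) THEN
        CREATE AGGREGATE array_cat_aggregate(anyarray) (
            SFUNC = array_cat,
            STYPE = anyarray,
            INITCOND = '{}'
        );
    END IF;
END
$$;

Because the aggregate is never dropped, its OID stays stable across cron runs, and in-flight queries never see it disappear.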

Regards,
Igor Neyman