cache lookup failed for attribute 1 of relation XXXXXX
Hi, we have a logical backup process that runs every night since 5+ years.
It is a logical backup we use to restore a non production environment. We
use pg_dump in parallel mode in directory format.
Postgres version is 9.6.6
Tonight schedule failed with the following error:
pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for
attribute 1 of relation 2223152859
pg_dump: [archiver (db)] query was: SELECT t.tableoid, t.oid, t.relname
AS indexname, pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef,
t.relnatts AS indnkeys, i.indkey, i.indisclustered, i.indisreplident,
t.relpages, c.contype, c.conname, c.condeferrable, c.condeferred,
c.tableoid AS contableoid, c.oid AS conoid,
pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, (SELECT spcname
FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS
tablespace, t.reloptions AS indreloptions FROM pg_catalog.pg_index i JOIN
pg_catalog.pg_class t ON (t.oid = i.indexrelid) LEFT JOIN
pg_catalog.pg_constraint c ON (i.indrelid = c.conrelid AND i.indexrelid =
c.conindid AND c.contype IN ('p','u','x')) WHERE i.indrelid =
'2223152859'::pg_catalog.oid AND i.indisvalid AND i.indisready ORDER BY
indexname
We attempted to run the backup manually the 2nd time just after a couple of
minutes and it suceeded with no issues, the restore of the non production
env suceeded too. Not sure what this error is though. Have never seen it
before.
Index with that indexrelid does not exists
SELECT * FROM pg_catalog.pg_index WHERE indexrelid = '2223152859';
(0 rows)
Any idea about what happened? In the server logs I see only one occurrence
of that error and it is related to the failed pg_dump:
2018-07-19 01:04:26 GMT [127.0.0.1(52498)] [50816]: [13-1]
db=cmdv3,user=postgres ERROR: cache lookup failed for attribute 1 of
relation 2223152859
Thanks much in advance
Alessandro Aste <alessandro.aste@gmail.com> writes:
pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for
attribute 1 of relation 2223152859
pg_dump: [archiver (db)] query was: SELECT t.tableoid, t.oid, t.relname
AS indexname, pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef,
t.relnatts AS indnkeys, i.indkey, i.indisclustered, i.indisreplident,
t.relpages, c.contype, c.conname, c.condeferrable, c.condeferred,
c.tableoid AS contableoid, c.oid AS conoid,
pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, (SELECT spcname
FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS
tablespace, t.reloptions AS indreloptions FROM pg_catalog.pg_index i JOIN
pg_catalog.pg_class t ON (t.oid = i.indexrelid) LEFT JOIN
pg_catalog.pg_constraint c ON (i.indrelid = c.conrelid AND i.indexrelid =
c.conindid AND c.contype IN ('p','u','x')) WHERE i.indrelid =
'2223152859'::pg_catalog.oid AND i.indisvalid AND i.indisready ORDER BY
indexname
Hmm ...
Index with that indexrelid does not exists
SELECT * FROM pg_catalog.pg_index WHERE indexrelid = '2223152859';
(0 rows)
You sure? The failed query was looking for indrelid not indexrelid.
That implies that 2223152859 is/was the table not the index. In
any case it'd be better to look for that OID in pg_class.
If that table is in fact gone, one possible theory is that it was
dropped concurrently with the pg_dump run. Then possibly
pg_get_indexdef() would see the table as already gone when the
surrounding query had found it in the catalogs, which could easily
result in the reported error. However, pg_dump shouldn't be poking
into the indexes of a table that it doesn't have AccessShareLock on,
so I'm not quite sure how we could end up with this result. Maybe
the true explanation is more complicated, like an ALTER DROP COLUMN
that committed just as pg_dump was starting. Have you got records
of any DDL being done at that time?
regards, tom lane
[ please keep the list cc'd for the archives' sake ]
Alessandro Aste <alessandro.aste@gmail.com> writes:
Hello Tom, thanks for your reply:
SELECT * FROM pg_class WHERE OID = 2223152859 ;
(0 rows)
I'm not aware of any DDL at that time.
Hm. Well, that OID was definitely there when pg_dump looked, and
it's not there now, so something changed --- though we can't prove
it changed concurrently.
In any case, I'd bet that if we ran this to ground it would prove to be a
concurrent-DDL issue. pg_dump tries to protect itself against concurrent
DDL, but for assorted architectural reasons the protection is not 100%;
sometimes you can get odd failures like this, essentially due to "clock
skew" between pg_dump's view of the catalogs and the server's view of the
catalogs. As long as it works on retry, I wouldn't worry too much about
it.
regards, tom lane
Import Notes
Reply to msg id not found: CAM9F+O1kod1Tt4ApVWcyESwBzpRTezifW6g3MmM7kQj77vO_yw@mail.gmail.com
Thanks much, I'll keep my eyes open today night hoping it will not happen
again.
On Thu, Jul 19, 2018 at 5:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
[ please keep the list cc'd for the archives' sake ]
Alessandro Aste <alessandro.aste@gmail.com> writes:
Hello Tom, thanks for your reply:
SELECT * FROM pg_class WHERE OID = 2223152859 ;
(0 rows)
I'm not aware of any DDL at that time.Hm. Well, that OID was definitely there when pg_dump looked, and
it's not there now, so something changed --- though we can't prove
it changed concurrently.In any case, I'd bet that if we ran this to ground it would prove to be a
concurrent-DDL issue. pg_dump tries to protect itself against concurrent
DDL, but for assorted architectural reasons the protection is not 100%;
sometimes you can get odd failures like this, essentially due to "clock
skew" between pg_dump's view of the catalogs and the server's view of the
catalogs. As long as it works on retry, I wouldn't worry too much about
it.regards, tom lane