BUG #15000: Cache lookup failure

Started by PG Bug reporting formover 8 years ago7 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15000
Logged by: Thomas Kellerer
Email address: postgres@kellerer.name
PostgreSQL version: 10.1
Operating system: CentOS release 6.6
Description:

We have a script that copies data from one database to another using pg_dump
(plain text format) and the pipes the output to psql

The script consistently fails with:

pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for
relation 16693
pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname,
pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated
FROM pg_catalog.pg_constraint WHERE conr elid =
'16693'::pg_catalog.oid AND contype = 'c' ORDER BY conname

Postgres version is:
PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313
(Red Hat 4.4.7-18), 64-bit

Apparently I can't attach the script here, so it can be downloaded from:

https://sql-workbench.net/pg/localized_texts_cache_lookup_error.sh

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #15000: Cache lookup failure

=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:

We have a script that copies data from one database to another using pg_dump
(plain text format) and the pipes the output to psql
The script consistently fails with:

pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for
relation 16693
pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname,
pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated
FROM pg_catalog.pg_constraint WHERE conr elid =
'16693'::pg_catalog.oid AND contype = 'c' ORDER BY conname

Hmph. Is it always the same OID in the message and the reported query?
If so, what table does that correspond to (try select '16693'::regclass)
and is there anything odd about that table's declaration or usage?

It's hard to explain this as a race condition because pg_dump should
have a lock on the table by the time it issues this query. So I'm
not sure what's going on.

regards, tom lane

#3Thomas Kellerer
thomas@kellerer.eu
In reply to: Tom Lane (#2)
Re: BUG #15000: Cache lookup failure

Tom Lane schrieb am 08.01.2018 um 16:12:

We have a script that copies data from one database to another using pg_dump
(plain text format) and the pipes the output to psql
The script consistently fails with:

pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for
relation 16693
pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname,
pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated
FROM pg_catalog.pg_constraint WHERE conr elid =
'16693'::pg_catalog.oid AND contype = 'c' ORDER BY conname

Hmph. Is it always the same OID in the message and the reported query?

No, it's not always the same OID.

I just re-ran the script and got

ERROR: cache lookup failed for relation 16816

If so, what table does that correspond to (try select '16693'::regclass)

When I do that right after the error occurred then of course, nothing is shown:

prod=# select '16816'::regclass;
regclass
----------
16816
(1 row)

Same result on the source database.

The OIDs for everything that the script creates (in the target database) are

oid | relname
-------+-------------------
16790 | shop_pkey
16792 | language_seq
16794 | languages
16797 | languages_pkey
16799 | l10n_seq
16787 | shop
16847 | l10n_pkey
16849 | l10n_unique
16851 | l10n_value_unique
16832 | l10n
16838 | l10n_value

(For the above run, with "cache lookup failed for relation 16816")

and is there anything odd about that table's declaration or usage?

No, nothing special.

For completeness, here is the DDL that is run by the shell script:

CREATE TABLE shop
(
shop_id bigint NOT NULL
);

ALTER TABLE shop
ADD CONSTRAINT shop_pkey
PRIMARY KEY (shop_id);

INSERT INTO shop VALUES (1), (2), (3);

DROP SEQUENCE IF EXISTS language_seq;

CREATE SEQUENCE language_seq
INCREMENT BY 1
MINVALUE 1
MAXVALUE 2147483646
CACHE 1
NO CYCLE;

DROP TABLE IF EXISTS languages CASCADE;

CREATE TABLE languages
(
language_id char(2) NOT NULL,
image_url varchar(300) NOT NULL,
description varchar(100)
);

ALTER TABLE languages
ADD CONSTRAINT languages_pkey
PRIMARY KEY (language_id);

DROP SEQUENCE IF EXISTS l10n_seq;

CREATE SEQUENCE l10n_seq
INCREMENT BY 1
MINVALUE 1
MAXVALUE 2147483646
CACHE 1
NO CYCLE;

DROP TABLE IF EXISTS l10n CASCADE;

CREATE TABLE l10n
(
l10n_id bigint NOT NULL,
l10n_key varchar(80) NOT NULL,
shop_id bigint NOT NULL,
description varchar(4000),
image_url varchar(256)
);

ALTER TABLE l10n
ADD CONSTRAINT l10n_pkey
PRIMARY KEY (l10n_id);

ALTER TABLE l10n
ADD CONSTRAINT l10n_shop_fk FOREIGN KEY (shop_id)
REFERENCES shop (shop_id)
ON UPDATE NO ACTION
ON DELETE NO ACTION;

ALTER TABLE l10n
ADD CONSTRAINT l10n_unique UNIQUE (l10n_key, shop_id);

DROP TABLE IF EXISTS l10n_value CASCADE;

CREATE TABLE l10n_value
(
l10n_id bigint NOT NULL,
language_id varchar(5) NOT NULL,
l10n_value varchar(4000) NOT NULL,
dirty char(1) DEFAULT '0'::bpchar,
cms_user varchar(20),
modified_at timestamp DEFAULT now(),
CONSTRAINT l10n_value_check_dirty CHECK ((dirty = '0'::bpchar) OR (dirty = '1'::bpchar))
);

ALTER TABLE l10n_value
ADD CONSTRAINT l10n_value_l10n_fk FOREIGN KEY (l10n_id)
REFERENCES l10n (l10n_id)
ON UPDATE NO ACTION
ON DELETE CASCADE;

ALTER TABLE l10n_value
ADD CONSTRAINT l10n_value_unique UNIQUE (l10n_id, language_id);

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Kellerer (#3)
Re: BUG #15000: Cache lookup failure

Thomas Kellerer <thomas@kellerer.eu> writes:

Tom Lane schrieb am 08.01.2018 um 16:12:

Hmph. Is it always the same OID in the message and the reported query?

No, it's not always the same OID.

OK. The explanation then presumably is that something is dropping that
table concurrently with the pg_dump run. It remains unclear to me why
pg_dump is failing to protect itself against such a drop, but before we
can figure that out, we need to be able to reproduce the situation.
Can you identify anything that would be dropping tables concurrently
with this script?

regards, tom lane

#5Thomas Kellerer
thomas@kellerer.eu
In reply to: Tom Lane (#4)
Re: BUG #15000: Cache lookup failure

Tom Lane schrieb am 08.01.2018 um 16:35:

OK. The explanation then presumably is that something is dropping that
table concurrently with the pg_dump run. It remains unclear to me why
pg_dump is failing to protect itself against such a drop, but before we
can figure that out, we need to be able to reproduce the situation.

Can you identify anything that would be dropping tables concurrently
with this script?

I feel ashamed :(

There was a left over pipe symbol in the script that we didn't see:

pg_dump --table=l10n --table=l10n_value --clean ${db_stag} |
psql --single-transaction --dbname=${db_prod} --file=${backup_stag}

So pg_dump pipes a script with DROP statement into psql, which then
runs essentially the same script again which was created in a previous step (also with --clean)

Seeing my (stupid) mistake, I am not sure if this is supposed to work or not...

Thomas

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Kellerer (#5)
Re: BUG #15000: Cache lookup failure

Thomas Kellerer <thomas@kellerer.eu> writes:

There was a left over pipe symbol in the script that we didn't see:

pg_dump --table=l10n --table=l10n_value --clean ${db_stag} |
psql --single-transaction --dbname=${db_prod} --file=${backup_stag}

So pg_dump pipes a script with DROP statement into psql, which then
runs essentially the same script again which was created in a previous step (also with --clean)

Hm, but if psql is restoring into a different database than pg_dump
is dumping from, I don't see how they'd interact?

regards, tom lane

#7Thomas Kellerer
thomas@kellerer.eu
In reply to: Tom Lane (#6)
Re: BUG #15000: Cache lookup failure

Tom Lane schrieb am 08.01.2018 um 17:04:

There was a left over pipe symbol in the script that we didn't see:

pg_dump --table=l10n --table=l10n_value --clean ${db_stag} |
psql --single-transaction --dbname=${db_prod} --file=${backup_stag}

So pg_dump pipes a script with DROP statement into psql, which then
runs essentially the same script again which was created in a previous step (also with --clean)

Hm, but if psql is restoring into a different database than pg_dump
is dumping from, I don't see how they'd interact?

Yes, put psql is running the restore script (created with --clean) *twice*:
once because of the pipe and a second time because of the --file argument