\d t: ERROR: XX000: cache lookup failed for relation

Started by Justin Pryzby over 7 years ago, 7 messages
#1 Justin Pryzby
pryzby@telsasoft.com

Resending to -hackers
/messages/by-id/20180527022401.GA20949@telsasoft.com

Is that considered an actionable problem?

Encountered consistently while trying to reproduce the vacuum full
pg_statistic/toast_2619 bug; while running a loop around VAC FULL and more in
another session:

[1]: Running { time sh -ec 'while :; do psql --port 5678 postgres -qc "VACUUM FULL pg_toast.pg_toast_2619"; psql --port 5678 postgres -qc "VACUUM FULL pg_statistic"; done'; date; } &
[2]: + Running time while :; do psql postgres --port 5678 -c "INSERT INTO t SELECT i FROM generate_series(1,999999) i"; sleep 1; for a in `seq 999`; do psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE int USING i::int"; sleep 1; psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE bigint"; sleep 1; done; psql postgres --port 5678 -c "TRUNCATE t"; sleep 1; done &
time while :; do
psql postgres --port 5678 -c "INSERT INTO t SELECT i FROM generate_series(1,999999) i"; sleep 1; for a in `seq 999`;
do
psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE int USING i::int"; sleep 1; psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE bigint"; sleep 1;
done; psql postgres --port 5678 -c "TRUNCATE t"; sleep 1;
done &

$ psql --port 5678 postgres -x
psql (11beta1)
...
postgres=# \set VERBOSITY verbose
postgres=# \d t
ERROR: XX000: cache lookup failed for relation 8096742
LOCATION: flatten_reloptions, ruleutils.c:11065

Justin

#2 Teodor Sigaev
teodor@sigaev.ru
In reply to: Justin Pryzby (#1)
1 attachment(s)
Re: \d t: ERROR: XX000: cache lookup failed for relation

Is that considered an actionable problem?

I think so, but I'm not able to reproduce it. I wrote a script to simplify
things, but it doesn't reproduce the problem either.

And how long should I wait for it to reproduce? I waited for one hour.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

1.sh (application/x-shellscript)

#3 Justin Pryzby
pryzby@telsasoft.com
In reply to: Teodor Sigaev (#2)
Re: \d t: ERROR: XX000: cache lookup failed for relation

On Mon, Jun 04, 2018 at 07:12:53PM +0300, Teodor Sigaev wrote:

Is that considered an actionable problem?

I think so, but I'm not able to reproduce it. I wrote a script to simplify

The failure is triggered by running "\d t" in (yet) another session - sorry if
that was unclear. It fails very consistently, probably over 75% of the time.

Also note that my "INSERT" was run in a separate loop, concurrent with the
VACUUM and ALTER, but yours is running consecutively.

Justin

#4 Teodor Sigaev
teodor@sigaev.ru
In reply to: Justin Pryzby (#3)
Re: \d t: ERROR: XX000: cache lookup failed for relation

The failure is triggered by running "\d t" in (yet) another session - sorry if
that was unclear. It fails very consistently, probably over 75% of the time.

No-no, I understood that. I tried \d in one more session.

Also note that my "INSERT" was run in a separate loop, concurrent with the
VACUUM and ALTER, but yours is running consecutively.

Both loops run in the background. I tried running the two scripts and got a
lot of deadlocks, but no reproduction of the problem.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#5 Justin Pryzby
pryzby@telsasoft.com
In reply to: Teodor Sigaev (#4)
Re: \d t: ERROR: XX000: cache lookup failed for relation

On Mon, Jun 04, 2018 at 08:01:41PM +0300, Teodor Sigaev wrote:

Also note that my "INSERT" was run in a separate loop, concurrent with the
VACUUM and ALTER, but yours is running consecutively.

Both loops run in the background. I tried running the two scripts and got a
lot of deadlocks, but no reproduction of the problem.

Ah, I think this is the missing, essential component:
CREATE INDEX ON t(right(i::text,1)) WHERE i::text LIKE '%1';

I can reproduce it running just this loop:

time while :; do for a in `seq 999`; do psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE int USING i::int"; done; done
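
The pieces mentioned so far can be gathered into one script. This is only a sketch: the `CREATE TABLE` definition, the `psql_t` and `setup_sql` helper names, and the `RUN_REPRO` guard variable are assumptions made for illustration, while the port, database, index definition, and ALTER statement are copied from the commands in this thread.

```shell
#!/bin/sh
# Sketch of the reproduction pieced together from this thread.
# Assumed (not from the thread): the CREATE TABLE definition, the helper
# names, and the RUN_REPRO guard variable.

# Run psql against the test cluster used in the thread.
psql_t() {
    psql postgres --port 5678 "$@"
}

# Print the one-time setup SQL, including the expression/partial index
# that turned out to be the essential component.
setup_sql() {
    cat <<'EOF'
CREATE TABLE IF NOT EXISTS t (i int);
INSERT INTO t SELECT i FROM generate_series(1,999999) i;
CREATE INDEX ON t(right(i::text,1)) WHERE i::text LIKE '%1';
EOF
}

# Issue the ALTER TABLE from the loop above, forever.
alter_loop() {
    while :; do
        psql_t -c "ALTER TABLE t ALTER i TYPE int USING i::int"
    done
}

# Only touch a server when explicitly asked to.
if [ "${RUN_REPRO:-0}" = 1 ]; then
    setup_sql | psql_t -q
    alter_loop &
    printf '%s\n' 'Now run \d t repeatedly in another psql session.'
    wait
fi
```

Running it as `RUN_REPRO=1 sh repro.sh` (file name hypothetical) starts the ALTER loop in the background; the `\d t` checks still have to be issued from a separate session.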

Justin

#6 Teodor Sigaev
teodor@sigaev.ru
In reply to: Justin Pryzby (#5)
1 attachment(s)
Re: \d t: ERROR: XX000: cache lookup failed for relation

Ah, I think this is the missing, essential component:
CREATE INDEX ON t(right(i::text,1)) WHERE i::text LIKE '%1';

Finally, I reproduced it with the attached script.

INSERT 0 999999 <- first insertion
ERROR: cache lookup failed for relation 1032219
ALTER TABLE
ERROR: cache lookup failed for relation 1033478
ALTER TABLE
ERROR: cache lookup failed for relation 1034073
ALTER TABLE
ERROR: cache lookup failed for relation 1034650
ALTER TABLE
ERROR: cache lookup failed for relation 1035238
ALTER TABLE
ERROR: cache lookup failed for relation 1035837

Will investigate.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

1.sh (application/x-shellscript)

#7 Teodor Sigaev
teodor@sigaev.ru
In reply to: Teodor Sigaev (#6)
1 attachment(s)
Re: \d t: ERROR: XX000: cache lookup failed for relation

Teodor Sigaev wrote:

Ah, I think this is the missing, essential component:
CREATE INDEX ON t(right(i::text,1)) WHERE i::text LIKE '%1';

Finally, I reproduced it with the attached script.

Attached is a simplified version of the script. psql uses an ordinary SQL
query to get information about the index, with the usual transaction
isolation/MVCC. To build the description of the index it calls
pg_get_indexdef(), which doesn't use the transaction snapshot; it uses the
catalog snapshot, because it accesses the catalog through the system
catalog cache. So the difference is in the snapshot used by the ordinary
SQL query versus the one used by pg_get_indexdef(). I'm not sure that it
is easy to fix, or whether it should be fixed at all.

Simplified query:
SELECT c2.relname, i.indexrelid,
pg_catalog.pg_get_indexdef(i.indexrelid, 0, true)
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2,
pg_catalog.pg_index i
WHERE c.relname = 't' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
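
To put a number on how often a query like this fails while the ALTER loop is running, a small counting helper may be useful. This is a sketch only: `count_failures` is a hypothetical helper, not part of the attached script, and it simply greps each run's output for the error text reported earlier in the thread.

```shell
#!/bin/sh
# count_failures N CMD...: run CMD N times and print how many of the runs
# produced output (stdout or stderr) containing "cache lookup failed".
# Hypothetical helper for illustration only.
count_failures() {
    n=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # The pipeline's status is grep's: 0 when the error text appears.
        if "$@" 2>&1 | grep -q 'cache lookup failed'; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}
```

For example, `count_failures 100 psql postgres --port 5678 -c '\d t'` would report how many of 100 attempts hit the error; the simplified query could be passed the same way with `-f` and a file holding it.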
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

1.sh (application/x-shellscript)