cache lookup failed when \d t concurrent with DML change column data type

Started by jian heabout 1 year ago5 messages
#1jian he
jian.universality@gmail.com

hi. I think I found a bug.
PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by
gcc-14.1.0, 64-bit
commit at 45188c2ea2.
Ubuntu 22.04.4 LTS

setup:
drop table t cascade;
create table t(a int PRIMARY key);

IN session1:
step "change data type" {begin; alter table t alter column a set data
type int4;}
step "s1" {commit;}

IN session2:
step "psql_another_session" {\d t}

permutation "change data type" "psql_another_session" "s1"

\set ECHO_HIDDEN on
/******** QUERY *********/
SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered,
i.indisvalid, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
pg_catalog.pg_get_constraintdef(con.oid, true), contype,
condeferrable, condeferred, i.indisreplident, c2.reltablespace,
con.conperiod
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND
conindid = i.indexrelid AND contype IN ('p','u','x'))
WHERE c.oid = '34405' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
ORDER BY i.indisprimary DESC, c2.relname;
/************************/

ERROR: cache lookup failed for attribute 1 of relation 34418

#2Andrei Lepikhov
lepihov@gmail.com
In reply to: jian he (#1)
Re: cache lookup failed when \d t concurrent with DML change column data type

On 10/24/24 22:30, jian he wrote:

hi. I think I found a bug.
PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by
gcc-14.1.0, 64-bit
commit at 45188c2ea2.
Ubuntu 22.04.4 LTS

setup:
drop table t cascade;
create table t(a int PRIMARY key);

IN session1:
step "change data type" {begin; alter table t alter column a set data
type int4;}
step "s1" {commit;}

IN session2:
step "psql_another_session" {\d t}

permutation "change data type" "psql_another_session" "s1"

ERROR: cache lookup failed for attribute 1 of relation 34418

Yes, it looks like a bug existing for a long time, at least since PG11
(I didn't trace further down).
It seems that the backend didn't apply invalidation messages before
touching system caches. Backtrace:

in get_attoptions (relid=16388, attnum=1) at lsyscache.c:982
in pg_get_indexdef_worker (indexrelid=16388, colno=0, excludeOps=0x0,
attrsOnly=false, keysOnly=false, showTblSpc=false, inherits=false,
prettyFlags=7, missing_ok=true) at ruleutils.c:1458
in pg_get_indexdef_ext (fcinfo=0x55a15acc1c18) at ruleutils.c:1202
in ExecInterpExpr (state=0x55a15acc2a10, econtext=0x55a15ac62930,
isnull=0x7fffd66a5bcf) at execExprInterp.c:770
in ExecInterpExprStillValid (state=0x55a15acc2a10,
econtext=0x55a15ac62930, isNull=0x7fffd66a5bcf) at execExprInterp.c:2035
in ExecEvalExprSwitchContext (state=0x55a15acc2a10,
econtext=0x55a15ac62930, isNull=0x7fffd66a5bcf) at
../../../src/include/executor/executor.h:367
in ExecProject (projInfo=0x55a15acc2a08) at
../../../src/include/executor/executor.h:401

--
regards, Andrei Lepikhov

#3Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#2)
Re: cache lookup failed when \d t concurrent with DML change column data type

On 10/25/24 10:05, Andrei Lepikhov wrote:

On 10/24/24 22:30, jian he wrote:

hi. I think I found a bug.
PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by
gcc-14.1.0, 64-bit
commit at 45188c2ea2.
Ubuntu 22.04.4 LTS

setup:
drop table t cascade;
create table t(a int PRIMARY key);

IN session1:
step "change data type" {begin; alter table t alter column a set data
type int4;}
step "s1" {commit;}

IN session2:
step "psql_another_session" {\d t}

permutation "change data type" "psql_another_session" "s1"

ERROR:  cache lookup failed for attribute 1 of relation 34418

Yes, it looks like a bug existing for a long time, at least since PG11
(I didn't trace further down).
It seems that the backend didn't apply invalidation messages before
touching system caches. Backtrace:

After a short discovery, I found the origins:
The pg_get_indexdef has an incoming index oid and gets all the stuff
needed just by looking up sys-caches. But it wants to build a list of
relation column names at a specific moment and opens the heap relation.
After that operation, we already have syscaches updated and the old
index oid replaced with the new one.
It may be have made sense to lock the row of replaced index in pg_class
and pg_index until the transaction, altered it will be commmitted. But,
because ALTER TABLE is not fully MVCC-safe, it may be expected (or
acceptable) behaviour.

--
regards, Andrei Lepikhov

#4Kirill Reshke
reshkekirill@gmail.com
In reply to: Andrei Lepikhov (#3)
Re: cache lookup failed when \d t concurrent with DML change column data type

On Fri, 25 Oct 2024 at 09:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

It may be have made sense to lock the row of replaced index in pg_class
and pg_index until the transaction, altered it will be commmitted. But,
because ALTER TABLE is not fully MVCC-safe, it may be expected (or
acceptable) behaviour.

I suspect this is the case. If that is, should be reflect it in the doc?

--
Best regards,
Kirill Reshke

#5Andrei Lepikhov
lepihov@gmail.com
In reply to: Kirill Reshke (#4)
Re: cache lookup failed when \d t concurrent with DML change column data type

On 10/25/24 14:15, Kirill Reshke wrote:

On Fri, 25 Oct 2024 at 09:51, Andrei Lepikhov <lepihov@gmail.com> wrote:

It may be have made sense to lock the row of replaced index in pg_class
and pg_index until the transaction, altered it will be commmitted. But,
because ALTER TABLE is not fully MVCC-safe, it may be expected (or
acceptable) behaviour.

I suspect this is the case. If that is, should be reflect it in the doc?

We already have the doc entry on such cases [1]https://www.postgresql.org/docs/16/mvcc-caveats.html. This is a suitable
place to change if someone wants to detail this 'failed cache lookup' case.

[1]: https://www.postgresql.org/docs/16/mvcc-caveats.html

--
regards, Andrei Lepikhov