BUG #18244: Corruption in indexes involving whole-row expressions

Started by PG Bug reporting formover 2 years ago8 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18244
Logged by: Nikolay Samokhvalov
Email address: nik@postgres.ai
PostgreSQL version: 16.1
Operating system: any
Description:

nik=# create index on t1 (hash_record(t1));
CREATE INDEX

Such an index can easily be corrupted, e.g.:

nik=# alter table t1 add column yo int default -1;
ALTER TABLE

Or if we just drop a column:

nik=# alter table t_n drop column c2;
ALTER TABLE
nik=# \d t_n
Table "public.t_n"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | |
Indexes:
"t_n_hash_record_idx" btree (hash_record(t_n.*))

nik=# select * from t_n where hash_record(t_n) = hash_record(row(1, -1));
c1
----
1
(1 row)

nik=# select * from t_n where hash_record(t_n) = hash_record(row(1));
c1
----
(0 rows)

Or if index is unique:

nik=# select * from t_n;
c4
----
-1
-1
-1
(3 rows)

nik=# \d t_n
Table "public.t_n"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------------
c4 | integer | | | '-1'::integer
Indexes:
"t_n_hash_record_idx" UNIQUE, btree (hash_record(t_n.*))

nik=# reindex index t_n_hash_record_idx;
ERROR: could not create unique index "t_n_hash_record_idx"
DETAIL: Key (hash_record(t_n.*))=(385747274) is duplicated.

Proposal: prohibit the use of whole-row expression – as it is already done
for generated columns and produce a similar error ("cannot use whole-row
variable in column generation expression")

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: PG Bug reporting form (#1)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

On Tue, 2023-12-12 at 19:28 +0000, PG Bug reporting form wrote:

nik=# create index on t1 (hash_record(t1));

Such an index can easily be corrupted, e.g.:

nik=# alter table t1 add column yo int default -1;

Proposal: prohibit the use of whole-row expression – as it is already done
for generated columns and produce a similar error ("cannot use whole-row
variable in column generation expression")

I reported that bug before:
/messages/by-id/e48a5d9a2d3d72985d61ee254314f5f5f5444a55.camel@cybertec.at

I think that your proposal is good, it seems that nobody cared enough
to implement it. If we forbid that, one thing to consider is pg_upgrade:
it needs a new check.

Yours,
Laurenz Albe

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Laurenz Albe (#2)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

On Wed, Dec 13, 2023 at 11:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2023-12-12 at 19:28 +0000, PG Bug reporting form wrote:

nik=# create index on t1 (hash_record(t1));

Such an index can easily be corrupted, e.g.:

nik=# alter table t1 add column yo int default -1;

Proposal: prohibit the use of whole-row expression – as it is already done
for generated columns and produce a similar error ("cannot use whole-row
variable in column generation expression")

I reported that bug before:
/messages/by-id/e48a5d9a2d3d72985d61ee254314f5f5f5444a55.camel@cybertec.at

FTR I also posted a repro for another variation of that problem. I
think it's slightly more general that, it not just whole-row
expressions (eg index on <table_name>), it's row types generally:

/messages/by-id/CA+hUKGKb4SB+qQ-vAVomxAvJY6um+5URyq2D0vv10g7mbYZ1Ww@mail.gmail.com

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Thomas Munro (#3)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

On Wed, 2023-12-13 at 14:38 +1300, Thomas Munro wrote:

On Wed, Dec 13, 2023 at 11:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2023-12-12 at 19:28 +0000, PG Bug reporting form wrote:

nik=# create index on t1 (hash_record(t1));

Such an index can easily be corrupted, e.g.:

nik=# alter table t1 add column yo int default -1;

Proposal: prohibit the use of whole-row expression – as it is already done
for generated columns and produce a similar error ("cannot use whole-row
variable in column generation expression")

I reported that bug before:
/messages/by-id/e48a5d9a2d3d72985d61ee254314f5f5f5444a55.camel@cybertec.at

FTR I also posted a repro for another variation of that problem. I
think it's slightly more general that, it not just whole-row
expressions (eg index on <table_name>), it's row types generally:

/messages/by-id/CA+hUKGKb4SB+qQ-vAVomxAvJY6um+5URyq2D0vv10g7mbYZ1Ww@mail.gmail.com

Yes, and that makes it difficult to decide on the correct path.

Should we check all places where the type is used before allowing
ALTER TYPE? Sounds complicated and error-prone.

Should we forbid composite types in index declarations? Sounds posssible,
but very restrictive. We might as well forbid composite types in table
definitions. That would be a good thing in my opinion, but it would cause
a compatibility headache.

Perhaps the best thing would be to add a warning to chapter 8.16.4 that
it is "best practice" (and conforming to the first normal form) not to
use composite types in column definitions, and that altering a composite
type used in a table definition can lead to consistency problems.

Yours,
Laurenz Albe

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#4)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

Laurenz Albe <laurenz.albe@cybertec.at> writes:

Should we forbid composite types in index declarations? Sounds posssible,
but very restrictive.

That would make a lot of people very sad, I fear.

I think a minimum answer could be to document that you might need to
REINDEX such indexes after a change in the composite type. We could
perhaps try to be more proactive, say by marking such indexes invalid.
I think though that that would have deadlock problems as well as race
conditions. Perhaps best to leave it at "reindex when necessary",
especially in view of the tiny number of reports to date.

regards, tom lane

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#5)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

On Tue, 2023-12-12 at 23:04 -0500, Tom Lane wrote:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

Should we forbid composite types in index declarations? Sounds posssible,
but very restrictive.

That would make a lot of people very sad, I fear.

I think a minimum answer could be to document that you might need to
REINDEX such indexes after a change in the composite type. We could
perhaps try to be more proactive, say by marking such indexes invalid.
I think though that that would have deadlock problems as well as race
conditions. Perhaps best to leave it at "reindex when necessary",
especially in view of the tiny number of reports to date.

I agree. How about the attached documentation patch?

Yours,
Laurenz Albe

Attachments:

0001-Document-the-dangers-of-composite-type-columns.patchtext/x-patch; charset=UTF-8; name=0001-Document-the-dangers-of-composite-type-columns.patchDownload+10-1
#7Junwang Zhao
zhjwpku@gmail.com
In reply to: Laurenz Albe (#6)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

On Thu, Dec 14, 2023 at 1:43 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2023-12-12 at 23:04 -0500, Tom Lane wrote:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

Should we forbid composite types in index declarations? Sounds posssible,
but very restrictive.

That would make a lot of people very sad, I fear.

I think a minimum answer could be to document that you might need to
REINDEX such indexes after a change in the composite type. We could
perhaps try to be more proactive, say by marking such indexes invalid.
I think though that that would have deadlock problems as well as race
conditions. Perhaps best to leave it at "reindex when necessary",
especially in view of the tiny number of reports to date.

I agree. How about the attached documentation patch?

+   While you can use composite types in column definitions, there is usually no
+   benefit in doing so.

Is this true? If yes, we'd better update this to
https://wiki.postgresql.org/wiki/Don%27t_Do_This.

Yours,
Laurenz Albe

--
Regards
Junwang Zhao

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Junwang Zhao (#7)
Re: BUG #18244: Corruption in indexes involving whole-row expressions

On Thu, 2023-12-14 at 16:27 +0800, Junwang Zhao wrote:

+   While you can use composite types in column definitions, there is usually no
+   benefit in doing so.

Is this true? If yes, we'd better update this to
https://wiki.postgresql.org/wiki/Don%27t_Do_This.

I cannot think of a use case for composite type columns that wouldn't be
better served by directly declaring the columns of the composite key
in the table definition. And it makes accessing the values more complicated.

So yes, I'd be on board with a "don't do this" entry.

Yours,
Laurenz Albe