ATTACH PARTITION seems to ignore column generation status

Started by Tom Laneover 3 years ago14 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

This does not seem good:

regression=# create table pp (a int, b int) partition by range(a);
CREATE TABLE
regression=# create table cc (a int generated always as (b+1) stored, b int);
CREATE TABLE
regression=# alter table pp attach partition cc for values from ('1') to ('10');
ALTER TABLE
regression=# insert into pp values(1,100);
INSERT 0 1
regression=# table pp;
a | b
-----+-----
101 | 100
(1 row)

I'm not sure to what extent it's sensible for partitions to have
GENERATED columns that don't match their parent; but even if that's
okay for payload columns I doubt we want to allow partitioning
columns to be GENERATED.

regards, tom lane

#2Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#1)
Re: ATTACH PARTITION seems to ignore column generation status

On Fri, Jan 6, 2023 at 3:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This does not seem good:

regression=# create table pp (a int, b int) partition by range(a);
CREATE TABLE
regression=# create table cc (a int generated always as (b+1) stored, b int);
CREATE TABLE
regression=# alter table pp attach partition cc for values from ('1') to ('10');
ALTER TABLE
regression=# insert into pp values(1,100);
INSERT 0 1
regression=# table pp;
a | b
-----+-----
101 | 100
(1 row)

This indeed is broken and seems like an oversight. :-(

I'm not sure to what extent it's sensible for partitions to have
GENERATED columns that don't match their parent; but even if that's
okay for payload columns I doubt we want to allow partitioning
columns to be GENERATED.

Actually, I'm inclined to disallow partitions to have *any* generated
columns that are not present in the parent table. The main reason for
that is the inconvenience of checking that a partition's generated
columns doesn't override the partition key column of an ancestor that
is not its immediate parent, which MergeAttributesIntoExisting() has
access to and would have been locked.

Patch doing it that way is attached. Perhaps the newly added error
message should match CREATE TABLE .. PARTITION OF's, but I found the
latter to be not detailed enough, or maybe that's just me.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

disallow-partition-only-generated-cols.patchapplication/octet-stream; name=disallow-partition-only-generated-cols.patchDownload+20-2
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#2)
Re: ATTACH PARTITION seems to ignore column generation status

Amit Langote <amitlangote09@gmail.com> writes:

On Fri, Jan 6, 2023 at 3:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not sure to what extent it's sensible for partitions to have
GENERATED columns that don't match their parent; but even if that's
okay for payload columns I doubt we want to allow partitioning
columns to be GENERATED.

Actually, I'm inclined to disallow partitions to have *any* generated
columns that are not present in the parent table. The main reason for
that is the inconvenience of checking that a partition's generated
columns doesn't override the partition key column of an ancestor that
is not its immediate parent, which MergeAttributesIntoExisting() has
access to and would have been locked.

After thinking about this awhile, I feel that we ought to disallow
it in the traditional-inheritance case as well. The reason is that
there are semantic prohibitions on inserting or updating a generated
column, eg

regression=# create table t (f1 int, f2 int generated always as (f1+1) stored);
CREATE TABLE
regression=# update t set f2=42;
ERROR: column "f2" can only be updated to DEFAULT
DETAIL: Column "f2" is a generated column.

It's not very reasonable to have to recheck that for child tables,
and we don't. But if one does this:

regression=# create table pp (f1 int, f2 int);
CREATE TABLE
regression=# create table cc (f1 int, f2 int generated always as (f1+1) stored) inherits(pp);
NOTICE: merging column "f1" with inherited definition
NOTICE: merging column "f2" with inherited definition
CREATE TABLE
regression=# insert into cc values(1);
INSERT 0 1
regression=# update pp set f2 = 99 where f1 = 1;
UPDATE 1
regression=# table cc;
f1 | f2
----+----
1 | 99
(1 row)

That is surely just as broken as the partition-based case.

I also note that the code adjacent to what you added is

/*
* If parent column is generated, child column must be, too.
*/
if (attribute->attgenerated && !childatt->attgenerated)
ereport(ERROR, ...

without any exception for non-partition inheritance, and the
following check for equivalent generation expressions has
no such exception either. So it's not very clear why this
test should have an exception.

Patch doing it that way is attached. Perhaps the newly added error
message should match CREATE TABLE .. PARTITION OF's, but I found the
latter to be not detailed enough, or maybe that's just me.

Maybe we should improve the existing error message while we're at it?

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: ATTACH PARTITION seems to ignore column generation status

I wrote:

After thinking about this awhile, I feel that we ought to disallow
it in the traditional-inheritance case as well. The reason is that
there are semantic prohibitions on inserting or updating a generated
column, eg

regression=# create table t (f1 int, f2 int generated always as (f1+1) stored);
CREATE TABLE
regression=# update t set f2=42;
ERROR: column "f2" can only be updated to DEFAULT
DETAIL: Column "f2" is a generated column.

It's not very reasonable to have to recheck that for child tables,
and we don't. But if one does this:

regression=# create table pp (f1 int, f2 int);
CREATE TABLE
regression=# create table cc (f1 int, f2 int generated always as (f1+1) stored) inherits(pp);
NOTICE: merging column "f1" with inherited definition
NOTICE: merging column "f2" with inherited definition
CREATE TABLE
regression=# insert into cc values(1);
INSERT 0 1
regression=# update pp set f2 = 99 where f1 = 1;
UPDATE 1
regression=# table cc;
f1 | f2
----+----
1 | 99
(1 row)

That is surely just as broken as the partition-based case.

So what we need is about like this. This is definitely not something
to back-patch, since it's taking away what had been a documented
behavior. You could imagine trying to prevent such UPDATEs instead,
but I judge it not worth the trouble. If anyone were actually using
this capability we'd have heard bug reports.

regards, tom lane

Attachments:

disallow-generated-child-columns-2.patchtext/x-diff; charset=us-ascii; name=disallow-generated-child-columns-2.patchDownload+38-47
#5Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#4)
Re: ATTACH PARTITION seems to ignore column generation status

On Tue, Jan 10, 2023 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

After thinking about this awhile, I feel that we ought to disallow
it in the traditional-inheritance case as well. The reason is that
there are semantic prohibitions on inserting or updating a generated
column, eg

regression=# create table t (f1 int, f2 int generated always as (f1+1) stored);
CREATE TABLE
regression=# update t set f2=42;
ERROR: column "f2" can only be updated to DEFAULT
DETAIL: Column "f2" is a generated column.

It's not very reasonable to have to recheck that for child tables,
and we don't. But if one does this:

regression=# create table pp (f1 int, f2 int);
CREATE TABLE
regression=# create table cc (f1 int, f2 int generated always as (f1+1) stored) inherits(pp);
NOTICE: merging column "f1" with inherited definition
NOTICE: merging column "f2" with inherited definition
CREATE TABLE
regression=# insert into cc values(1);
INSERT 0 1
regression=# update pp set f2 = 99 where f1 = 1;
UPDATE 1
regression=# table cc;
f1 | f2
----+----
1 | 99
(1 row)

That is surely just as broken as the partition-based case.

Agree that it looks bad.

So what we need is about like this. This is definitely not something
to back-patch, since it's taking away what had been a documented
behavior. You could imagine trying to prevent such UPDATEs instead,
but I judge it not worth the trouble. If anyone were actually using
this capability we'd have heard bug reports.

Thanks for the patch. It looks good, though I guess you said that we
should also change the error message that CREATE TABLE ... PARTITION
OF emits to match the other cases while we're here. I've attached a
delta patch.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

partition-generated-cols-different-error.patchapplication/octet-stream; name=partition-generated-cols-different-error.patchDownload+13-6
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#5)
Re: ATTACH PARTITION seems to ignore column generation status

Amit Langote <amitlangote09@gmail.com> writes:

Thanks for the patch. It looks good, though I guess you said that we
should also change the error message that CREATE TABLE ... PARTITION
OF emits to match the other cases while we're here. I've attached a
delta patch.

Thanks. I hadn't touched that issue because I wasn't entirely sure
which error message(s) you were unhappy with. These changes look
fine offhand.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: ATTACH PARTITION seems to ignore column generation status

I wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Thanks for the patch. It looks good, though I guess you said that we
should also change the error message that CREATE TABLE ... PARTITION
OF emits to match the other cases while we're here. I've attached a
delta patch.

Thanks. I hadn't touched that issue because I wasn't entirely sure
which error message(s) you were unhappy with. These changes look
fine offhand.

So, after playing with that a bit ... removing the block in
parse_utilcmd.c allows you to do

regression=# CREATE TABLE gtest_parent (f1 date NOT NULL, f2 bigint, f3 bigint GENERATED ALWAYS AS (f2 * 2) STORED) PARTITION BY RANGE (f1);
CREATE TABLE
regression=# CREATE TABLE gtest_child PARTITION OF gtest_parent (
regression(# f3 WITH OPTIONS GENERATED ALWAYS AS (f2 * 3) STORED
regression(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
CREATE TABLE
regression=# \d gtest_child
Table "public.gtest_child"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+-------------------------------------
f1 | date | | not null |
f2 | bigint | | |
f3 | bigint | | | generated always as (f2 * 3) stored
Partition of: gtest_parent FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')

regression=# insert into gtest_parent values('2016-07-01', 42);
INSERT 0 1
regression=# table gtest_parent;
f1 | f2 | f3
------------+----+-----
2016-07-01 | 42 | 126
(1 row)

That is, you can make a partition with a different generated expression
than the parent has. Do we really want to allow that? I think it works
as far as the backend is concerned, but it breaks pg_dump, which tries
to dump this state of affairs as

CREATE TABLE public.gtest_parent (
f1 date NOT NULL,
f2 bigint,
f3 bigint GENERATED ALWAYS AS ((f2 * 2)) STORED
)
PARTITION BY RANGE (f1);

CREATE TABLE public.gtest_child (
f1 date NOT NULL,
f2 bigint,
f3 bigint GENERATED ALWAYS AS ((f2 * 3)) STORED
);

ALTER TABLE ONLY public.gtest_parent ATTACH PARTITION public.gtest_child FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');

and that fails at reload because the ATTACH PARTITION code path
checks for equivalence of the generation expressions.

This different-generated-expression situation isn't really morally
different from different ordinary DEFAULT expressions, which we
do endeavor to support. So maybe we should deem this a supported
case and remove ATTACH PARTITION's insistence that the generation
expressions match ... which I think would be a good thing anyway,
because that check-for-same-reverse-compiled-expression business
is pretty cheesy in itself. AFAIK, 3f7836ff6 took care of the
only problem that the backend would have with this, and pg_dump
looks like it will work as long as the backend will take the
ATTACH command.

I also looked into making CREATE TABLE ... PARTITION OF reject
this case; but that's much harder than it sounds, because what we
have at the relevant point is a raw (unanalyzed) expression for
the child's generation expression but a cooked one for the
parent's, so there is no easy way to match the two.

In short, it's seeming like the rule for both partitioning and
traditional inheritance ought to be "a child column must have
the same generated-or-not property as its parent, but their
generation expressions need not be the same". Thoughts?

regards, tom lane

#8Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#7)
Re: ATTACH PARTITION seems to ignore column generation status

On Wed, Jan 11, 2023 at 7:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Thanks for the patch. It looks good, though I guess you said that we
should also change the error message that CREATE TABLE ... PARTITION
OF emits to match the other cases while we're here. I've attached a
delta patch.

Thanks. I hadn't touched that issue because I wasn't entirely sure
which error message(s) you were unhappy with. These changes look
fine offhand.

So, after playing with that a bit ... removing the block in
parse_utilcmd.c allows you to do

regression=# CREATE TABLE gtest_parent (f1 date NOT NULL, f2 bigint, f3 bigint GENERATED ALWAYS AS (f2 * 2) STORED) PARTITION BY RANGE (f1);
CREATE TABLE
regression=# CREATE TABLE gtest_child PARTITION OF gtest_parent (
regression(# f3 WITH OPTIONS GENERATED ALWAYS AS (f2 * 3) STORED
regression(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
CREATE TABLE
regression=# \d gtest_child
Table "public.gtest_child"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+-------------------------------------
f1 | date | | not null |
f2 | bigint | | |
f3 | bigint | | | generated always as (f2 * 3) stored
Partition of: gtest_parent FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')

regression=# insert into gtest_parent values('2016-07-01', 42);
INSERT 0 1
regression=# table gtest_parent;
f1 | f2 | f3
------------+----+-----
2016-07-01 | 42 | 126
(1 row)

That is, you can make a partition with a different generated expression
than the parent has. Do we really want to allow that? I think it works
as far as the backend is concerned, but it breaks pg_dump, which tries
to dump this state of affairs as

CREATE TABLE public.gtest_parent (
f1 date NOT NULL,
f2 bigint,
f3 bigint GENERATED ALWAYS AS ((f2 * 2)) STORED
)
PARTITION BY RANGE (f1);

CREATE TABLE public.gtest_child (
f1 date NOT NULL,
f2 bigint,
f3 bigint GENERATED ALWAYS AS ((f2 * 3)) STORED
);

ALTER TABLE ONLY public.gtest_parent ATTACH PARTITION public.gtest_child FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');

and that fails at reload because the ATTACH PARTITION code path
checks for equivalence of the generation expressions.

This different-generated-expression situation isn't really morally
different from different ordinary DEFAULT expressions, which we
do endeavor to support.

Ah, right, we are a bit more flexible in allowing that. Though,
partition-specific defaults, unlike generated columns, are not
respected when inserting/updating via the parent:

create table partp (a int, b int generated always as (a+1) stored, c
int default 0) partition by list (a);
create table partc1 partition of partp (b generated always as (a+2)
stored, c default 1) for values in (1);
insert into partp values (1);
table partp;
a | b | c
---+---+---
1 | 3 | 0
(1 row)

create table partc2 partition of partp (b generated always as (a+2)
stored) for values in (2);
update partp set a = 2;
table partp;
a | b | c
---+---+---
2 | 4 | 0
(1 row)

So maybe we should deem this a supported
case and remove ATTACH PARTITION's insistence that the generation
expressions match

I tend to agree now that we have 3f7836ff6.

... which I think would be a good thing anyway,
because that check-for-same-reverse-compiled-expression business
is pretty cheesy in itself.

Hmm, yeah, we usually transpose a parent's expression into one that
has a child's attribute numbers and vice versa when checking their
equivalence.

AFAIK, 3f7836ff6 took care of the
only problem that the backend would have with this, and pg_dump
looks like it will work as long as the backend will take the
ATTACH command.

Right.

I also looked into making CREATE TABLE ... PARTITION OF reject
this case; but that's much harder than it sounds, because what we
have at the relevant point is a raw (unanalyzed) expression for
the child's generation expression but a cooked one for the
parent's, so there is no easy way to match the two.

In short, it's seeming like the rule for both partitioning and
traditional inheritance ought to be "a child column must have
the same generated-or-not property as its parent, but their
generation expressions need not be the same". Thoughts?

Agreed.

I've updated your disallow-generated-child-columns-2.patch to do this,
and have also merged the delta post that I had attached with my last
email, whose contents you sound to agree with.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

disallow-generated-child-columns-3.patchapplication/octet-stream; name=disallow-generated-child-columns-3.patchDownload+54-112
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#8)
Re: ATTACH PARTITION seems to ignore column generation status

Amit Langote <amitlangote09@gmail.com> writes:

I've updated your disallow-generated-child-columns-2.patch to do this,
and have also merged the delta post that I had attached with my last
email, whose contents you sound to agree with.

Pushed with some further work to improve the handling of multiple-
inheritance cases. We still need to insist that all or none of the
parent columns are generated, but we don't have to require their
generation expressions to be alike: that can be resolved by letting
the child table override the expression, much as we've long done for
plain default expressions. (This did need some work in pg_dump
after all.) I'm pretty happy with where this turned out.

regards, tom lane

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#9)
Re: ATTACH PARTITION seems to ignore column generation status

On Thu, Jan 12, 2023 at 5:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

I've updated your disallow-generated-child-columns-2.patch to do this,
and have also merged the delta post that I had attached with my last
email, whose contents you sound to agree with.

Pushed with some further work to improve the handling of multiple-
inheritance cases. We still need to insist that all or none of the
parent columns are generated, but we don't have to require their
generation expressions to be alike: that can be resolved by letting
the child table override the expression, much as we've long done for
plain default expressions. (This did need some work in pg_dump
after all.) I'm pretty happy with where this turned out.

Thanks, that all looks more consistent now indeed.

I noticed a typo in the doc additions, which I've attached a fix for.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

8bf6ec3ba3-doc-typo.patchapplication/octet-stream; name=8bf6ec3ba3-doc-typo.patchDownload+1-1
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#10)
Re: ATTACH PARTITION seems to ignore column generation status

Amit Langote <amitlangote09@gmail.com> writes:

I noticed a typo in the doc additions, which I've attached a fix for.

Doh, right, pushed.

regards, tom lane

#12Alexander Lakhin
exclusion@gmail.com
In reply to: Tom Lane (#9)
Re: ATTACH PARTITION seems to ignore column generation status

Hello,

11.01.2023 23:58, Tom Lane wrote:

Amit Langote<amitlangote09@gmail.com> writes:

I've updated your disallow-generated-child-columns-2.patch to do this,
and have also merged the delta post that I had attached with my last
email, whose contents you sound to agree with.

Pushed with some further work to improve the handling of multiple-
inheritance cases. We still need to insist that all or none of the
parent columns are generated, but we don't have to require their
generation expressions to be alike: that can be resolved by letting
the child table override the expression, much as we've long done for
plain default expressions. (This did need some work in pg_dump
after all.) I'm pretty happy with where this turned out.

I've encountered a query that triggers an assert added in that commit:
CREATE TABLE t(a int, b int GENERATED ALWAYS AS (a) STORED) PARTITION BY
RANGE (a);
CREATE TABLE tp PARTITION OF t(b DEFAULT 1) FOR VALUES FROM (0) to (1);

Core was generated by `postgres: law regression [local] CREATE
TABLE                                 '.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/3152655' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140460372887360) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140460372887360) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140460372887360) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140460372887360, signo=signo@entry=6)
at ./nptl/pthread_kill.c:89
#3  0x00007fbf79f0e476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007fbf79ef47f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055e76b35b322 in ExceptionalCondition (
    conditionName=conditionName@entry=0x55e76b4a2240
"!(coldef->generated && !restdef->generated)",
    fileName=fileName@entry=0x55e76b49ec71 "tablecmds.c",
lineNumber=lineNumber@entry=3028) at assert.c:66
#6  0x000055e76afef8c3 in MergeAttributes (schema=0x55e76d480318,
supers=supers@entry=0x55e76d474c18,
    relpersistence=112 'p', is_partition=true,
supconstr=supconstr@entry=0x7ffe945a3768) at tablecmds.c:3028
#7  0x000055e76aff0ef2 in DefineRelation
(stmt=stmt@entry=0x55e76d44b2d8, relkind=relkind@entry=114 'r', ownerId=10,
    ownerId@entry=0, typaddress=typaddress@entry=0x0,
    queryString=queryString@entry=0x55e76d44a408 "CREATE TABLE tp
PARTITION OF t(b DEFAULT 1) FOR VALUES FROM (0) to (1);") at tablecmds.c:861
...

Without asserts enables, the partition created successfully, and
INSERT INTO t VALUES(0);
SELECT * FROM t;
yields:
a | b
---+---
0 | 1
(1 row)

Is this behavior expected?

Best regards,
Alexander

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alexander Lakhin (#12)
Re: ATTACH PARTITION seems to ignore column generation status

On 2023-Feb-16, Alexander Lakhin wrote:

I've encountered a query that triggers an assert added in that commit:
CREATE TABLE t(a int, b int GENERATED ALWAYS AS (a) STORED) PARTITION BY
RANGE (a);
CREATE TABLE tp PARTITION OF t(b DEFAULT 1) FOR VALUES FROM (0) to (1);

It seems wrong that this command is accepted. It should have given an
error, because the partition is not allowed to override the generation
of the value that is specified in the parent table.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#13)
Re: ATTACH PARTITION seems to ignore column generation status

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2023-Feb-16, Alexander Lakhin wrote:

I've encountered a query that triggers an assert added in that commit:
CREATE TABLE t(a int, b int GENERATED ALWAYS AS (a) STORED) PARTITION BY
RANGE (a);
CREATE TABLE tp PARTITION OF t(b DEFAULT 1) FOR VALUES FROM (0) to (1);

It seems wrong that this command is accepted. It should have given an
error, because the partition is not allowed to override the generation
of the value that is specified in the parent table.

Agreed. We missed a check somewhere, will fix.

regards, tom lane