bug when apply fast default mechanism for adding new column over domain with default value
hi.
While looking at ATExecAddColumn, I saw there are two DomainHasConstraints,
which are cache unfriendly,
then I think I found a bug for applying fast default over domain with
default value.
bug demo:
create domain int_domain3 as int check(value > 1) default 11;
create domain int_domain4 as int default 11;
drop table if exists t0;
create table t0(a int);
insert into t0 values(1),(2);
alter table t0 add column b int_domain3;
select * from t0;
a | b
---+----
1 | 11
2 | 11
(2 rows)
alter table t0 add column c int_domain4;
select * from t0;
a | b | c
---+----+---
1 | 11 |
2 | 11 |
(2 rows)
I expect column "c" value also be value 11.
column b type is domain int_domain3, which has a constraint.
As a result, adding column b triggers a table rewrite, ensuring the
domain default value is successfully applied.
column c is domain int_domain4, which has no constraint.
This allows for a fast default mechanism applied to column c, but we cannot.
StoreAttrDefault, pg_attrdef can not cope with domain with default expression.
also domain default expression is stored at pg_tye.typdefaultbin.
Attach a patch to fix this issue by cause it to table rewrite.
also minor refactor to avoid double DomainHasConstraints function call.
jian he <jian.universality@gmail.com> writes:
then I think I found a bug for applying fast default over domain with
default value.
Yeah, that's definitely a bug: it worked correctly in versions
before we invented the attmissingval mechanism.
column b type is domain int_domain3, which has a constraint.
As a result, adding column b triggers a table rewrite, ensuring the
domain default value is successfully applied.
I do not believe that case should require a table rewrite.
Both the default and the check constraint are immutable,
so we ought to be able to apply the check once and then
use the default as the attmissingval.
Attach a patch to fix this issue by cause it to table rewrite.
I see no attached patch, but in any case forcing a table rewrite
seems like the wrong direction...
regards, tom lane
On Sat, Mar 1, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I do not believe that case should require a table rewrite.
Both the default and the check constraint are immutable,
so we ought to be able to apply the check once and then
use the default as the attmissingval.Attach a patch to fix this issue by cause it to table rewrite.
I see no attached patch, but in any case forcing a table rewrite
seems like the wrong direction...
forcing table rewrite would be an easier fix.
but not forcing table write seems doable.
\d pg_attrdef
Table "pg_catalog.pg_attrdef"
Column | Type | Collation | Nullable | Default
---------+--------------+-----------+----------+---------
oid | oid | | not null |
adrelid | oid | | not null |
adnum | smallint | | not null |
adbin | pg_node_tree | C | not null |
Indexes:
"pg_attrdef_oid_index" PRIMARY KEY, btree (oid)
"pg_attrdef_adrelid_adnum_index" UNIQUE CONSTRAINT, btree (adrelid, adnum)
pg_attrdef_adrelid_adnum_index means
a column can only have one default expression adbin.
if we store domain's default expression in pg_attrdef it may have error like:
CREATE DOMAIN int_domain AS int DEFAULT 11;
ALTER TABLE t2 ADD COLUMN b int_domain default 3;
ERROR: duplicate key value violates unique constraint
"pg_attrdef_adrelid_adnum_index"
DETAIL: Key (adrelid, adnum)=(18102, 2) already exists.
currently function StoreAttrDefault will
1. set pg_attribute.atthasdef
2. compute and set atthasmissing, attmissingval
3. insert an entry in pg_attrdef.
but we only want 2.
So I duplicated StoreAttrDefault and removed pg_attribute.atthasdef,
pg_attrdef related code.
and it works fine.
Attachments:
v1-0001-apply-fast-default-for-adding-new-column-over-dom.patchtext/x-patch; charset=US-ASCII; name=v1-0001-apply-fast-default-for-adding-new-column-over-dom.patchDownload+220-3
jian he <jian.universality@gmail.com> writes:
So I duplicated StoreAttrDefault and removed pg_attribute.atthasdef,
pg_attrdef related code.
and it works fine.
I think the fundamental problem here is that StoreAttrDefault
shouldn't be involved in this in the first place. It looks like
somebody did that in the hopes of avoiding two updates of the
new pg_attribute row (one to set atthasdef and the other to fill
attmissingval), but it's just bad code structure. We should
take that code out of StoreAttrDefault, not duplicate it, because
the assumption that the missingval is identical to what goes into
pg_attrdef is just wrong.
(We could possibly get back down to one update by moving code
in ATExecAddColumn so that we know before calling
InsertPgAttributeTuples whether the column will have a non-null
default, and then we could set atthasdef correctly in the
originally-inserted tuple. That's an optimization though,
not part of the basic bug fix.)
Also, since the "if (RELKIND_HAS_STORAGE(relkind))" stanza
in ATExecAddColumn is already calling expression_planner,
we should be able to avoid doing that twice on the way to
discovering whether the expression is a constant.
I kind of feel that StoreAttrMissingVal belongs in heap.c;
it's got about nothing to do with pg_attrdef.
regards, tom lane
On Sun, Mar 2, 2025 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
jian he <jian.universality@gmail.com> writes:
So I duplicated StoreAttrDefault and removed pg_attribute.atthasdef,
pg_attrdef related code.
and it works fine.I think the fundamental problem here is that StoreAttrDefault
shouldn't be involved in this in the first place. It looks like
somebody did that in the hopes of avoiding two updates of the
new pg_attribute row (one to set atthasdef and the other to fill
attmissingval), but it's just bad code structure. We should
take that code out of StoreAttrDefault, not duplicate it, because
the assumption that the missingval is identical to what goes into
pg_attrdef is just wrong.
in StoreAttrDefault,
i've split missingval related code into StoreAttrMissingVal.
Also, since the "if (RELKIND_HAS_STORAGE(relkind))" stanza
in ATExecAddColumn is already calling expression_planner,
we should be able to avoid doing that twice on the way to
discovering whether the expression is a constant.
done.
I kind of feel that StoreAttrMissingVal belongs in heap.c;
it's got about nothing to do with pg_attrdef.
heap.c seems fine.
Attachments:
v2-0001-apply-fast-default-for-adding-new-column-over-dom.patchtext/x-patch; charset=US-ASCII; name=v2-0001-apply-fast-default-for-adding-new-column-over-dom.patchDownload+225-83
jian he <jian.universality@gmail.com> writes:
On Sun, Mar 2, 2025 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the fundamental problem here is that StoreAttrDefault
shouldn't be involved in this in the first place.
i've split missingval related code into StoreAttrMissingVal.
Hm, this does nothing to improve the modularity of the affected
code; if anything it's worse than before. (Fools around for
awhile...) I had something more like the attached in mind.
The idea here is to centralize the control of whether we are
storing a missingval or doing a table rewrite in ATExecAddColumn.
StoreAttrMissingVal has nothing to do with pg_attrdef nor does
StoreAttrDefault have anything to do with attmissingval.
I looked briefly at determining the presence of a default
earlier so we could avoid the extra update of pg_attribute
when both those changes need to happen, but I concluded that
it'd take more refactoring than it's worth. The problem is
the logic way down inside AddRelationNewConstraints that
checks for the eventually-cooked default expression just
being a null constant.
regards, tom lane
Attachments:
v3-0001-Fix-broken-handling-of-domains-in-atthasmissing-l.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-Fix-broken-handling-of-domains-in-atthasmissing-l.p; name*1=atchDownload+242-35
v3-0002-Remove-now-dead-code-in-StoreAttrDefault.patchtext/x-diff; charset=us-ascii; name=v3-0002-Remove-now-dead-code-in-StoreAttrDefault.patchDownload+12-85
On Mon, Mar 3, 2025 at 6:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
jian he <jian.universality@gmail.com> writes:
On Sun, Mar 2, 2025 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the fundamental problem here is that StoreAttrDefault
shouldn't be involved in this in the first place.i've split missingval related code into StoreAttrMissingVal.
Hm, this does nothing to improve the modularity of the affected
code; if anything it's worse than before. (Fools around for
awhile...) I had something more like the attached in mind.
The idea here is to centralize the control of whether we are
storing a missingval or doing a table rewrite in ATExecAddColumn.
StoreAttrMissingVal has nothing to do with pg_attrdef nor does
StoreAttrDefault have anything to do with attmissingval.
your patch looks very good!!!
within ATExecAddColumn, we can
if (!missingIsNull)
StoreAttrMissingVal(rel, attribute->attnum, missingval, missingIsNull);
to save some cycles?
+ /* ... and store it. If it's null, nothing is stored. */
+ StoreAttrMissingVal(rel, attribute->attnum,
+ missingval, missingIsNull);
+ has_missing = !missingIsNull;
+ FreeExecutorState(estate);
do we need pfree missingval?
+ else
+ {
+ /*
+ * Failed to use missing mode. We have to do a table rewrite
+ * to install the value --- unless it's a virtual generated
+ * column.
+ */
+ if (colDef->generated != ATTRIBUTE_GENERATED_VIRTUAL)
+ tab->rewrite |= AT_REWRITE_DEFAULT_VAL;
+ }
maybe here add comments mentioning that the identity column needs table rewrite.
since we removed ``tab->rewrite |= AT_REWRITE_DEFAULT_VAL;``
in the if (colDef->identity) branch.
StoreAttrDefault is at the end of
ATExecAlterColumnType, ATExecCookedColumnDefault, StoreConstraints
these function later will call CommandCounterIncrement
I also checked AddRelationNewConstraints surrounding code.
overall i think StoreAttrDefault doesn't have CommandCounterIncrement
should be fine.
looking at DefineRelation comments:
* We can set the atthasdef flags now in the tuple descriptor; this just
* saves StoreAttrDefault from having to do an immediate update of the
* pg_attribute rows.
this seems not right?
DefineRelation->heap_create_with_catalog->heap_create->RelationBuildLocalRelation->CreateTupleDescCopy
don't copy atthasdef.
RelationBuildLocalRelation later didn't touch atthasdef.
populate_compact_attribute didn't touch atthasdef.
so StoreAttrDefault has to update that pg_attribute row.
somehow related, if you look at AddRelationNewConstraints.
```
foreach_ptr(RawColumnDefault, colDef, newColDefaults)
{
defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal);
cooked = (CookedConstraint *) palloc(sizeof(CookedConstraint));
cooked->contype = CONSTR_DEFAULT;
cooked->conoid = defOid;
}
maybe we can minor change comments in struct CookedConstraint.conoid.
We can mention: if contype is CONSTR_DEFAULT then it is pg_attrdef.oid
for the default expression.
On Mon, Mar 3, 2025 at 4:45 PM jian he <jian.universality@gmail.com> wrote:
looking at DefineRelation comments:
* We can set the atthasdef flags now in the tuple descriptor; this just
* saves StoreAttrDefault from having to do an immediate update of the
* pg_attribute rows.
this seems not right?
DefineRelation->heap_create_with_catalog->heap_create->RelationBuildLocalRelation->CreateTupleDescCopy
don't copy atthasdef.
RelationBuildLocalRelation later didn't touch atthasdef.
populate_compact_attribute didn't touch atthasdef.
so StoreAttrDefault has to update that pg_attribute row.
CREATE TABLE (COLUMN x DEFAULT y):
for each (Relation rel, AttrNumber attnum), we can enter StoreAttrDefault once.
Also for the above quoted reason,
in DefineRelation, we didn't change pg_attribute.atthasdef before
entering StoreAttrDefault.
ALTER TABLE ALTER COLUMN SET/DROP DEFAULT:
in ATExecColumnDefault will first do RemoveAttrDefault, then re add
the default expression.
In this case, in StoreAttrDefault: attStruct->atthasdef will be false.
overall, i think DefineRelation about StoreAttrDefault comments can be removed.
and StoreAttrDefault, "if (!attStruct->atthasdef)" part logic can also
be changed.
The attached is the minor change I came up with, based on v3-0001 and v3-0002.
Attachments:
v3-0001-minor-refactoring-DefineRelation-StoreAttrDefa.no-cfbotapplication/octet-stream; name=v3-0001-minor-refactoring-DefineRelation-StoreAttrDefa.no-cfbotDownload+12-20
jian he <jian.universality@gmail.com> writes:
within ATExecAddColumn, we can
if (!missingIsNull)
StoreAttrMissingVal(rel, attribute->attnum, missingval, missingIsNull);
to save some cycles?
Probably not really worth it: surely that's going to be a very
infrequent edge case. We already eliminated cases with a simple
null-Constant default, so we'd only get here with a more complex
expression that evaluates to null.
+ /* ... and store it. If it's null, nothing is stored. */ + StoreAttrMissingVal(rel, attribute->attnum, + missingval, missingIsNull); + has_missing = !missingIsNull; + FreeExecutorState(estate); do we need pfree missingval?
I don't see the point. It's not like this code leaks nothing else,
and we shouldn't be running in a long-lived memory context.
maybe here add comments mentioning that the identity column needs table rewrite.
since we removed ``tab->rewrite |= AT_REWRITE_DEFAULT_VAL;``
in the if (colDef->identity) branch.
We don't need that because it's not a special case anymore: the
NextValueExpr will correctly be seen as a volatile default.
The existing code needs that hack because it creates the
NextValueExpr after the point at which expression volatility
is checked, but that's just poor design.
looking at DefineRelation comments:
* We can set the atthasdef flags now in the tuple descriptor; this just
* saves StoreAttrDefault from having to do an immediate update of the
* pg_attribute rows.
this seems not right?
DefineRelation->heap_create_with_catalog->heap_create->RelationBuildLocalRelation->CreateTupleDescCopy
don't copy atthasdef.
RelationBuildLocalRelation later didn't touch atthasdef.
populate_compact_attribute didn't touch atthasdef.
so StoreAttrDefault has to update that pg_attribute row.
You are right: that code has no effect, and if it did have an
effect it would be wrong in the case where somebody writes
"DEFAULT NULL" explicitly. We'd end with atthasdef set and
nothing in pg_attrdef, which would upset various places later.
So we should remove that comment and also the adjustments of
atthasdef. That seems like an independent patch though.
regards, tom lane
I wrote:
jian he <jian.universality@gmail.com> writes:
within ATExecAddColumn, we can
if (!missingIsNull)
StoreAttrMissingVal(rel, attribute->attnum, missingval, missingIsNull);
to save some cycles?
Probably not really worth it: surely that's going to be a very
infrequent edge case.
On second thought, it is worth doing like that, not on speed
grounds but because we can make StoreAttrMissingVal simpler
by not handling the is-null case at all.
Pushed with that adjustment and some other minor polishing.
regards, tom lane
Hello Tom,
03.03.2025 20:38, Tom Lane wrote:
Pushed with that adjustment and some other minor polishing.
I've discovered that 95f650674 introduced a defect similar to bug #18297,
but this time with DEFAULT. Namely, the following script:
CREATE TABLE a (aa text);
CREATE TABLE c (cc text) INHERITS (a);
CREATE TABLE d (dd text) INHERITS (c, a);
ALTER TABLE a ADD COLUMN i int DEFAULT 1;
fails with:
ERROR: XX000: tuple already updated by self
LOCATION: simple_heap_update, heapam.c:4421
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Alexander Lakhin <exclusion@gmail.com> writes:
I've discovered that 95f650674 introduced a defect similar to bug #18297,
but this time with DEFAULT. Namely, the following script:
CREATE TABLE a (aa text);
CREATE TABLE c (cc text) INHERITS (a);
CREATE TABLE d (dd text) INHERITS (c, a);
ALTER TABLE a ADD COLUMN i int DEFAULT 1;
fails with:
ERROR: XX000: tuple already updated by self
LOCATION: simple_heap_update, heapam.c:4421
Hmm, yeah. The failing call is here:
/* Bump the existing child att's inhcount */
...
CatalogTupleUpdate(attrdesc, &tuple->t_self, tuple);
so I think you're right that that code path is now short a
CommandCounterIncrement() somewhere. I'll look tomorrow if
nobody beats me to it.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> 于2025年4月2日周三 09:05写道:
Alexander Lakhin <exclusion@gmail.com> writes:
I've discovered that 95f650674 introduced a defect similar to bug #18297,
but this time with DEFAULT. Namely, the following script:
CREATE TABLE a (aa text);
CREATE TABLE c (cc text) INHERITS (a);
CREATE TABLE d (dd text) INHERITS (c, a);
ALTER TABLE a ADD COLUMN i int DEFAULT 1;fails with:
ERROR: XX000: tuple already updated by self
LOCATION: simple_heap_update, heapam.c:4421Hmm, yeah. The failing call is here:
/* Bump the existing child att's inhcount */
...
CatalogTupleUpdate(attrdesc, &tuple->t_self, tuple);so I think you're right that that code path is now short a
CommandCounterIncrement() somewhere. I'll look tomorrow if
nobody beats me to it.
Yes, when table a process its children, which are table c and table d.
Table c is first to be done.
At the same time, table d is also child of table c, so after updating own
pg_attribute tuple, table c will
process its child table d. And table d update its pg_attribute catalog
tuple.
After finishing table c, the logic returning to continue to process table
a's children, which this time is table d.
Between table d pg_attribute tuple updated as child of table c and updating
table d pg_attribute tuple again as child of table a,
there is no call CommandCounterIncrement().
So let's add CommandCounterIncrement() after calling StoreAttrMissingVal().
--
Thanks, Tender Wang
Attachments:
0001-Fix-tuple-already-updated-by-self-issue.patchtext/plain; charset=US-ASCII; name=0001-Fix-tuple-already-updated-by-self-issue.patchDownload+27-1
Tender Wang <tndrwang@gmail.com> writes:
So let's add CommandCounterIncrement() after calling StoreAttrMissingVal().
Pushed with minor editorialization. I thought for a bit about whether
the CCI call should be inside StoreAttrMissingVal; but its cohorts in
heap.c don't do their own CCI calls, so this way seems the least
surprising.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> 于2025年4月2日周三 23:15写道:
Tender Wang <tndrwang@gmail.com> writes:
So let's add CommandCounterIncrement() after calling
StoreAttrMissingVal().
Pushed with minor editorialization. I thought for a bit about whether
the CCI call should be inside StoreAttrMissingVal; but its cohorts in
heap.c don't do their own CCI calls, so this way seems the least
surprising.
Yes, there is no direct call CCI in heap.c. I also noticed this.
Thanks for pushing.
--
Thanks, Tender Wang