Fix domain fast defaults on empty tables

Started by Chao Li19 days ago11 messageshackers
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,

I tested "[a0b6ef29a] Enable fast default for domains with non-volatile constraints". After tracing some cases from the regression tests, I came up with this test case and found a bug:
```
evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero
```

It looks like errors inside the CHECK expression itself, such as the int4div division-by-zero in this test, are still hard errors that can fail the ALTER TABLE command.

For the fix, I worked out two solutions:

Solution 1
========

We can add PG_TRY/PG_CATCH to capture those hard errors. But as a0b6ef29a's commit message mentions, the fallback should only apply while evaluating the CoerceToDomain expression. To avoid broadly suppressing hard errors, I only catch hard errors from domain_check_safe(), which verifies the default value against the domain. The default expression itself is still evaluated with hard errors.

My concern with this solution is that there might be some error from domain_check_safe() that I am not aware of and that would be hidden by the try-catch. But that may be acceptable, because it matches the behavior before a0b6ef29a, so at least it is not a regression.

Solution 2
========

This solution is simpler and is based on the purpose of the original feature. a0b6ef29a's commit message says the purpose of the feature is to avoid table rewriting. Since an empty table has no data to rewrite, we can bypass the fast path for empty tables.

The problem with this solution is that I currently use RelationGetNumberOfBlocks(rel) to decide whether the table is empty, so it cannot handle cases like:
```
Begin;
Delete from t;
Alter table t add column …
```

See the attached patches for details. v1-s1 is solution 1, and v1-s2 is solution 2. Please let me know which solution is preferred.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v1-s1-0001-Handle-throwing-domain-checks-when-probing-fas.patchapplication/octet-stream; name=v1-s1-0001-Handle-throwing-domain-checks-when-probing-fas.patch; x-unix-mode=0644Download+87-16
v1-s2-0001-Preserve-empty-table-behavior-for-domain-fast-.patchapplication/octet-stream; name=v1-s2-0001-Preserve-empty-table-behavior-for-domain-fast-.patch; x-unix-mode=0644Download+44-5
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Chao Li (#1)
Re: Fix domain fast defaults on empty tables

On 5 June 2026 10:48:00 EEST, Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

I tested "[a0b6ef29a] Enable fast default for domains with non-volatile constraints". After tracing some cases from the regression tests, I came up with this test case and found a bug:
```
evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero
```

It looks like errors inside the CHECK expression itself, such as the int4div division-by-zero in this test, are still hard errors that can fail the ALTER TABLE command.

It seems totally reasonable to get an error in that case. '1' is not a valid value for the datatype, whether or not there are any rows in the table.

- Heikki

#3Chao Li
li.evan.chao@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: Fix domain fast defaults on empty tables

On Jun 5, 2026, at 17:04, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 5 June 2026 10:48:00 EEST, Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

I tested "[a0b6ef29a] Enable fast default for domains with non-volatile constraints". After tracing some cases from the regression tests, I came up with this test case and found a bug:
```
evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero
```

It looks like errors inside the CHECK expression itself, such as the int4div division-by-zero in this test, are still hard errors that can fail the ALTER TABLE command.

It seems totally reasonable to get an error in that case. '1' is not a valid value for the datatype, whether or not there are any rows in the table.

- Heikki

I agree that rejecting the default is semantically reasonable. But the concern is that this is a user-visible behavior change, and the feature didn’t seem intended to make that change. Before a0b6ef29a, this ALTER TABLE command could succeed. If we now want to reject such defaults, I think that should be documented explicitly.

Also, there might be some practical use for the current behavior. For example, an invalid default can be used to prevent INSERT from omitting the column.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: Fix domain fast defaults on empty tables

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 5 June 2026 10:48:00 EEST, Chao Li <li.evan.chao@gmail.com> wrote:

evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero

It seems totally reasonable to get an error in that case. '1' is not a valid value for the datatype, whether or not there are any rows in the table.

I think there's reason for concern here, which is that we do not throw
an error for the apparently equivalent case

regression=# create table t2 (a int, b d_div default 1);
CREATE TABLE

This will give you an error at INSERT, but not CREATE. So this
is inconsistent, as well as different from the pre-v19 behavior.

Concretely, I'm pretty sure it is a hazard for pg_dump, which thinks
it can freely transform bits of CREATE operations into ALTERs.
I didn't try to make an example case, but I suspect it is now possible
to create a database that will fail dump/restore because of this
inconsistency.

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#4)
Re: Fix domain fast defaults on empty tables

On 2026-06-05 Fr 10:08 AM, Tom Lane wrote:

Heikki Linnakangas<hlinnaka@iki.fi> writes:

On 5 June 2026 10:48:00 EEST, Chao Li<li.evan.chao@gmail.com> wrote:

evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero

It seems totally reasonable to get an error in that case. '1' is not a valid value for the datatype, whether or not there are any rows in the table.

I think there's reason for concern here, which is that we do not throw
an error for the apparently equivalent case

regression=# create table t2 (a int, b d_div default 1);
CREATE TABLE

This will give you an error at INSERT, but not CREATE. So this
is inconsistent, as well as different from the pre-v19 behavior.

Concretely, I'm pretty sure it is a hazard for pg_dump, which thinks
it can freely transform bits of CREATE operations into ALTERs.
I didn't try to make an example case, but I suspect it is now possible
to create a database that will fail dump/restore because of this
inconsistency.

Seems reasonable. So which of Chao's solutions do you prefer? I think
both will meet the pg_dump issue, not sure how much we care about the
case where we have deleted all the rows but not truncated the table.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: Fix domain fast defaults on empty tables

Andrew Dunstan <andrew@dunslane.net> writes:

On 2026-06-05 Fr 10:08 AM, Tom Lane wrote:

Concretely, I'm pretty sure it is a hazard for pg_dump, which thinks
it can freely transform bits of CREATE operations into ALTERs.
I didn't try to make an example case, but I suspect it is now possible
to create a database that will fail dump/restore because of this
inconsistency.

Seems reasonable. So which of Chao's solutions do you prefer? I think
both will meet the pg_dump issue, not sure how much we care about the
case where we have deleted all the rows but not truncated the table.

[ studies it a bit ...] TBH, I don't like either of these,
nor do I like a0b6ef29a to begin with. The fundamental problem
with this whole mess is that it treats certain kinds of
default-expression evaluation error (i.e., domain CHECK failures)
differently from others. There is no way that that leads to a
consistent user experience. The current complaint is one
manifestation of that, but I'm sure there are others, eg having
to do with domain coercions lower down in a default expression.

It could work if ExecPrepareExprWithContext were capable of
soft-trapping essentially all execution errors, but that's
not true today and I strongly doubt it ever will be true.

I think Chao's v1-s2-0001 points the way towards what could be a
workable solution: if we see that the table is known empty (after
we already have exclusive lock on it!), we could skip both the table
rewrite and the insertion of an attmissingval, and thereby not need
to evaluate the default at all during ALTER TABLE. As Chao says,
simplistic versions of "known empty" would expose some user-visible
behavioral inconsistencies, but I'm not sure how much that matters.
For pg_dump and similar applications we would get the behavior we
needed even with just an is-physically-empty check, since all their
CREATE/ALTER DDL happens before we ever insert any data. You could
imagine going further and scanning the table for live tuples, but
I don't know that that's going to be worth the cycles.

However, I doubt that a bare "RelationGetNumberOfBlocks() == 0" check
is acceptable: we probably need to let the table access method have
control of this. heapam could do "RelationGetNumberOfBlocks() == 0",
but other TAMs might need to do something else. So that means that
this path requires a new TableAmRoutine method, and that probably
puts it in the too-late-for-v19 category.

My recommendation: we ought to revert a0b6ef29a for now and
redesign the optimization for v20.

If you don't like that answer, I'd be firmly against v1-s1-0001
in any case. It repeats the classic mistake of supposing that
we can PG_CATCH random errors and not invoke transaction cleanup.

BTW, I do not like the fact that a0b6ef29a removed the comment
stanza explaining why we need this fake default expression at all.
That's still largely applicable AFAICS.

regards, tom lane

#7Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#6)
Re: Fix domain fast defaults on empty tables

On Jun 7, 2026, at 04:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 2026-06-05 Fr 10:08 AM, Tom Lane wrote:

Concretely, I'm pretty sure it is a hazard for pg_dump, which thinks
it can freely transform bits of CREATE operations into ALTERs.
I didn't try to make an example case, but I suspect it is now possible
to create a database that will fail dump/restore because of this
inconsistency.

Seems reasonable. So which of Chao's solutions do you prefer? I think
both will meet the pg_dump issue, not sure how much we care about the
case where we have deleted all the rows but not truncated the table.

[ studies it a bit ...] TBH, I don't like either of these,
nor do I like a0b6ef29a to begin with. The fundamental problem
with this whole mess is that it treats certain kinds of
default-expression evaluation error (i.e., domain CHECK failures)
differently from others. There is no way that that leads to a
consistent user experience. The current complaint is one
manifestation of that, but I'm sure there are others, eg having
to do with domain coercions lower down in a default expression.

It could work if ExecPrepareExprWithContext were capable of
soft-trapping essentially all execution errors, but that's
not true today and I strongly doubt it ever will be true.

I think Chao's v1-s2-0001 points the way towards what could be a
workable solution: if we see that the table is known empty (after
we already have exclusive lock on it!), we could skip both the table
rewrite and the insertion of an attmissingval, and thereby not need
to evaluate the default at all during ALTER TABLE. As Chao says,
simplistic versions of "known empty" would expose some user-visible
behavioral inconsistencies, but I'm not sure how much that matters.
For pg_dump and similar applications we would get the behavior we
needed even with just an is-physically-empty check, since all their
CREATE/ALTER DDL happens before we ever insert any data. You could
imagine going further and scanning the table for live tuples, but
I don't know that that's going to be worth the cycles.

However, I doubt that a bare "RelationGetNumberOfBlocks() == 0" check
is acceptable: we probably need to let the table access method have
control of this. heapam could do "RelationGetNumberOfBlocks() == 0",
but other TAMs might need to do something else. So that means that
this path requires a new TableAmRoutine method, and that probably
puts it in the too-late-for-v19 category.

My recommendation: we ought to revert a0b6ef29a for now and
redesign the optimization for v20.

If you don't like that answer, I'd be firmly against v1-s1-0001
in any case. It repeats the classic mistake of supposing that
we can PG_CATCH random errors and not invoke transaction cleanup.

BTW, I do not like the fact that a0b6ef29a removed the comment
stanza explaining why we need this fake default expression at all.
That's still largely applicable AFAICS.

regards, tom lane

Thanks, Tom, for the detailed explanation.

I'll hold off here and wait for Andrew's decision on how to proceed. Sounds like reverting a0b6ef29a may be the preferred option for now.

For v1-s2, I’m not ware of a better way to decide whether a table is known empty. I ever considered a count(*)-style scan or a LIMIT 1 scan, but those seem worse, they would add a new table scan inside ALTER TABLE just to decide whether to evaluate the default.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#8jian he
jian.universality@gmail.com
In reply to: Tom Lane (#4)
Re: Fix domain fast defaults on empty tables

On Fri, Jun 5, 2026 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 5 June 2026 10:48:00 EEST, Chao Li <li.evan.chao@gmail.com> wrote:

evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero

It seems totally reasonable to get an error in that case. '1' is not a valid value for the datatype, whether or not there are any rows in the table.

I think there's reason for concern here, which is that we do not throw
an error for the apparently equivalent case

regression=# create table t2 (a int, b d_div default 1);
CREATE TABLE

This will give you an error at INSERT, but not CREATE. So this
is inconsistent, as well as different from the pre-v19 behavior.

However, this is normal behavior for non-domain types.

create table t2 (a numeric default (1::numeric/0.0::float4)); -- ok
alter table t2 add column b numeric default ((1::numeric/0.0::float4)); -- error

#9jian he
jian.universality@gmail.com
In reply to: Tom Lane (#4)
Re: Fix domain fast defaults on empty tables

On Fri, Jun 5, 2026 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 5 June 2026 10:48:00 EEST, Chao Li <li.evan.chao@gmail.com> wrote:

evantest=# create domain d_div as int check (1 / (value - 1) > 0);
CREATE DOMAIN
evantest=# create table t (a int);
CREATE TABLE
evantest=# alter table t add column b d_div default 1;
ERROR: division by zero

It seems totally reasonable to get an error in that case. '1' is not a valid value for the datatype, whether or not there are any rows in the table.

I think there's reason for concern here, which is that we do not throw
an error for the apparently equivalent case

regression=# create table t2 (a int, b d_div default 1);
CREATE TABLE

This will give you an error at INSERT, but not CREATE. So this
is inconsistent, as well as different from the pre-v19 behavior.

Concretely, I'm pretty sure it is a hazard for pg_dump, which thinks
it can freely transform bits of CREATE operations into ALTERs.
I didn't try to make an example case, but I suspect it is now possible
to create a database that will fail dump/restore because of this
inconsistency.

Per the docs [1]https://www.postgresql.org/docs/current/sql-altertable.html, we have two relevant forms:
ALTER [ COLUMN ] column_name SET DEFAULT expression
ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type ... [ DEFAULT
default_expr ]

ATExecColumnDefault -> AddRelationNewConstraints does NOT evaluate the
default expression.
So ALTER COLUMN SET DEFAULT will never error out on a bad expression
because it is never evaluated at DDL time.

ALTER TABLE ADD COLUMN with a DEFAULT: pg_dump.c will never emit such a command.
You can verify this by searching for the keyword "add" in
src/bin/pg_dump/pg_dump.c, and looking at each occurrence one by one.
Looking at dumpTableSchema() confirms the same: we do not produce
command: ALTER TABLE ADD COLUMN.

See dumpTableSchema below comments related FOR LOOP part also
/*
* Dump additional per-column properties that we can't handle in the
* main CREATE TABLE command.
*/

[1]: https://www.postgresql.org/docs/current/sql-altertable.html

--------------------------------------------
create or replace function dummy() returns numeric AS $$
BEGIN
RETURN 1/0;
END$$ immutable LANGUAGE plpgsql;

create table t1(a numeric default dummy());

alter table t1 add column b numeric default dummy();
ERROR: division by zero
CONTEXT: PL/pgSQL expression "1/0"
PL/pgSQL function dummy() line 3 at RETURN

As you can see, if pg_dump somehow converted the CREATE default expression into
an ALTER TABLE ADD COLUMN DEFAULT, we would already be facing this issue.

Am I missing something?

--
jian
https://www.enterprisedb.com/

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Chao Li (#7)
Re: Fix domain fast defaults on empty tables

On 2026-06-06 Sa 10:09 PM, Chao Li wrote:

On Jun 7, 2026, at 04:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 2026-06-05 Fr 10:08 AM, Tom Lane wrote:

Concretely, I'm pretty sure it is a hazard for pg_dump, which thinks
it can freely transform bits of CREATE operations into ALTERs.
I didn't try to make an example case, but I suspect it is now possible
to create a database that will fail dump/restore because of this
inconsistency.

Seems reasonable. So which of Chao's solutions do you prefer? I think
both will meet the pg_dump issue, not sure how much we care about the
case where we have deleted all the rows but not truncated the table.

[ studies it a bit ...] TBH, I don't like either of these,
nor do I like a0b6ef29a to begin with. The fundamental problem
with this whole mess is that it treats certain kinds of
default-expression evaluation error (i.e., domain CHECK failures)
differently from others. There is no way that that leads to a
consistent user experience. The current complaint is one
manifestation of that, but I'm sure there are others, eg having
to do with domain coercions lower down in a default expression.

It could work if ExecPrepareExprWithContext were capable of
soft-trapping essentially all execution errors, but that's
not true today and I strongly doubt it ever will be true.

I think Chao's v1-s2-0001 points the way towards what could be a
workable solution: if we see that the table is known empty (after
we already have exclusive lock on it!), we could skip both the table
rewrite and the insertion of an attmissingval, and thereby not need
to evaluate the default at all during ALTER TABLE. As Chao says,
simplistic versions of "known empty" would expose some user-visible
behavioral inconsistencies, but I'm not sure how much that matters.
For pg_dump and similar applications we would get the behavior we
needed even with just an is-physically-empty check, since all their
CREATE/ALTER DDL happens before we ever insert any data. You could
imagine going further and scanning the table for live tuples, but
I don't know that that's going to be worth the cycles.

However, I doubt that a bare "RelationGetNumberOfBlocks() == 0" check
is acceptable: we probably need to let the table access method have
control of this. heapam could do "RelationGetNumberOfBlocks() == 0",
but other TAMs might need to do something else. So that means that
this path requires a new TableAmRoutine method, and that probably
puts it in the too-late-for-v19 category.

My recommendation: we ought to revert a0b6ef29a for now and
redesign the optimization for v20.

If you don't like that answer, I'd be firmly against v1-s1-0001
in any case. It repeats the classic mistake of supposing that
we can PG_CATCH random errors and not invoke transaction cleanup.

BTW, I do not like the fact that a0b6ef29a removed the comment
stanza explaining why we need this fake default expression at all.
That's still largely applicable AFAICS.

regards, tom lane

Thanks, Tom, for the detailed explanation.

I'll hold off here and wait for Andrew's decision on how to proceed. Sounds like reverting a0b6ef29a may be the preferred option for now.

For v1-s2, I’m not ware of a better way to decide whether a table is known empty. I ever considered a count(*)-style scan or a LIMIT 1 scan, but those seem worse, they would add a new table scan inside ALTER TABLE just to decide whether to evaluate the default.

Given Tom's reasonable suggestion that we really need a new Table AM
callback to do this sensibly, I agree we should revert it. Will do.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#8)
Re: Fix domain fast defaults on empty tables

jian he <jian.universality@gmail.com> writes:

On Fri, Jun 5, 2026 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's reason for concern here, which is that we do not throw
an error for the apparently equivalent case
regression=# create table t2 (a int, b d_div default 1);
CREATE TABLE
This will give you an error at INSERT, but not CREATE. So this
is inconsistent, as well as different from the pre-v19 behavior.

However, this is normal behavior for non-domain types.
create table t2 (a numeric default (1::numeric/0.0::float4)); -- ok
alter table t2 add column b numeric default ((1::numeric/0.0::float4)); -- error

Well, that's not great either. The idea of avoiding evaluating
the default expression altogether when the table is empty could
ameliorate that problem too.

But my point stands: a0b6ef29a introduces different treatment for
domain-CHECK-constraint errors than other types of runtime errors
in the default expression. I think that is fundamentally the wrong
direction to go in. We want more consistency here, not less.

regards, tom lane