BUG #14952: COPY fails to fill in IDENTITY column default value

Started by Steven Winfieldover 8 years ago14 messagesbugs
Jump to latest
#1Steven Winfield
Steven.Winfield@cantabcapital.com

The following bug has been logged on the website:

Bug reference: 14952
Logged by: Steven Winfield
Email address: steven.winfield@cantabcapital.com
PostgreSQL version: 10.0
Operating system: Linux
Description:

COPYing data to a table with an IDENTITY column, where the column's value
isn't specified in the copied input, fails because COPY attempts to insert a
NULL value for the column:

test=# CREATE TABLE identity_test (id bigint GENERATED ALWAYS AS IDENTITY,
name text);
test=# COPY identity_test (name) FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

foo
\.

ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, foo).
CONTEXT: COPY identity_test, line 1: "foo"

Compare this with a serial column:

test=# CREATE TABLE serial_test (id bigserial, name text);
test=# COPY serial_test (name) FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

foo
\.

COPY 1

test=# \d identity_test
Table "public.identity_test"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+------------------------------
id | bigint | | not null | generated always as identity
name | text | | |

test=# \d serial_test
Table "public.serial_test"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+-----------------------------------------
id | bigint | | not null |
nextval('serial_test_id_seq'::regclass)
name | text | | |

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Steven Winfield (#1)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On 12/7/17 06:29, steven.winfield@cantabcapital.com wrote:

COPYing data to a table with an IDENTITY column, where the column's value
isn't specified in the copied input, fails because COPY attempts to insert a
NULL value for the column:

That indeed appears to be a bug.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#2)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On Fri, Dec 8, 2017 at 2:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 12/7/17 06:29, steven.winfield@cantabcapital.com wrote:

COPYing data to a table with an IDENTITY column, where the column's value
isn't specified in the copied input, fails because COPY attempts to insert a
NULL value for the column:

That indeed appears to be a bug.

That's a bug. When doing a COPY with or without a list of columns, and
that a column is not listed and has a default expression, then this
expression is used. This is a role filled in by build_column_default()
but identity columns need to create a NextValueExpr expression
instead. As this expression is missing, the backend complains about a
NULL input for this column, which is logic without an expression.
Attached is a patch with a regression test.

Peter, what do you think?

Thanks,
--
Michael

Attachments:

copy-identity-fix.patchapplication/octet-stream; name=copy-identity-fix.patchDownload+45-2
#4Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#3)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On 12/7/17 20:37, Michael Paquier wrote:

On Fri, Dec 8, 2017 at 2:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 12/7/17 06:29, steven.winfield@cantabcapital.com wrote:

COPYing data to a table with an IDENTITY column, where the column's value
isn't specified in the copied input, fails because COPY attempts to insert a
NULL value for the column:

That indeed appears to be a bug.

That's a bug. When doing a COPY with or without a list of columns, and
that a column is not listed and has a default expression, then this
expression is used. This is a role filled in by build_column_default()
but identity columns need to create a NextValueExpr expression
instead. As this expression is missing, the backend complains about a
NULL input for this column, which is logic without an expression.
Attached is a patch with a regression test.

Peter, what do you think?

Committed. I moved the tests to identity.sql though.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#3)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

Michael Paquier <michael.paquier@gmail.com> writes:

On Fri, Dec 8, 2017 at 2:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

That indeed appears to be a bug.

That's a bug. When doing a COPY with or without a list of columns, and
that a column is not listed and has a default expression, then this
expression is used. This is a role filled in by build_column_default()
but identity columns need to create a NextValueExpr expression
instead. As this expression is missing, the backend complains about a
NULL input for this column, which is logic without an expression.
Attached is a patch with a regression test.

Now that I look more closely at this patch, isn't it fixing things
in the wrong place? Why is it that COPY needs to know about this,
rather than build_column_default()? Aren't we going to have to
fix every other caller of build_column_default()?

For that matter, should build_column_default() know about it explicitly
either? Why aren't we implementing IDENTITY by storing an appropriate
default expression in the catalogs?

regards, tom lane

#6Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#5)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On Thu, Dec 14, 2017 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

For that matter, should build_column_default() know about it explicitly
either? Why aren't we implementing IDENTITY by storing an appropriate
default expression in the catalogs?

Wait a minute... If I look at the callers of build_column_default(),
rewriteValuesRTE looks broken as well:
=# CREATE TABLE itest10 (a int GENERATED BY DEFAULT AS IDENTITY, b
text, c bigint);
CREATE TABLE
=# insert into itest10 values (default, 'foo', 0), (default, 'foo2', 1);
ERROR: 23502: null value in column "a" violates not-null constraint
DETAIL: Failing row contains (null, foo, 0).
I would expect the second query to work properly.

It seems that slot_fill_defaults() would fail as well for a logical
worker, and so are ATExecAlterColumnType() and ATExecAddColumn()?
Peter, did you do the separation to help in handling better the cases
with INSERT OVERRIDE?
--
Michael

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#5)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On 12/13/17 22:39, Tom Lane wrote:

Now that I look more closely at this patch, isn't it fixing things
in the wrong place? Why is it that COPY needs to know about this,
rather than build_column_default()? Aren't we going to have to
fix every other caller of build_column_default()?

Yeah, that seems to be a problem. Attached is a patch that puts the
logic into build_column_default() and adds a few more tests covering
other omissions.

One problem is that this breaks the case ALTER TABLE ... ADD COLUMN ...
IDENTITY, because the sequence ownership isn't registered until after
the ALTER TABLE command. (This only worked for empty tables. For
pre-populated tables, it failed because of the above omission.) The
second patch tries to fix that. Needs more thought and documentation,
but it's an idea.

For that matter, should build_column_default() know about it explicitly
either? Why aren't we implementing IDENTITY by storing an appropriate
default expression in the catalogs?

Initial versions were coded that way, but it was pretty messy and buggy
because you have to touch a lot of places to teach them the difference
between a real default expression and a synthetic one. Another problem
is that the NextValueExpr has special permission behavior that is not
really representable by a normal expression, thus introducing more
special cases, and possibly permission or security issues.

Obviously, it would have helped in the above case. But it was really messy.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Apply-identity-column-expression-in-build_column_def.patchtext/plain; charset=UTF-8; name=0001-Apply-identity-column-expression-in-build_column_def.patch; x-mac-creator=0; x-mac-type=0Download+76-30
0002-Fix-ALTER-TABLE-ADD-COLUMN-IDENTITY.patchtext/plain; charset=UTF-8; name=0002-Fix-ALTER-TABLE-ADD-COLUMN-IDENTITY.patch; x-mac-creator=0; x-mac-type=0Download+26-8
#8Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#7)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On Sat, Dec 16, 2017 at 2:37 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 12/13/17 22:39, Tom Lane wrote:

For that matter, should build_column_default() know about it explicitly
either? Why aren't we implementing IDENTITY by storing an appropriate
default expression in the catalogs?

Initial versions were coded that way, but it was pretty messy and buggy
because you have to touch a lot of places to teach them the difference
between a real default expression and a synthetic one. Another problem
is that the NextValueExpr has special permission behavior that is not
really representable by a normal expression, thus introducing more
special cases, and possibly permission or security issues.

I have no major objections about 0001. Here are some comments.

@@ -1123,6 +1113,16 @@ build_column_default(Relation rel, int attrno)
Node *expr = NULL;
Oid exprtype;

+   if (att_tup->attidentity)
+   {
I would add a comment explaining why this should come first.
+INSERT INTO itest3 VALUES (DEFAULT, 'a');
+INSERT INTO itest3 VALUES (DEFAULT, 'b'), (DEFAULT, 'c');
+SELECT * FROM itest3;
Thanks for adding a test with INSERT RTEs.
+-- ADD COLUMN
+CREATE TABLE itest13 (a int);
+-- add column to empty table
+ALTER TABLE itest13 ADD COLUMN b int GENERATED BY DEFAULT AS IDENTITY;
+ERROR:  no owned sequence found
Those are hitting an elog(), so I think that it would be better to
issue a proper ereport() instead with ERRCODE_FEATURE_NOT_SUPPORTED as
errcode. It is easier to understand the point you are going to
though...

Now coming to 0002...

-       defval = (Expr *) build_column_default(rel, attribute.attnum);
+       /*
+        * For an identity column, we can't use build_column_default(),
+        * because the sequence ownership isn't set yet.  So do it manually.
+        */
+       if (colDef->identity)
+       {
+           NextValueExpr *nve = makeNode(NextValueExpr);
+
+           nve->seqid = RangeVarGetRelid(colDef->identitySequence,
NoLock, false);
+           nve->typeId = typeOid;
+
+           defval = (Expr *) nve;
+       }
+       else
+           defval = (Expr *) build_column_default(rel, attribute.attnum);
I have a problem with this approach which is basically something
similar to what I complained in the thread about typed and partition
tables for identity columns
(https://www.postgresql.org/message-id/20171023074458.1473.25799@wrigleys.postgresql.org).
In my opinion, this ALTER TABLE handling should be done by treating
identity columns a way similar to default expressions in
transformColumnDefinition(), by storing the FuncExpr node at parsing
time instead of storing the information needed to rebuild it when
executing the query. In short the mapping should get closer to what
default does with nextval or serial.
-- 
Michael
#9Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#8)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On 12/18/17 00:49, Michael Paquier wrote:

I have a problem with this approach which is basically something
similar to what I complained in the thread about typed and partition
tables for identity columns
(/messages/by-id/20171023074458.1473.25799@wrigleys.postgresql.org).
In my opinion, this ALTER TABLE handling should be done by treating
identity columns a way similar to default expressions in
transformColumnDefinition(), by storing the FuncExpr node at parsing
time instead of storing the information needed to rebuild it when
executing the query. In short the mapping should get closer to what
default does with nextval or serial.

The serial case works because it stores the sequence *name* in the
default value in the catalog. That doesn't work because for the
identity case we don't store the expression in the catalog. The
proposed patch works by storing the sequence *name* in the internal
structures so that it can be used in place of the stored default value.
So I think this approach is pretty consistent.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#9)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 12/18/17 00:49, Michael Paquier wrote:

In my opinion, this ALTER TABLE handling should be done by treating
identity columns a way similar to default expressions in
transformColumnDefinition(), by storing the FuncExpr node at parsing
time instead of storing the information needed to rebuild it when
executing the query. In short the mapping should get closer to what
default does with nextval or serial.

The serial case works because it stores the sequence *name* in the
default value in the catalog.

Just to clarify: what's *actually* stored for a regular serial column
is the sequence OID (as a regclass constant), not the name. If it
were a name then ALTER SEQUENCE RENAME would break it.

That doesn't work because for the
identity case we don't store the expression in the catalog. The
proposed patch works by storing the sequence *name* in the internal
structures so that it can be used in place of the stored default value.

I trust you really mean you're storing an OID ...

regards, tom lane

#11Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#10)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On Wed, Dec 27, 2017 at 03:43:05PM -0500, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

That doesn't work because for the
identity case we don't store the expression in the catalog. The
proposed patch works by storing the sequence *name* in the internal
structures so that it can be used in place of the stored default value.

I trust you really mean you're storing an OID ...

A RangeVar is used for the proposed patch 0002. Still does it matter? In
the scenario proposed here, which is to fix ALTER TABLE ADD COLUMN
GENERATED, then the sequence gets created and visible only once the
transaction adding the column has been created.

char identity; /* attidentity setting */
+ RangeVar *identitySequence;
CollateClause *collClause; /* untransformed COLLATE spec, if any */
It mat be better to tell that this is used to create a sequence in the
same transaction as the one doing the parsing.
--
Michael

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#11)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On 12/30/17 18:26, Michael Paquier wrote:

A RangeVar is used for the proposed patch 0002. Still does it matter? In
the scenario proposed here, which is to fix ALTER TABLE ADD COLUMN
GENERATED, then the sequence gets created and visible only once the
transaction adding the column has been created.

char identity; /* attidentity setting */
+ RangeVar *identitySequence;
CollateClause *collClause; /* untransformed COLLATE spec, if any */
It mat be better to tell that this is used to create a sequence in the
same transaction as the one doing the parsing.

I would like to get this into next week's minor release as a bug fix.
Other than tweaking some of the comments, is there any more feedback on
this?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#12)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On Thu, Feb 01, 2018 at 10:24:48AM -0500, Peter Eisentraut wrote:

On 12/30/17 18:26, Michael Paquier wrote:

A RangeVar is used for the proposed patch 0002. Still does it matter? In
the scenario proposed here, which is to fix ALTER TABLE ADD COLUMN
GENERATED, then the sequence gets created and visible only once the
transaction adding the column has been created.

char identity; /* attidentity setting */
+ RangeVar *identitySequence;
CollateClause *collClause; /* untransformed COLLATE spec, if any */
It mat be better to tell that this is used to create a sequence in the
same transaction as the one doing the parsing.

I would like to get this into next week's minor release as a bug fix.
Other than tweaking some of the comments, is there any more feedback on
this?

Wrapping again my mind on this one... On top of the comment for
identitySequence, I think that it is important to mention that the
sequence name and a RangeVar are basically safe as they get created
hence they are not visible to other sessions yet. 0001 and 0002 should
be merged.

By the way, on HEAD with two sessions it is easy to bump into sequence
name conflicts with identity columns:
* Session 1:
begin;
create table itest13 (a int);
* Session 2:
create sequence itest13_b_seq;
* Session 1:
alter table itest13 add columns b int generated by default as identity; --blocks
* Session 2:
commit;

And then session 1 reports that:
ERROR: 23505: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(itest13_b_seq, 2200) already exists.
SCHEMA NAME: pg_catalog
TABLE NAME: pg_type
CONSTRAINT NAME: pg_type_typname_nsp_index
LOCATION: _bt_check_unique, nbtinsert.c:434

We surely want to be smarter with the choice of the sequence name for
identity columns. Index names for primary keys are if I recall
correctly indexcmds.c and index.c.
--
Michael

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#13)
Re: BUG #14952: COPY fails to fill in IDENTITY column default value

On 2/1/18 20:49, Michael Paquier wrote:

Wrapping again my mind on this one... On top of the comment for
identitySequence, I think that it is important to mention that the
sequence name and a RangeVar are basically safe as they get created
hence they are not visible to other sessions yet. 0001 and 0002 should
be merged.

Committed with more comments and as one patch.

By the way, on HEAD with two sessions it is easy to bump into sequence
name conflicts with identity columns:
* Session 1:
begin;
create table itest13 (a int);
* Session 2:
create sequence itest13_b_seq;
* Session 1:
alter table itest13 add columns b int generated by default as identity; --blocks
* Session 2:
commit;

And then session 1 reports that:
ERROR: 23505: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(itest13_b_seq, 2200) already exists.
SCHEMA NAME: pg_catalog
TABLE NAME: pg_type
CONSTRAINT NAME: pg_type_typname_nsp_index
LOCATION: _bt_check_unique, nbtinsert.c:434

I think this is equivalent to pre-existing behavior with serial:

S1: begin;
S2: begin;
S2: create sequence t1_a_seq;
S1: create table t1 (a serial, b text); --blocks
S2: commit;
S1: ERROR

I suspect you can also construct situations like this with other
implicitly created objects such as triggers.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services