future of serial and identity columns

Started by Peter Eisentrautabout 3 years ago9 messages
#1Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
1 attachment(s)

In PostgreSQL 10, we added identity columns, as an alternative to serial
columns (since 6.something). They mostly work the same. Identity
columns are SQL-conforming, have some more features (e.g., overriding
clause), and are a bit more robust in schema management. Some of that
was described in [0]https://www.enterprisedb.com/blog/postgresql-10-identity-columns-explained. AFAICT, there have been no complaints since that
identity columns lack features or are somehow a regression over serial
columns.

But clearly, the syntax "serial" is more handy, and most casual examples
use that syntax. So it seems like we are stuck with maintaining these
two variants in parallel forever. I was thinking we could nudge this a
little by remapping "serial" internally to create an identity column
instead. At least then over time, the use of the older serial
mechanisms would go away.

Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE +
ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we
did this, any existing databases would keep their old semantics, and
those who really need it can manually create the old semantics as well.

Attached is a demo patch how the implementation of this change would
look like. This creates a bunch of regression test failures, but
AFAICT, those are mainly display differences and some very peculiar test
setups that are intentionally examining some edge cases. These would
need to be investigated in more detail, of course.

[0]: https://www.enterprisedb.com/blog/postgresql-10-identity-columns-explained
https://www.enterprisedb.com/blog/postgresql-10-identity-columns-explained

Attachments:

0001-WIP-Change-serial-types-to-map-to-identity-columns.patchtext/plain; charset=UTF-8; name=0001-WIP-Change-serial-types-to-map-to-identity-columns.patch
#2Laurenz Albe
Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Peter Eisentraut (#1)
Re: future of serial and identity columns

On Tue, 2022-10-04 at 09:41 +0200, Peter Eisentraut wrote:

In PostgreSQL 10, we added identity columns, as an alternative to serial
columns (since 6.something).  They mostly work the same.  Identity
columns are SQL-conforming, have some more features (e.g., overriding
clause), and are a bit more robust in schema management.  Some of that
was described in [0].  AFAICT, there have been no complaints since that
identity columns lack features or are somehow a regression over serial
columns.

But clearly, the syntax "serial" is more handy, and most casual examples
use that syntax.  So it seems like we are stuck with maintaining these
two variants in parallel forever.  I was thinking we could nudge this a
little by remapping "serial" internally to create an identity column
instead.  At least then over time, the use of the older serial
mechanisms would go away.

I think that would be great.
That might generate some confusion among users who follow old tutorials
and are surprised that the eventual table definition differs, but I'd say
that is a good thing.

Yours,
Laurenz Albe

#3Bruce Momjian
Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#1)
Re: future of serial and identity columns

On Tue, Oct 4, 2022 at 09:41:19AM +0200, Peter Eisentraut wrote:

In PostgreSQL 10, we added identity columns, as an alternative to serial
columns (since 6.something). They mostly work the same. Identity columns
are SQL-conforming, have some more features (e.g., overriding clause), and
are a bit more robust in schema management. Some of that was described in
[0]. AFAICT, there have been no complaints since that identity columns lack
features or are somehow a regression over serial columns.

FYI, SERIAL came from Informix syntax, and it was already a macro, so
making it a different macro seems fine. ;-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#4Vik Fearing
Vik Fearing
vik@postgresfriends.org
In reply to: Peter Eisentraut (#1)
Re: future of serial and identity columns

On 10/4/22 09:41, Peter Eisentraut wrote:

In PostgreSQL 10, we added identity columns, as an alternative to serial
columns (since 6.something).  They mostly work the same.  Identity
columns are SQL-conforming, have some more features (e.g., overriding
clause), and are a bit more robust in schema management.  Some of that
was described in [0].  AFAICT, there have been no complaints since that
identity columns lack features or are somehow a regression over serial
columns.

But clearly, the syntax "serial" is more handy, and most casual examples
use that syntax.  So it seems like we are stuck with maintaining these
two variants in parallel forever.  I was thinking we could nudge this a
little by remapping "serial" internally to create an identity column
instead.  At least then over time, the use of the older serial
mechanisms would go away.

Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE +
ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT).  So if we
did this, any existing databases would keep their old semantics, and
those who really need it can manually create the old semantics as well.

Attached is a demo patch how the implementation of this change would
look like.  This creates a bunch of regression test failures, but
AFAICT, those are mainly display differences and some very peculiar test
setups that are intentionally examining some edge cases.  These would
need to be investigated in more detail, of course.

I haven't tested the patch yet, just read it.

Is there any reason to use BY DEFAULT over ALWAYS? I tend to prefer the
latter.
--
Vik Fearing

#5Magnus Hagander
Magnus Hagander
magnus@hagander.net
In reply to: Vik Fearing (#4)
Re: future of serial and identity columns

On Fri, Oct 7, 2022 at 2:03 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 10/4/22 09:41, Peter Eisentraut wrote:

In PostgreSQL 10, we added identity columns, as an alternative to serial
columns (since 6.something). They mostly work the same. Identity
columns are SQL-conforming, have some more features (e.g., overriding
clause), and are a bit more robust in schema management. Some of that
was described in [0]. AFAICT, there have been no complaints since that
identity columns lack features or are somehow a regression over serial
columns.

But clearly, the syntax "serial" is more handy, and most casual examples
use that syntax. So it seems like we are stuck with maintaining these
two variants in parallel forever. I was thinking we could nudge this a
little by remapping "serial" internally to create an identity column
instead. At least then over time, the use of the older serial
mechanisms would go away.

Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE +
ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we
did this, any existing databases would keep their old semantics, and
those who really need it can manually create the old semantics as well.

Attached is a demo patch how the implementation of this change would
look like. This creates a bunch of regression test failures, but
AFAICT, those are mainly display differences and some very peculiar test
setups that are intentionally examining some edge cases. These would
need to be investigated in more detail, of course.

I haven't tested the patch yet, just read it.

Is there any reason to use BY DEFAULT over ALWAYS? I tend to prefer the
latter.

I would assume to maintain backwards compatibility with the semantics of
SERIAL today?

I do also prefer ALWAYS, but that would make it a compatibility break.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#6Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Peter Eisentraut (#1)
2 attachment(s)
Re: future of serial and identity columns

On 04.10.22 09:41, Peter Eisentraut wrote:

Attached is a demo patch how the implementation of this change would
look like.  This creates a bunch of regression test failures, but
AFAICT, those are mainly display differences and some very peculiar test
setups that are intentionally examining some edge cases.  These would
need to be investigated in more detail, of course.

The feedback was pretty positive, so I dug through all the tests to at
least get to the point where I could see the end of it. The attached
patch 0001 is the actual code and documentation changes. The 0002 patch
is just tests randomly updated or disabled to make the whole suite pass.
This reveals that there are a few things that would warrant further
investigation, in particular around extensions and partitioning. To be
continued.

Attachments:

v2-0001-WIP-Change-serial-types-to-map-to-identity-column.patchtext/plain; charset=UTF-8; name=v2-0001-WIP-Change-serial-types-to-map-to-identity-column.patch
v2-0002-FIXME-various-tests-that-still-need-to-be-looked-.patchtext/plain; charset=UTF-8; name=v2-0002-FIXME-various-tests-that-still-need-to-be-looked-.patch
#7Corey Huinker
Corey Huinker
corey.huinker@gmail.com
In reply to: Peter Eisentraut (#6)
Re: future of serial and identity columns

The feedback was pretty positive, so I dug through all the tests to at
least get to the point where I could see the end of it. The attached
patch 0001 is the actual code and documentation changes. The 0002 patch
is just tests randomly updated or disabled to make the whole suite pass.
This reveals that there are a few things that would warrant further
investigation, in particular around extensions and partitioning. To be
continued.

I like what I see so far!

Question: the xref refers the reader to sql-createtable, which is a pretty
big page, which could leave the reader lost. Would it make sense to create
a SQL-CREATETABLE-IDENTITY anchor and link to that instead?

#8Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Corey Huinker (#7)
Re: future of serial and identity columns

On 12.10.22 08:22, Corey Huinker wrote:

Question: the xref  refers the reader to sql-createtable, which is a
pretty big page, which could leave the reader lost. Would it make sense
to create a SQL-CREATETABLE-IDENTITY anchor and link to that instead?

Yes, I think that would be good.

#9Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Peter Eisentraut (#6)
Re: future of serial and identity columns

On 2022-Oct-11, Peter Eisentraut wrote:

diff --git a/src/test/modules/test_ddl_deparse/expected/alter_table.out b/src/test/modules/test_ddl_deparse/expected/alter_table.out
index 87a1ab7aabce..30e3dbb8d08a 100644
--- a/src/test/modules/test_ddl_deparse/expected/alter_table.out
+++ b/src/test/modules/test_ddl_deparse/expected/alter_table.out
@@ -25,12 +25,9 @@ NOTICE:  DDL test: type simple, tag CREATE TABLE
CREATE TABLE grandchild () INHERITS (child);
NOTICE:  DDL test: type simple, tag CREATE TABLE
ALTER TABLE parent ADD COLUMN b serial;
-NOTICE:  DDL test: type simple, tag CREATE SEQUENCE
-NOTICE:  DDL test: type alter table, tag ALTER TABLE
-NOTICE:    subcommand: type ADD COLUMN (and recurse) desc column b of table parent
-NOTICE:  DDL test: type simple, tag ALTER SEQUENCE
+ERROR:  cannot recursively add identity column to table that has child tables

I think this change merits some discussion. Surely we cannot simply
disallow SERIAL from being used with inheritance. Do we need to have
a way for identity columns to be used by children tables?

(My first thought was "let's keep SERIAL as the old code when used for
inheritance", but then I realized that the parent table starts as a
normal-looking table that only later acquires inheritors, so we wouldn't
know ahead of time that we need to treat that SERIAL column in a special
way.)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La vida es para el que se aventura"