Proposal: Better generation of values in GENERATED columns.

Started by Daniel Migowskiover 6 years ago3 messages
#1Daniel Migowski
dmigowski@ikoffice.de

Hello,

one of the most frustating things when I started with PostgreSQL was
that IDENTITY columns are based on sequences that are completly
disconnected from the table contents and manually imported data will
lead to errors like 'duplicate key value violates unique constraint
"xyz_pkey"'.

I had to fight this initially with an insert trigger that always updates
the sequences on each insert, or with client side code that updates the
sequence when such an error occurs and then retries the insert.

Even Microsoft Access did a better job at autogenerated primaries keys,
and while I love the elegant design of PostgreSQL in many ways I believe
we can do better here. I would like to implement a fallback solution
that detects such errors and automatically updates the nextvalue of the
sequence when the nextvalue is already used on insert.

I believe this can be implemented without affecting performance
negatively when one just does extra stuff in the error case, so I
wouldn't do table scans when creating the insert initially.

Any reasons why this isn't a good idea to try?

Regards,
Daniel Migowski

#2Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Daniel Migowski (#1)
Re: Proposal: Better generation of values in GENERATED columns.

On 2019-08-26 13:28, Daniel Migowski wrote:

I would like to implement a fallback solution
that detects such errors and automatically updates the nextvalue of the
sequence when the nextvalue is already used on insert.

ISTM that such a system would likely have many of the same problems as
the often-proposed ignore-errors mode for COPY, namely that you can't
catch errors and do something else other than rethrowing the error.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#2)
Re: Proposal: Better generation of values in GENERATED columns.

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2019-08-26 13:28, Daniel Migowski wrote:

I would like to implement a fallback solution
that detects such errors and automatically updates the nextvalue of the
sequence when the nextvalue is already used on insert.

ISTM that such a system would likely have many of the same problems as
the often-proposed ignore-errors mode for COPY, namely that you can't
catch errors and do something else other than rethrowing the error.

In principle you could probably use the same infrastructure used by
ON CONFLICT to detect the unique-key violation. But ON CONFLICT is
mighty complicated, and not very cheap either. I don't for one second
believe Daniel's assertion that this could be done without a
significant COPY performance hit.

I'm also dubious that the right response to a duplicate key would be
as simple as "try the next nextval() result". I for one wouldn't be
satisfied with waiting for COPY to grind through a few thousand/million
sequential nextval values before finding one that doesn't conflict with
the existing table entries.

The actually-sound advice for loading data that might have conflicting
serial values is to do the equivalent of

setval('sequence', max(existing keys) + 1)

before you start loading. I wonder whether there's a way to make that
simpler.

regards, tom lane