INSERT ... ON CONFLICT DO UPDATE

Started by Rafal Pietrak over 10 years ago · 29 messages · general
#1Rafal Pietrak
rafal@ztk-rp.eu

Hello,

I've just started to read through the postgres-9.5 "what's new" ... before
giving it a try. The "insert ... on conflict do update" is particularly
attractive to me; but I was wondering why it does not cover a third
usage scenario, an action that a programmer may need for a PK conflict
during insert.

In my experience, most often I generate a random value for the PK, with
that random value becoming a unique ticket like a voucher (tied to a
monetary value). For that I use:

CREATE TABLE vouchers (
    a_voucher bigint PRIMARY KEY DEFAULT (random()*1000000000)::bigint,
    issued date DEFAULT now(),
    .....
);

Naturally:
1. the A_VOUCHER range space is always significantly larger than the
currently issued voucher count - so conflicts are rare.
2. with the current (as of 9.5) implementation I think I can always "ON
CONFLICT DO NOTHING" and retry the INSERT from the application level.

But it would be immensely more comfortable if one could say: "INSERT ... ON
CONFLICT (a_voucher) DO RETRY", with the semantics of that statement being:
1. prepare should check that there is a DEFAULT for the specified "conflict
column" (here: "a_voucher"), and fail if there isn't one.
2. prepare should check that the default is a VOLATILE function ... or fail.
3. when all that passes, the prepared insert, when executed and hitting a
conflict, should be re-attempted with a NEW call to that DEFAULT function of
the indicated CONFLICT column(s).
4. and there should be a postgresql.conf parameter limiting the number of
retries for a single conflict - as a programmer I know that if I need to
retry more than twice, the space is too dense, always. So I need to change
the DEFAULT function, not increase the retry count ... thus having DDL that
allows changing the DEFAULT function means it's not necessary to allow
changing the retry count (during the database's life) - and when the latter
lives in the config, it is less prone to typos by application authors.
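For what it's worth, the DO RETRY semantics sketched above can already be approximated from the application side. Below is a minimal Python simulation of the idea (all names hypothetical, a plain `set` standing in for the unique index, no real database driver involved):

```python
import random

MAX_RETRIES = 2  # the proposed config-level retry limit

def volatile_default():
    # stands in for the column's VOLATILE DEFAULT,
    # e.g. (random()*1000000000)::bigint
    return random.randrange(1_000_000_000)

def insert_with_retry(existing_keys):
    """Emulate INSERT ... ON CONFLICT (a_voucher) DO RETRY: on a key
    conflict, call the DEFAULT again, at most MAX_RETRIES extra times."""
    for _ in range(1 + MAX_RETRIES):
        candidate = volatile_default()
        if candidate not in existing_keys:  # no conflict: the INSERT lands
            existing_keys.add(candidate)
            return candidate
    # too dense: per the reasoning above, fix the DEFAULT, not the limit
    raise RuntimeError("voucher space too dense")
```

If the exception ever fires, that is the signal described above: the DEFAULT's range space, not the retry limit, needs changing.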

Was the above considered for the "ON CONFLICT" implementation before?

If so, can someone please point me to the criticism it received?

If not: is it unreasonable? Why?

-R

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Charles Clavadetscher
clavadetscher@swisspug.org
In reply to: Rafal Pietrak (#1)
Re: INSERT ... ON CONFLICT DO UPDATE

Hello

I've just started to read through the postgres-9.5 "what's new" ... before
giving it a try. The "insert ... on conflict do update" is particularly
attractive to me; but I was wondering why it does not cover a third
usage scenario, an action that a programmer may need for a PK conflict
during insert.

In my experience, most often I generate a random value for the PK, with
that random value becoming a unique ticket like a voucher (tied to a
monetary value). For that I use:

CREATE TABLE vouchers (
    a_voucher bigint PRIMARY KEY DEFAULT (random()*1000000000)::bigint,
    issued date DEFAULT now(),
    .....
);

Naturally:
1. the A_VOUCHER range space is always significantly larger than the
currently issued voucher count - so conflicts are rare.
2. with the current (as of 9.5) implementation I think I can always "ON
CONFLICT DO NOTHING" and retry the INSERT from the application level.

An UPSERT is "try an INSERT and, if there is a conflict, do nothing or UPDATE some values of the existing record". The scenario that you suggest is not an UPSERT, because what you want is to try a new INSERT, hoping that it works.
What speaks against using a sequence for the primary key column a_voucher? That would guarantee that you don't get a conflict.

But it would be immensely more comfortable if one could say: "INSERT ... ON
CONFLICT (a_voucher) DO RETRY", with the semantics of that statement being:
1. prepare should check that there is a DEFAULT for the specified "conflict
column" (here: "a_voucher"), and fail if there isn't one.
2. prepare should check that the default is a VOLATILE function ... or fail.
3. when all that passes, the prepared insert, when executed and hitting a
conflict, should be re-attempted with a NEW call to that DEFAULT function of
the indicated CONFLICT column(s).
4. and there should be a postgresql.conf parameter limiting the number of
retries for a single conflict - as a programmer I know that if I need to
retry more than twice, the space is too dense, always. So I need to change
the DEFAULT function, not increase the retry count ... thus having DDL that
allows changing the DEFAULT function means it's not necessary to allow
changing the retry count (during the database's life) - and when the latter
lives in the config, it is less prone to typos by application authors.

Was the above considered for the "ON CONFLICT" implementation before?

If so, can someone please point me to the criticism it received?

If not: is it unreasonable? Why?

IMHO, as I mentioned, this is not an UPSERT use case, but the implementors of the feature may have different arguments. You could implement the retry in a function instead of in the application, if you prefer.

Bye
Charles


#3Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Charles Clavadetscher (#2)
Re: INSERT ... ON CONFLICT DO UPDATE

Hi,

On 19.07.2015 at 09:33, Charles Clavadetscher wrote:
[---------------]

2. with the current (as of 9.5) implementation I think I can always "ON
CONFLICT DO NOTHING" and retry the INSERT from the application level.

An UPSERT is "try an INSERT and, if there is a conflict, do nothing or UPDATE some values of the existing record". The scenario that you suggest is not an UPSERT, because what you want is to try a new INSERT, hoping that it works.
What speaks against using a sequence for the primary key column a_voucher? That would guarantee that you don't get a conflict.

It has to be random, since it bears a "sort of monetary" value. The
vouchers are destined to be one-time authorization tokens; they have to
be harder to guess than ones drawn from a sequence would be.

[------------]

If not: is it unreasonable? why?

IMHO, as I mentioned, this is not an UPSERT use case, but the implementors of the feature may have different arguments. You could implement the retry in a function instead of in the application, if you prefer.

I'm not particularly fond of using functions to access an RDBMS instead
of tables.

And I'm not particularly fond of "workarounds".

But if that usage scenario is not appreciated here, then I guess I have
to live with what is available. And the availability of ON CONFLICT is
a great improvement anyway.

Thanks,

-R


#4Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Rafal Pietrak (#3)
Re: INSERT ... ON CONFLICT DO UPDATE

On 19 July 2015 at 09:11, Rafal Pietrak <rafal@ztk-rp.eu> wrote:

I'm not particularly fond of using functions to access an RDBMS instead
of tables.

And I'm not particularly fond of "workarounds".

Use a combination of factors (a sequence ID and the key) for your
authorization. So in the extremely unlikely event that the random key isn't
unique, it doesn't matter. That's not a workaround, it's good design.

You're asking for a feature that is completely unnecessary and is easily
resolved. UPSERT is designed for a situation which is neither.

Geoff

#5Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Geoff Winkless (#4)
Re: INSERT ... ON CONFLICT DO UPDATE

Hi,

On 19.07.2015 at 10:27, Geoff Winkless wrote:

On 19 July 2015 at 09:11, Rafal Pietrak <rafal@ztk-rp.eu
<mailto:rafal@ztk-rp.eu>> wrote:

I'm not particularly fond of using functions to access an RDBMS instead
of tables.

And I'm not particularly fond of "workarounds".

Use a combination of factors (a sequence ID and the key) for your
authorization. So in the extremely unlikely event that the random key
isn't unique, it doesn't matter. That's not a workaround, it's good design.

I'm sorry. At this point I don't want to prolong the discussion (turn it
into flaming), but I feel I have to defend myself a little.

Regarding the above: when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
that are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

In my current implementations I capture the (rare) PK conflict
exceptions and redo the INSERT at the application level. It works
sufficiently well. I was just curious whether that usage scenario is
also covered by the current ON CONFLICT, or not.

You're asking for a feature that is completely unnecessary and is easily
resolved. UPSERT is designed for a situation which is neither.

1. Despite possibly sounding like one, I wasn't actually asking for a
feature - I wasn't sure whether it could be implemented using currently
available postgresql features. So I've just explained a usage scenario
(describing the semantics with an "invented pseudo syntax") which I've
experienced.

2. that usage scenario IMHO wasn't obviously covered (as of my first
reading of "the upsert" implementation). It might have been covered ...
only I wasn't seeing it; so I brought it up.

3. and obviously that usage scenario (despite my personal experience)
might actually be very rare - not worth the implementation effort and
thus qualifying for workarounds. That happens, I understand.

I hope this explains my point better.

-R


#6Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Rafal Pietrak (#5)
Re: INSERT ... ON CONFLICT DO UPDATE

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu> wrote:

when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
that are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

I'm sorry that you feel defensive about this, and apologies for repeating
myself, but the fact that the random key can be duplicated means it should
not be used as a primary key, so using a sequence as a primary key is not a
workaround, it's a correction to the design.

Notwithstanding that, the reason UPSERT is required is because it's
possible that two competing transactions can end up fighting over an INSERT
and the workarounds that are required are either highly complex or not 100%
successful (eg
http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).

Conversely, the workaround in the above case (even if you don't want to
change the primary key) is trivial - as you yourself described.

Geoff

#7Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Geoff Winkless (#6)
Re: INSERT ... ON CONFLICT DO UPDATE

Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
<mailto:rafal@ztk-rp.eu>> wrote:

when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
that are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

I'm sorry that you feel defensive about this, and apologies for
repeating myself, but the fact that the random key can be duplicated
means it should not be used as a primary key, so using a sequence as a
primary key is not a workaround, it's a correction to the design.

OK. I think I need to apologize, too. I hope my defence wasn't too
fierce.

But I need to clarify one thing:

Although "a random" can repeat its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
once stored in the database as "live active data". I understand that
the UNIQUE constraint is precisely the RDBMS tool to guarantee that.

Naturally, whether I put a UNIQUE constraint on that column or make it
a PK is just a matter of choice here. That shouldn't raise concern. I
just use the tools the RDBMS provides for the "semantics" the
application needs.

Notwithstanding that, the reason UPSERT is required is because it's
possible that two competing transactions can end up fighting over an
INSERT and the workarounds that are required are either highly complex
or not 100% successful (eg
http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).

I knew of that Depesz article before.

Actually it was the reason I brought up "my usage scenario" here now.
I'm not as competent as Depesz, so:

1. I worry that while restarting a failed INSERT transaction at the
application level I will miss something important (that you people know
by heart) and unwittingly corrupt and/or "suboptimise" my
application/data (much to the point Depesz described).

2. But since the majority of the hard work of the postgresql UPSERT
implementation is already done, I wanted to check whether the usage
scenario I point out falls into it as a "case", is covered by some
"idiomatic SQL sequence", or otherwise. From the current discussion I
gather it's "otherwise" - it isn't considered applicable. (So I
conclude: I'll live with manual re-attempts of failed inserts.)

-R


#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rafal Pietrak (#7)
Re: INSERT ... ON CONFLICT DO UPDATE

On 07/19/2015 06:47 AM, Rafal Pietrak wrote:

Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
<mailto:rafal@ztk-rp.eu>> wrote:

when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
that are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

I'm sorry that you feel defensive about this, and apologies for
repeating myself, but the fact that the random key can be duplicated
means it should not be used as a primary key, so using a sequence as a
primary key is not a workaround, it's a correction to the design.

OK. I think I need to apologize, too. I hope my defence wasn't too
fierce.

But I need to clarify one thing:

Although "a random" can repeat its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
once stored in the database as "live active data". I understand that
the UNIQUE constraint is precisely the RDBMS tool to guarantee that.

From my perspective the issue is, you are using a 'unique' key
generator that you know is not creating unique keys and then asking the
database to make it right. Sort of like making a square peg fit a round
hole by shaving the corners. It is possible but has sort of a messy feel
to it.

Naturally, whether I put a UNIQUE constraint on that column or make it
a PK is just a matter of choice here. That shouldn't raise concern. I
just use the tools the RDBMS provides for the "semantics" the
application needs.

Notwithstanding that, the reason UPSERT is required is because it's
possible that two competing transactions can end up fighting over an
INSERT and the workarounds that are required are either highly complex
or not 100% successful (eg
http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).

I knew of that Depesz article before.

Actually it was the reason I brought up "my usage scenario" here now.
I'm not as competent as Depesz, so:

1. I worry that while restarting a failed INSERT transaction at the
application level I will miss something important (that you people know
by heart) and unwittingly corrupt and/or "suboptimise" my
application/data (much to the point Depesz described).

2. But since the majority of the hard work of the postgresql UPSERT
implementation is already done, I wanted to check whether the usage
scenario I point out falls into it as a "case", is covered by some
"idiomatic SQL sequence", or otherwise. From the current discussion I
gather it's "otherwise" - it isn't considered applicable. (So I
conclude: I'll live with manual re-attempts of failed inserts.)

As noted upstream, what you want is not an UPSERT. An UPSERT is based
on the premise that if you try an INSERT that would violate a unique
constraint, the INSERT is turned into an UPDATE.

To be fair:

http://www.postgresql.org/docs/9.5/static/sql-insert.html
"
ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome -
provided there is no independent error, one of those two outcomes is
guaranteed, even under high concurrency. This feature is also known as
UPSERT"

So an UPSERT is just one feature of ON CONFLICT. The other being DO
NOTHING. Therefore I could see an argument made for adding other ON
CONFLICT clauses. How difficult/plausible that would be is above my
level of expertise.

-R

--
Adrian Klaver
adrian.klaver@aklaver.com


#9Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Adrian Klaver (#8)
Re: INSERT ... ON CONFLICT DO UPDATE

On 19.07.2015 at 16:33, Adrian Klaver wrote:

On 07/19/2015 06:47 AM, Rafal Pietrak wrote:

Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu

[---------------]

Although "a random" can repeat its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
once stored in the database as "live active data". I understand that
the UNIQUE constraint is precisely the RDBMS tool to guarantee that.

From my perspective the issue is, you are using a 'unique' key generator
that you know is not creating unique keys and then asking the database
to make it right. Sort of like making a square peg fit a round hole by
shaving the corners. It is possible but has sort of a messy feel to it.

Hmmm, yes.

Yet I don't feel as guilty, since this is quite similar to a unique key
on a database "username": the "generator" (a human mind) does not
guarantee uniqueness either; the DB just makes sure of it.

[--------------]

So an UPSERT is just one feature of ON CONFLICT. The other being DO
NOTHING. Therefore I could see an argument made for adding other ON
CONFLICT clauses. How difficult/plausible that would be is above my
level of expertise.

Mine too. But I'd say that the above wording exactly makes the point I
was trying to make. Thank you.

-R


#10Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rafal Pietrak (#9)
Re: INSERT ... ON CONFLICT DO UPDATE

On 07/19/2015 08:04 AM, Rafal Pietrak wrote:

On 19.07.2015 at 16:33, Adrian Klaver wrote:

On 07/19/2015 06:47 AM, Rafal Pietrak wrote:

Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu

[---------------]

Although "a random" can repeat its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
once stored in the database as "live active data". I understand that
the UNIQUE constraint is precisely the RDBMS tool to guarantee that.

From my perspective the issue is, you are using a 'unique' key generator
that you know is not creating unique keys and then asking the database
to make it right. Sort of like making a square peg fit a round hole by
shaving the corners. It is possible but has sort of a messy feel to it.

Hmmm, yes.

Yet I don't feel as guilty, since this is quite similar to a unique key
on a database "username": the "generator" (a human mind) does not
guarantee uniqueness either; the DB just makes sure of it.

I think the argument to be made here is that you have no control over
what people choose as a username, but you do have control over what
your key generator outputs.

[--------------]

So an UPSERT is just one feature of ON CONFLICT. The other being DO
NOTHING. Therefore I could see an argument made for adding other ON
CONFLICT clauses. How difficult/plausible that would be is above my
level of expertise.

Mine too. But I'd say that the above wording exactly makes the point I
was trying to make. Thank you.

-R

--
Adrian Klaver
adrian.klaver@aklaver.com


#11Melvin Davidson
melvin6925@gmail.com
In reply to: Adrian Klaver (#10)
Re: INSERT ... ON CONFLICT DO UPDATE

Aside from Tom Lane's comments, it seems to me you are reinventing the
wheel by generating random values for keys. Why not just use UUID
http://www.postgresql.org/docs/9.5/static/datatype-uuid.html
or serial
http://www.postgresql.org/docs/9.5/static/datatype-numeric.html#DATATYPE-SERIAL?

Wouldn't that simplify things by ensuring uniqueness?

On Sun, Jul 19, 2015 at 11:12 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 07/19/2015 08:04 AM, Rafal Pietrak wrote:

On 19.07.2015 at 16:33, Adrian Klaver wrote:

On 07/19/2015 06:47 AM, Rafal Pietrak wrote:

Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu

[---------------]

Although "a random" can repeat its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
once stored in the database as "live active data". I understand that
the UNIQUE constraint is precisely the RDBMS tool to guarantee that.

From my perspective the issue is, you are using a 'unique' key generator
that you know is not creating unique keys and then asking the database
to make it right. Sort of like making a square peg fit a round hole by
shaving the corners. It is possible but has sort of a messy feel to it.

Hmmm, yes.

Yet I don't feel as guilty, since this is quite similar to a unique key
on a database "username": the "generator" (a human mind) does not
guarantee uniqueness either; the DB just makes sure of it.

I think the argument to be made here is that you have no control over
what people choose as a username, but you do have control over what
your key generator outputs.

[--------------]

So an UPSERT is just one feature of ON CONFLICT. The other being DO
NOTHING. Therefore I could see an argument made for adding other ON
CONFLICT clauses. How difficult/plausible that would be is above my
level of expertise.

Mine too. But I'd say that the above wording exactly makes the point I
was trying to make. Thank you.

-R

--
Adrian Klaver
adrian.klaver@aklaver.com


--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#12Daniel Verite
daniel@manitou-mail.org
In reply to: Rafal Pietrak (#1)
Re: INSERT ... ON CONFLICT DO UPDATE

Rafal Pietrak wrote:

CREATE TABLE vouchers (
    a_voucher bigint PRIMARY KEY DEFAULT (random()*1000000000)::bigint,
    issued date DEFAULT now(),
    .....
);

Generators of truly unique pseudo-random values provide a
better ground for this. Consider for example:

https://wiki.postgresql.org/wiki/Pseudo_encrypt

You may replace the round function with your own secret function,
so you'll get the required randomness, secrecy and uniqueness.
No need to deal with collisions on insertion as there are none.
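For illustration, here is a rough Python transcription of that wiki page's 32-bit Feistel function (the SQL on the wiki is the authoritative version; the rounding in the round function may differ slightly here, which is harmless, since a Feistel network is a permutation regardless of its round function):

```python
def pseudo_encrypt(value: int) -> int:
    """3-round Feistel network over 16+16 bits: maps [0, 2^32) onto
    itself one-to-one, so distinct inputs can never collide."""
    l1, r1 = (value >> 16) & 0xFFFF, value & 0xFFFF
    for _ in range(3):
        # round function: a small linear-congruential mix scaled to 15 bits
        l1, r1 = r1, l1 ^ int((1366 * r1 + 150889) % 714025 / 714025 * 32767)
    return (r1 << 16) + l1
```

Feeding it 0, 1, 2, ... yields a scattered-looking but collision-free stream of ids.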

2. with the current (as of 9.5) implementation I think I can always "ON
CONFLICT DO NOTHING" and retry the INSERT from the application level.

Yes, but retrying is now easy, let's not underappreciate that.

As a test, with 9.5alpha1, I create a table with 100k unique
random numbers:

CREATE TABLE vouchers(id int primary key);

Then try to populate it immediately with 100k rows:

INSERT INTO vouchers
SELECT (random()*1000000000)::int
FROM generate_series(1,100000)
ON CONFLICT DO NOTHING;

psql result: INSERT 0 99995

Note how 5 values conflicted right from the beginning,
even though we're claiming only 10^5 out of the 10^9 output
range (or 0.01%).
The probability of at least one collision is pretty high,
see the "birthday paradox" for the theory on that.
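The arithmetic behind those 5 lost rows checks out; a quick back-of-the-envelope in Python using the standard birthday approximations:

```python
import math

n = 100_000          # rows drawn
d = 1_000_000_000    # size of the output range

# expected number of colliding draws is roughly n^2 / (2d)
expected_collisions = n * n / (2 * d)

# P(at least one collision) ~ 1 - exp(-n(n-1) / (2d))
p_any_collision = 1 - math.exp(-n * (n - 1) / (2 * d))

print(expected_collisions)        # 5.0: matches "INSERT 0 99995"
print(round(p_any_collision, 3))  # 0.993: a collision was nearly certain
```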

Anyway the collisions got eliminated without any effort
from me and that's quite useful already.

Now trying to insert 10k rows at a time:

INSERT INTO vouchers
SELECT (random()*1000000000)::int
FROM generate_series(1,10000)
ON CONFLICT DO NOTHING
RETURNING id;

when run repeatedly, it tends to return between 9995 and 10000 values.
If we want exactly N rows and we get back N-epsilon, we need to re-ask
for epsilon more rows, but this converges quickly to completion
(that is, until you have stored enough values that the birthday paradox
effect really kicks in).
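That converge-on-epsilon loop is easy to model (a pure-Python sketch: a set plays the unique index, and random draws play the INSERT ... ON CONFLICT DO NOTHING ... RETURNING):

```python
import random

TARGET = 10_000          # rows we want
SPACE = 1_000_000_000    # output range of the random default

def insert_batch(existing, how_many):
    """Draw `how_many` candidates; duplicates silently vanish, and the
    return value says how many new rows actually landed."""
    before = len(existing)
    existing.update(random.randrange(SPACE) for _ in range(how_many))
    return len(existing) - before

vouchers, rounds = set(), 0
while len(vouchers) < TARGET:
    insert_batch(vouchers, TARGET - len(vouchers))  # re-ask for epsilon only
    rounds += 1
# at this density the loop almost always finishes within a round or two
```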

My point is that we can now achieve this without any exception handling
or transaction retry, and with no plpgsql function to create, so it's
really a significant improvement in ease of use. And presumably in
performance too.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @ManitouMail


#13Daniel Verite
daniel@manitou-mail.org
In reply to: Melvin Davidson (#11)
Re: INSERT ... ON CONFLICT DO UPDATE

Melvin Davidson wrote:

Aside from Tom Lane's comments, it seems to me you are reinventing the wheel
by generating random values for keys. Why not just use UUID
http://www.postgresql.org/docs/9.5/static/datatype-uuid.html
or serial
http://www.postgresql.org/docs/9.5/static/datatype-numeric.html#DATATYPE-SERIAL?
Wouldn't that simplify things by insuring uniqueness?

UUIDs are 36 characters wide; it's too tedious and error-prone for a
person to type one on a keyboard or spell it over the phone to an
operator.

With SERIAL, it's too obvious what the next value will be, so malicious
people could claim access codes or vouchers they don't own.

The constraint is that such codes must be reasonably short, but
someone who tries to make up one must have a near-zero chance
of guessing one that actually exists.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @ManitouMail


#14Francisco Olarte
folarte@peoplecall.com
In reply to: Daniel Verite (#13)
Re: INSERT ... ON CONFLICT DO UPDATE

Hi Daniel:

On Sun, Jul 19, 2015 at 9:03 PM, Daniel Verite <daniel@manitou-mail.org> wrote:

For SERIAL, it's too obvious to guess what is the next one,
so malicious people could claim access codes or vouchers
they don't own.

Why don't you use encryption, specifically only on the external side?
Use a serial in the DB and send the encrypted serial as the voucher
code (this way you do not need database-resident encryption). Then,
when you receive the code in the program, you decrypt it and are done.
And having a serial in the DB can be good for your internal operations.
Encryption that is reversible and collision-free, not hashing.

The constraint is that such codes must be reasonably short, but
someone who tries to make up one must have a near-zero chance
of guessing one that actually exists.

If you can live with a slightly longer voucher (it seems you use 10^9
in random()), you can use 2^32, which is still under 10 digits, and
look for a 32-bit block cipher (or build one yourself; it's not that
hard using stream ciphers or other tools).
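As a sketch of that suggestion (assumptions: a throwaway key and SHA-256 as the round function; a real deployment would pick a vetted construction), a tiny keyed 32-bit Feistel cipher that turns a serial into an opaque code and back:

```python
import hashlib

KEY = b"demo-secret"  # hypothetical; keep it in the application, not the DB

def _round(half: int, i: int) -> int:
    # derive a 16-bit round value from the key, round number and half-block
    digest = hashlib.sha256(KEY + bytes([i]) + half.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")

def encrypt32(serial: int) -> int:
    """Keyed permutation of [0, 2^32): serials in, collision-free codes out."""
    l, r = serial >> 16, serial & 0xFFFF
    for i in range(4):
        l, r = r, l ^ _round(r, i)
    return (l << 16) | r

def decrypt32(code: int) -> int:
    """Inverse of encrypt32: recover the original serial from a code."""
    l, r = code >> 16, code & 0xFFFF
    for i in reversed(range(4)):
        l, r = r ^ _round(l, i), l
    return (l << 16) | r
```

The external voucher would then be something like encrypt32 applied to the next sequence value: at most 10 decimal digits, unguessable without the key, and decryptable back to the internal serial.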

I also think random UUIDs are not OK: not because they are long but
because they are random, and can collide (encrypted serials cannot).

Francisco Olarte.


#15Deven Phillips
deven.phillips@gmail.com
In reply to: Rafal Pietrak (#7)
Re: INSERT ... ON CONFLICT DO UPDATE

For this purpose, I have seen it recommended to use a UUID instead of a
randomly generated integer. I do this myself for production applications
and over millions of records I have yet to log a conflict. Also, as stated
above, you could create a plpgsql function which would achieve exactly what
you want (retry insert until it succeeds).

Just my 2 cents,

Deven

On Sun, Jul 19, 2015 at 9:47 AM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:


Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:

On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
<mailto:rafal@ztk-rp.eu>> wrote:

when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
that are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

I'm sorry that you feel defensive about this, and apologies for
repeating myself, but the fact that the random key can be duplicated
means it should not be used as a primary key, so using a sequence as a
primary key is not a workaround, it's a correction to the design.

OK. I think I need to apologize, too. I hope my defence wasn't too
fierce.

But I need to clarify one thing:

Although "a random" can repeat its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
once stored in the database as "live active data". I understand that
the UNIQUE constraint is precisely the RDBMS tool to guarantee that.

Naturally, whether I put a UNIQUE constraint on that column or make it
a PK is just a matter of choice here. That shouldn't raise concern. I
just use the tools the RDBMS provides for the "semantics" the
application needs.

Notwithstanding that, the reason UPSERT is required is because it's
possible that two competing transactions can end up fighting over an
INSERT and the workarounds that are required are either highly complex
or not 100% successful (eg
http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).

I knew of that Depesz article before.

Actually it was the reason I brought up "my usage scenario" here now.
I'm not as competent as Depesz, so:

1. I worry that while restarting a failed INSERT transaction at the
application level I will miss something important (that you people know
by heart) and unwittingly corrupt and/or "suboptimise" my
application/data (much to the point Depesz described).

2. But since the majority of the hard work of the postgresql UPSERT
implementation is already done, I wanted to check whether the usage
scenario I point out falls into it as a "case", is covered by some
"idiomatic SQL sequence", or otherwise. From the current discussion I
gather it's "otherwise" - it isn't considered applicable. (So I
conclude: I'll live with manual re-attempts of failed inserts.)

-R


#16Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Daniel Verite (#12)
Re: INSERT ... ON CONFLICT DO UPDATE

If I'm not mistaken, the conclusions from posts in this thread are:

1. recognizing a "RETRY" action as a separate case of "ON CONFLICT"
transaction continuation is not generally appreciated.

2. I shouldn't expect any hidden corruption/performance obstacles when
simply re-attempting an INSERT at the application level (should a
constraint conflict arise).

3. there are methods (like a cryptographic "random" sequence) which
guarantee no conflicts, so one should resort to those.

Regarding the last point. Usually, I implement one-time-use vouchers as
rows in a table like:

CREATE TABLE vouchers (voucher int not null, consumed bool,
    expire timestamp not null default now()::timestamp + '2 min'::interval,
    ..., unique (voucher, consumed));

with the CONSUMED column nullified when a voucher is used. The entire
row of a consumed voucher is purged after clearance and verification,
which happen significantly later.

Such a short-lived (when active) voucher is usually just 6 digits long,
to make it easier for people to enter.

I don't know much about cryptography, but would a generic encryption
function (like the one indicated by Daniel) keep the same "walking
through the entire range-space" behaviour when that range-space is
externally (by my application) truncated to those 6 digits? If not,
would it be as efficient at conflict avoidance as when used with the
original 32-bit range-space?
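A conflict-free "random-looking" generator of the kind discussed can be sketched with a Feistel permutation plus cycle-walking. This is a toy illustration of the general technique, not the specific function Daniel referenced, and the round function below is not cryptographically strong. It does answer the 6-digit question: a Feistel network is a bijection on [0, 2^20), and cycle-walking restricts any bijection on a larger domain to a bijection on exactly [0, 10^6), so the no-conflict property survives the truncation:

```python
def _round_fn(x: int, key: int) -> int:
    # Toy round function -- NOT cryptographically strong.
    return ((x * 2654435761) ^ key) & 0x3FF  # 10-bit output

def feistel_permute(n: int) -> int:
    """A 3-round balanced Feistel network: a bijection on [0, 2**20)."""
    left, right = n >> 10, n & 0x3FF
    for key in (0x155, 0x2A3, 0x1F7):  # arbitrary fixed round keys
        left, right = right, left ^ _round_fn(right, key)
    return (left << 10) | right

def voucher_for(seq: int, space: int = 1_000_000) -> int:
    """Map a sequential counter to a unique 6-digit voucher.

    Cycle-walking: re-apply the permutation until the value lands in
    [0, space).  This restricts the bijection on [0, 2**20) to a
    bijection on [0, space), so distinct counters give distinct
    vouchers -- no conflicts over the whole 6-digit space."""
    assert 0 <= seq < space
    v = feistel_permute(seq)
    while v >= space:
        v = feistel_permute(v)
    return v
```

Since 2^20/10^6 is only about 1.05, the cycle-walk loop rarely runs more than once or twice per voucher.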

Then again: is it really "good practice" to rely on a programmer to
pick the "proper/correct encryption helper", instead of providing
him/her with a database-integrated tool for a usage scenario as well
defined and not so rare as a "random default" for a UNIQUE/PK column?

So my conclusion from this thread is that, as this usage scenario does
not seem to be foreseen by the current implementation of ON CONFLICT, a
workaround exists (like the cryptographic range-walker). Since it is a
workaround, I'd vote for direct support of this scenario at the
database level in the future.

-R


#17Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Rafal Pietrak (#16)
Re: INSERT ... ON CONFLICT DO UPDATE

On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:

If I'm not mistaken, the conclusions from posts in this thread are:

3. there are methods (like cryptographic "random" sequence), which
guarantee no conflicts. So one should resort to that.

Some web research suggests that random sequences are not great for indexes
because of the resultant "keyspace fragmentation". I'm assuming that means
a low number of entries in the btree leaves, so an increase in memory usage
for the index?

Just a thought.

Geoff

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Geoff Winkless (#17)
Re: INSERT ... ON CONFLICT DO UPDATE

Geoff Winkless wrote:

On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:

If I'm not mistaken, the conclusions from posts in this thread are:

3. there are methods (like cryptographic "random" sequence), which
guarantee no conflicts. So one should resort to that.

Some web research suggests that random sequences are not great for indexes
because of the resultant "keyspace fragmentation". I'm assuming that means
a low number of nodes in the btree leafs, so an increase in memory usage
for the index?

Not sure what type of indexes would be affected by that problem, but I
don't think Postgres' btrees would be.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#19Igor Neyman
ineyman@perceptron.com
In reply to: Alvaro Herrera (#18)
Re: INSERT ... ON CONFLICT DO UPDATE

Geoff Winkless wrote:

On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:

If I'm not mistaken, the conclusions from posts in this thread are:

3. there are methods (like cryptographic "random" sequence), which
guarantee no conflicts. So one should resort to that.

Some web research suggests that random sequences are not great for
indexes because of the resultant "keyspace fragmentation". I'm
assuming that means a low number of nodes in the btree leafs, so an
increase in memory usage for the index?

Not sure what type of indexes would be affected by that problem, but I don't think Postgres' btrees would be.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

___

Well, there is a caveat.
If I create a table and a couple of indexes like this:

create table test_index_size(c1 int, c2 int, constraint U2 unique (c2));
create index U1 on test_index_size(c1);

and populate them:

insert into test_index_size(c1, c2)
select round(random()*1000000), a from generate_series(1,1000000) a limit 100000;

and then check the size of the indexes:
for "select pg_relation_size('U1')" I get 2834432,
while "select pg_relation_size('U2')" returns 2285568.

So, the index based on the randomly populated column is bigger than the one based on the sequentially populated one.

But, on the other hand, after:

reindex table test_index_size;

both indexes are of the same size: 2260992.

Regards,
Igor Neyman


#20Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Alvaro Herrera (#18)
Re: INSERT ... ON CONFLICT DO UPDATE

On 20 July 2015 at 15:07, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Not sure what type of indexes would be affected by that problem, but I
don't think Postgres' btrees would be.

I admit it's not really my area.

Take it up with Drew Blas, I guess :)

https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/
(see the addendum near the bottom).

My first stab at comprehending what he means:

Imagine you have around a thousand entries in your index. In a
sequential set, the btree could be (with each leaf containing 100 values)
499
299 799
100 200 300 400 500 600 700 800 900

In a (16-bit, say) random set, the btree could be
33208
21728 45220
927 15927 21729 29661 33209 34917 40121 45221 49826

Inserting a new value with a sequential key is always going to go into
the 900 node. When that fills, you can add a new node and just rebalance
the parent branches (assuming postgres doesn't try to keep a proper
balance across the nodes too?).

However, in the random tree a new value could be inserted into _any_ of
the leaf nodes, which means you either have to create more leaf nodes
than you require (to leave plenty of blank space; and even if you do
that, you will probably end up with uneven fill, which means some
searches will be slower than others), or you end up having to refactor
all of your nodes every time you insert a value.
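The fill-factor effect described here can be seen in a toy simulation. This is a deliberately simplified model, not PostgreSQL's actual nbtree algorithm (though real B-trees do use a similar right-heavy split for ascending keys); `CAP`, `insert` and `avg_fill` are all hypothetical names:

```python
import bisect
import random

CAP = 100  # entries per simulated leaf page

def insert(leaves, key):
    """Insert key into a list of sorted leaf pages.  A full page splits
    in the middle, except a rightmost page receiving a new maximum,
    which splits as an append so the left page stays full (a rough
    analog of the right-heavy split used for ascending keys)."""
    highs = [leaf[-1] for leaf in leaves]
    i = min(bisect.bisect_left(highs, key), len(leaves) - 1)
    leaf = leaves[i]
    bisect.insort(leaf, key)
    if len(leaf) > CAP:  # page overflow: split
        if i == len(leaves) - 1 and leaf[-1] == key:
            leaves[i:i + 1] = [leaf[:-1], leaf[-1:]]    # keep left page full
        else:
            mid = len(leaf) // 2
            leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]  # two half-full pages

def avg_fill(keys):
    """Average leaf fill fraction after inserting all keys in order."""
    it = iter(keys)
    leaves = [[next(it)]]
    for k in it:
        insert(leaves, k)
    return sum(map(len, leaves)) / (len(leaves) * CAP)

random.seed(42)
seq_fill = avg_fill(range(10_000))                            # ascending keys
rnd_fill = avg_fill(random.sample(range(1_000_000), 10_000))  # random keys
```

In this model the sequentially filled pages come out essentially 100% full, while the randomly filled ones settle around the classic ~70% fill produced by repeated middle splits, matching the index-size difference Igor measured.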

Geoff

#21Jeff Janes
jeff.janes@gmail.com
In reply to: Geoff Winkless (#17)
#22Francisco Olarte
folarte@peoplecall.com
In reply to: Alvaro Herrera (#18)
#23Francisco Olarte
folarte@peoplecall.com
In reply to: Igor Neyman (#19)
#24Francisco Olarte
folarte@peoplecall.com
In reply to: Rafal Pietrak (#16)
#25John R Pierce
pierce@hogranch.com
In reply to: Geoff Winkless (#17)
#26Francisco Olarte
folarte@peoplecall.com
In reply to: Rafal Pietrak (#16)
#27Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Francisco Olarte (#26)
#28Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Rafal Pietrak (#27)
#29Francisco Olarte
folarte@peoplecall.com
In reply to: Rafal Pietrak (#27)