Guidance on INSERT RETURNING order

Started by Federicoalmost 3 years ago56 messagesgeneral
Jump to latest
#1Federico
cfederico87@gmail.com

Hello list,

I have a few clarification questions regarding using insert with returning.
The use case is SQLAlchemy development, where the orm wants to insert
a list of rows,
get back the generated ids, defaults, etc, and match the returned values with
the original list of orm objects.

The following assumes a table like this

CREATE TABLE t(
id SERIAL,
data TEXT -- type here can be anything
)

On PostgreSQL, since SQLAlchemy version 1.4 (Mar 2021) this operation has used

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING id

but we were recently made aware that there is no guarantee on the
order of the returned columns.
Looking at the documentation there is no mention of the order of the
RETURNING clause, but searching
past discussion there are multiple indication that the order is not
guaranteed, like
/messages/by-id/19445.1350482182@sss.pgh.pa.us
. I think the docs
should mention this, similar to what the sqlite docs do at
https://www.sqlite.org/lang_returning.html#limitations_and_caveats

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
RETURNING id

to ensure that the id are created in the order specified by num. The
returned id can again be in
arbitrary order, but sorting them should enable correctly matching the
orm object so that they can
be properly updated.
Is this correct?
The documentation does not say anything about this, and looking at the
archive it seems that
it's mostly correct but not 100% guaranteed, as stated here
/messages/by-id/29386.1528813619@sss.pgh.pa.us .
The MSSQL docs, for example, clearly state that this is the case
https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16#limitations-and-restrictions
,
so it would be helpful if something similar were mentioned in the
PostgreSQL docs.

The above insert form (INSERT SELECT ORDER BY) can be used when the
primary key is an auto incrementing value,
in case it isn't (such as when it's an UUID), another solution must be used.
Since there does not seem to be any way of getting the position of the
original row inside
the VALUES clause with RETURNING, the solution SQLAlchemy is
implementing is to either degrade to
inserts with a single value or to optionally allow the user to add a
"sentinel" column to the table,
so that a sequential value can be inserted into it and then returned
allowing the ordering of the
RETURNING clause rows:

ALTER TABLE t ADD COLUMN sentinel SMALLINT

INSERT INTO t(data, sentinel) VALUES ('a', 1), ('b', 2), ('c', 3)
RETURNING id, sentinel

Is there any better solution to achieve this? (For reference this
feature is tracked in SQLAlchemy by
https://github.com/sqlalchemy/sqlalchemy/issues/9618)

From an ORM standpoint it would be very useful having a way of forcing
the order of RETURNING
to be the same as the one in VALUES, maybe with an additional keyword.
Alternatively having a system column
or other function that can be placed into the returning clause to
return the output row position wrt the
input values list, similar to what the sentinel column above does.

At the very least I think the documentation could do a better job at
mentioning that RETURNING order is
arbitrary, and documenting that INSERT SELECT ORDER BY precesses the
default in select order
(if that's indeed the case)

Sorry for the long email,
Thanks

Federico

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Federico (#1)
Re: Guidance on INSERT RETURNING order

On 4/11/23 12:47, Federico wrote:

Hello list,

I have a few clarification questions regarding using insert with returning.
The use case is SQLAlchemy development, where the orm wants to insert
a list of rows,
get back the generated ids, defaults, etc, and match the returned values with
the original list of orm objects.

The following assumes a table like this

CREATE TABLE t(
id SERIAL,
data TEXT -- type here can be anything
)

On PostgreSQL, since SQLAlchemy version 1.4 (Mar 2021) this operation has used

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING id

but we were recently made aware that there is no guarantee on the
order of the returned columns.

1) Because returned data in SQL is inherently unordered.

2) What would you order by, id or data or both?

Sorry for the long email,
Thanks

Federico

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Federico (#1)
Re: Guidance on INSERT RETURNING order

On 4/11/23 12:47, Federico wrote:

Hello list,

https://www.sqlite.org/lang_returning.html#limitations_and_caveats

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
RETURNING id

Or

with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
returning id)
select i.id from i order by id;

Sorry for the long email,
Thanks

Federico

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Achilleas Mantzios
a.mantzios@cloud.gatewaynet.com
In reply to: Adrian Klaver (#3)
Re: Guidance on INSERT RETURNING order

Στις 11/4/23 23:06, ο/η Adrian Klaver έγραψε:

On 4/11/23 12:47, Federico wrote:

Hello list,

https://www.sqlite.org/lang_returning.html#limitations_and_caveats

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

     INSERT INTO t(data)
     SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
     RETURNING id

Or

with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
returning id)
select i.id from i order by id;

+1 for this version!

Sorry for the long email,
Thanks

     Federico

--
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Federico (#1)
Re: Guidance on INSERT RETURNING order

Federico <cfederico87@gmail.com> writes:

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
RETURNING id

to ensure that the id are created in the order specified by num. The
returned id can again be in
arbitrary order, but sorting them should enable correctly matching the
orm object so that they can
be properly updated.
Is this correct?

No. Sadly, adding that ORDER BY is just voodoo programming, because
it applies to the result of the SELECT while promising nothing about
the order in which INSERT/RETURNING will act on those rows.

Re-reading that 2012 thread, the main new observation I'd make today
is that parallel operation is a thing now, and it's not hard to foresee
that sometime soon we'll want to parallelize INSERTs. Which'd make it
*really* hard to promise anything about the order of RETURNING output.

I think if you want to use RETURNING with multi-row inserts, the
thing to do is more like

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id

and then explicitly match up the returned "data" values rather than
presuming they appear in the same order you wrote them in in VALUES.
Admittedly this might be problematic if some of the VALUES rows
are identical, but how much should you care?

regards, tom lane

#6Federico
cfederico87@gmail.com
In reply to: Adrian Klaver (#3)
Re: Guidance on INSERT RETURNING order

Thanks for the ansers

2) What would you order by, id or data or both?

by values order, (that incidentally seems to be what PG does)

with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
returning id)
select i.id from i order by id;

The problem here is not having the auto increment id in a particular
order, is that there
is apparently no correlation with the position of an element in the
values clause with the
id generated. That's the reason for using the sentinel column in the
general solution in the previous message.

The extend on the use case, SQLAlchemy has 3 objects T that have
T(data='a'), T(data='b'), T(data='c') but no
value for the id column. The objective is to insert the 3 data values,
get back the ids and correctly match them with
the correct 3 objects.

No. Sadly, adding that ORDER BY is just voodoo programming, because
it applies to the result of the SELECT while promising nothing about
the order in which INSERT/RETURNING will act on those rows.

I wasn't probably clear, it's fine if INSERT/RETURNING order is
arbitrary, what matters is that the
autoincementing values is executed in the same order as select, like
mentioned in this
previous message
/messages/by-id/29386.1528813619@sss.pgh.pa.us

Is that not the case?

Re-reading that 2012 thread, the main new observation I'd make today
is that parallel operation is a thing now, and it's not hard to foresee
that sometime soon we'll want to parallelize INSERTs. Which'd make it
*really* hard to promise anything about the order of RETURNING output.

I think it's fine not promising anything about the order of RETURNING, but
it would be very helpful having a way of tracking what input row
generated a particular
output row. Basically the sentinel case in the original post,
without actually having to insert the sentinel into the table.

I think if you want to use RETURNING with multi-row inserts, the
thing to do is more like

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id

and then explicitly match up the returned "data" values rather than
presuming they appear in the same order you wrote them in in VALUES.
Admittedly this might be problematic if some of the VALUES rows
are identical, but how much should you care?

Well, the example is very easy, but it's hard to generalize when
inserting multiple columns
with possible complex values in them, since it would mean matching on
possibly large json values,
arrays, etc. So definitely not ideal

Thanks,
Federico

Show quoted text

On Tue, 11 Apr 2023 at 22:06, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 4/11/23 12:47, Federico wrote:

Hello list,

https://www.sqlite.org/lang_returning.html#limitations_and_caveats

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
RETURNING id

Or

with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
returning id)
select i.id from i order by id;

Sorry for the long email,
Thanks

Federico

--
Adrian Klaver
adrian.klaver@aklaver.com

#7Thorsten Glaser
tg@evolvis.org
In reply to: Federico (#6)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023, Federico wrote:

The problem here is not having the auto increment id in a particular

The id might not even be auto-increment but UUID or something…
(I am surprised you would even try to insert multiple rows at once.)

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#8Mike Bayer
mike_mp@zzzcomputing.com
In reply to: Tom Lane (#5)
Re: Guidance on INSERT RETURNING order

On Tue, Apr 11, 2023, at 4:22 PM, Tom Lane wrote:

Federico <cfederico87@gmail.com> writes:

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
RETURNING id

to ensure that the id are created in the order specified by num. The
returned id can again be in
arbitrary order, but sorting them should enable correctly matching the
orm object so that they can
be properly updated.
Is this correct?

No. Sadly, adding that ORDER BY is just voodoo programming, because
it applies to the result of the SELECT while promising nothing about
the order in which INSERT/RETURNING will act on those rows.

Re-reading that 2012 thread, the main new observation I'd make today
is that parallel operation is a thing now, and it's not hard to foresee
that sometime soon we'll want to parallelize INSERTs. Which'd make it
*really* hard to promise anything about the order of RETURNING output.

if I can state this without having RETURNING getting in the way, because we know RETURNING is not ordered.

Start with this table:

CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
a INT,
b INT
)

Then insert two rows, where id SERIAL fires implicitly, assume the next value the sequence will give us is N1, and then the value after that is N2. It doesn't matter what N1 and N2 are (don't need to be consecutive) but we want N2 > N1, that is, increasing.

INSERT INTO mytable (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num

Then SELECT with ORDER BY:

SELECT id, a, b FROM mytable ORDER BY id

We want the results to be:

(N1, 10, 11)
(N2, 12, 13)

and we dont want them to *ever* be:

(N1, 12, 13)
(N2, 10, 11)

that is, we want the SERIAL column (or an IDENTITY also) to be lined up with the VALUES.

From what you wrote in /messages/by-id/29386.1528813619@sss.pgh.pa.us , that seems to be exactly what you've stated, where this statement:

INSERT INTO table (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num

is organized by the query planner to essentially be equivalent to this, where the nextval() is part of the SELECTed data:

INSERT INTO mytable (id, a, b) SELECT nextval('mytable_id_seq'), p1, p2 FROM
(SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num) as _x

in practice, we add "RETURNING id" and expect those "id" cols to be in increasing order, so we sort to match it up with the input rows.

I think if you want to use RETURNING with multi-row inserts, the
thing to do is more like

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id

and then explicitly match up the returned "data" values rather than
presuming they appear in the same order you wrote them in in VALUES.

we're going to do that also when the table has something like a uuid for a primary key or otherwise.

Admittedly this might be problematic if some of the VALUES rows
are identical, but how much should you care?

we only do any of this if the rows have something unique in them we can hook onto.

regards, tom lane

thanks so much for replying!

#9Federico
cfederico87@gmail.com
In reply to: Thorsten Glaser (#7)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023 at 22:59, Thorsten Glaser <tg@evolvis.org> wrote:

On Tue, 11 Apr 2023, Federico wrote:

The problem here is not having the auto increment id in a particular

The id might not even be auto-increment but UUID or something…
(I am surprised you would even try to insert multiple rows at once.)

Well the documentation makes no mention of any limitation on returning
and the observed behaviour has consistently been that returning is in
values order.
Again, that was SQLAlchemy's fault for assuming this (but the docs
surely did not help).

Also re-reading my reply, I've made a typo there, sorry. What it
should have read is:
The problem here is not having the returned ids in a particular
order, is that there is apparently no correlation with the position of
an element in the values clause with the id generated.

Of course sorting the returned ids is only viable when using a serial
or identity column, that's why in the general case I've mentioned the
insert with sentinel column to ask if there are better or alternative
solutions.

Thanks for the reply, best
Federico

Show quoted text

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#10Thorsten Glaser
tg@evolvis.org
In reply to: Federico (#9)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023, Federico wrote:

Of course sorting the returned ids is only viable when using a serial

Yes, which is why I pointed out it doesn’t have to be.

or identity column, that's why in the general case I've mentioned the
insert with sentinel column

But it was pointed out that that’s not guaranteed either, unless you add
that sentinel column to the table itself…

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#11Rob Sargent
robjsargent@gmail.com
In reply to: Federico (#6)
Re: Guidance on INSERT RETURNING order

On 4/11/23 14:37, Federico wrote:

The problem here is not having the auto increment id in a particular
order, is that there
is apparently no correlation with the position of an element in the
values clause with the
id generated. That's the reason for using the sentinel column in the
general solution in the previous message.

The extend on the use case, SQLAlchemy has 3 objects T that have
T(data='a'), T(data='b'), T(data='c') but no
value for the id column. The objective is to insert the 3 data values,
get back the ids and correctly match them with
the correct 3 objects.

No. Sadly, adding that ORDER BY is just voodoo programming, because
it applies to the result of the SELECT while promising nothing about
the order in which INSERT/RETURNING will act on those rows.

I wasn't probably clear, it's fine if INSERT/RETURNING order is
arbitrary, what matters is that the
autoincementing values is executed in the same order as select, like
mentioned in this
previous message
/messages/by-id/29386.1528813619@sss.pgh.pa.us

Is that not the case?

Re-reading that 2012 thread, the main new observation I'd make today
is that parallel operation is a thing now, and it's not hard to foresee
that sometime soon we'll want to parallelize INSERTs. Which'd make it
*really* hard to promise anything about the order of RETURNING output.

I think it's fine not promising anything about the order of RETURNING, but
it would be very helpful having a way of tracking what input row
generated a particular
output row. Basically the sentinel case in the original post,
without actually having to insert the sentinel into the table.

I think if you want to use RETURNING with multi-row inserts, the
thing to do is more like

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id

and then explicitly match up the returned "data" values rather than
presuming they appear in the same order you wrote them in in VALUES.
Admittedly this might be problematic if some of the VALUES rows
are identical, but how much should you care?

Well, the example is very easy, but it's hard to generalize when
inserting multiple columns
with possible complex values in them, since it would mean matching on
possibly large json values,
arrays, etc. So definitely not ideal

Thanks,
Federico

Can your client retain a hashmap of md5,data pairings, allowing the
lookup on the way back using the returned data and supplied id?

#12Federico
cfederico87@gmail.com
In reply to: Thorsten Glaser (#10)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023 at 23:22, Thorsten Glaser <tg@evolvis.org> wrote:

On Tue, 11 Apr 2023, Federico wrote:

Of course sorting the returned ids is only viable when using a serial

Yes, which is why I pointed out it doesn’t have to be.

or identity column, that's why in the general case I've mentioned the
insert with sentinel column

But it was pointed out that that’s not guaranteed either, unless you add
that sentinel column to the table itself…

I was under the impression that when using INSERT SELECT ORDER BY the sequence
ids were generated using the select order.
That has been mentioned in multiple other previous messages, like
/messages/by-id/29386.1528813619@sss.pgh.pa.us

The above does not cover all cases, but in practice serial or identity
are very common,
so it would be nice if at least in these cases a sentinel is not needed

Thanks,
Federico

Show quoted text

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#13Federico
cfederico87@gmail.com
In reply to: Rob Sargent (#11)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023 at 23:31, Rob Sargent <robjsargent@gmail.com> wrote:

On 4/11/23 14:37, Federico wrote:

The problem here is not having the auto increment id in a particular
order, is that there
is apparently no correlation with the position of an element in the
values clause with the
id generated. That's the reason for using the sentinel column in the
general solution in the previous message.

The extend on the use case, SQLAlchemy has 3 objects T that have
T(data='a'), T(data='b'), T(data='c') but no
value for the id column. The objective is to insert the 3 data values,
get back the ids and correctly match them with
the correct 3 objects.

No. Sadly, adding that ORDER BY is just voodoo programming, because
it applies to the result of the SELECT while promising nothing about
the order in which INSERT/RETURNING will act on those rows.

I wasn't probably clear, it's fine if INSERT/RETURNING order is
arbitrary, what matters is that the
autoincementing values is executed in the same order as select, like
mentioned in this
previous message
/messages/by-id/29386.1528813619@sss.pgh.pa.us

Is that not the case?

Re-reading that 2012 thread, the main new observation I'd make today
is that parallel operation is a thing now, and it's not hard to foresee
that sometime soon we'll want to parallelize INSERTs. Which'd make it
*really* hard to promise anything about the order of RETURNING output.

I think it's fine not promising anything about the order of RETURNING, but
it would be very helpful having a way of tracking what input row
generated a particular
output row. Basically the sentinel case in the original post,
without actually having to insert the sentinel into the table.

I think if you want to use RETURNING with multi-row inserts, the
thing to do is more like

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id

and then explicitly match up the returned "data" values rather than
presuming they appear in the same order you wrote them in in VALUES.
Admittedly this might be problematic if some of the VALUES rows
are identical, but how much should you care?

Well, the example is very easy, but it's hard to generalize when
inserting multiple columns
with possible complex values in them, since it would mean matching on
possibly large json values,
arrays, etc. So definitely not ideal

Thanks,
Federico

Can your client retain a hashmap of md5,data pairings, allowing the
lookup on the way back using the returned data and supplied id?

When using unique columns or similar, that's something that is done,
but if there are no unique columns in the value no match can be done
reliably with the source data, since sqlalchemy is a library that
allows arbitrary schemas to be generated.

Thanks for the reply,
Federico

#14Thorsten Glaser
tg@evolvis.org
In reply to: Federico (#12)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023, Federico wrote:

I was under the impression that when using INSERT SELECT ORDER BY the sequence
ids were generated using the select order.

But someone said that’s not guaranteed, especially when INSERT will
be parallelised later.

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#15Rob Sargent
robjsargent@gmail.com
In reply to: Federico (#13)
Re: Guidance on INSERT RETURNING order

Can your client retain a hashmap of md5,data pairings, allowing the
lookup on the way back using the returned data and supplied id?

When using unique columns or similar, that's something that is done,
but if there are no unique columns in the value no match can be done
reliably with the source data, since sqlalchemy is a library that
allows arbitrary schemas to be generated.

Thanks for the reply,
Federico

So you're returned data is not what was sent to the server? Otherwise it
should generate the same md5, as I understand it. Identical data would
of course be a problem.

#16Federico
cfederico87@gmail.com
In reply to: Thorsten Glaser (#14)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023 at 23:44, Thorsten Glaser <tg@evolvis.org> wrote:

On Tue, 11 Apr 2023, Federico wrote:

I was under the impression that when using INSERT SELECT ORDER BY the sequence
ids were generated using the select order.

But someone said that’s not guaranteed, especially when INSERT will
be parallelised later.

It was Tom Lane's message that said

Re-reading that 2012 thread, the main new observation I'd make today
is that parallel operation is a thing now, and it's not hard to foresee
that sometime soon we'll want to parallelize INSERTs. Which'd make it
*really* hard to promise anything about the order of RETURNING output.

My reading of it is that we are talking about RETURNING, not about the order
in which the serial ids are generated. My understanding was that they
are generated
in select order, then the rows are inserted in any arbitrary order the
planner may choose
and returned again in any arbitrary order.

If my understanding is incorrect, would this alternative guarantee the above
(that nextval is called in the order set by ORDER BY),
again re-using the table in the original message?

INSERT INTO t(id, data)
SELECT nextval(pg_get_serial_sequence('t', 'id')) data
FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num
RETURNING id

best,
Federico

Show quoted text

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#17Federico
cfederico87@gmail.com
In reply to: Rob Sargent (#15)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023 at 23:46, Rob Sargent <robjsargent@gmail.com> wrote:

Can your client retain a hashmap of md5,data pairings, allowing the
lookup on the way back using the returned data and supplied id?

When using unique columns or similar, that's something that is done,
but if there are no unique columns in the value no match can be done
reliably with the source data, since sqlalchemy is a library that
allows arbitrary schemas to be generated.

Thanks for the reply,
Federico

So you're returned data is not what was sent to the server? Otherwise it
should generate the same md5, as I understand it. Identical data would
of course be a problem.

That should be the case, yes.
If a table has a non-nullable unique key, it should be possible to use
a hashmap and perform that lockup. We are planning on implementing
something like this to cover the cases where it can be used.

Thanks for the reply,
Federico

#18Thorsten Glaser
tg@evolvis.org
In reply to: Federico (#16)
Re: Guidance on INSERT RETURNING order

On Tue, 11 Apr 2023, Federico wrote:

My understanding was that they are generated in select order

But are they? (I don’t know, but I’d not assume that.)

If my understanding is incorrect, would this alternative guarantee the above

INSERT INTO t(id, data)
SELECT nextval(pg_get_serial_sequence('t', 'id')) data
FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num
RETURNING id

Wouldn’t, at that point, it be better to just send multiple
individual INSERT statements? The overhead (on both sides)
for all mentioned… workarounds… surely is larger than that?

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#19Mike Bayer
mike_mp@zzzcomputing.com
In reply to: Federico (#9)
Re: Guidance on INSERT RETURNING order

I'm not getting every reply in the list but I want to point this out from the archive version of this thread:

I was under the impression that when using INSERT SELECT ORDER BY the sequence
ids were generated using the select order.

But someone said that’s not guaranteed, especially when INSERT will
be parallelised later.

this should not matter. as in my other message if INSERT INTO table SELECT a, b FROM (VALUES () () ()) ORDER BY... takes the SERIAL or IDENTITY column of the table, and creates a parse tree for that looks like INSERT INTO table (SELECT nextval(table.id), a, b FROM (SELECT a, b FROM (VALUES () () ()) ORDER BY), the INSERT can put the rows in any way it wants. We don't care what INSERT does, we care about the generated sequence value, the nextval(table.id) part, which the SELECT should be emitting in order and occurs outside of the purview of the INSERT, according to other emails I have read on these lists (see my previous post).

Show quoted text

On Tue, Apr 11, 2023, at 5:07 PM, Federico wrote:

On Tue, 11 Apr 2023 at 22:59, Thorsten Glaser <tg@evolvis.org> wrote:

On Tue, 11 Apr 2023, Federico wrote:

The problem here is not having the auto increment id in a particular

The id might not even be auto-increment but UUID or something…
(I am surprised you would even try to insert multiple rows at once.)

Well the documentation makes no mention of any limitation on returning
and the observed behaviour has consistently been that returning is in
values order.
Again, that was SQLAlchemy's fault for assuming this (but the docs
surely did not help).

Also re-reading my reply, I've made a typo there, sorry. What it
should have read is:
The problem here is not having the returned ids in a particular
order, is that there is apparently no correlation with the position of
an element in the values clause with the id generated.

Of course sorting the returned ids is only viable when using a serial
or identity column, that's why in the general case I've mentioned the
insert with sentinel column to ask if there are better or alternative
solutions.

Thanks for the reply, best
Federico

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#20Federico
cfederico87@gmail.com
In reply to: Thorsten Glaser (#18)
Re: Guidance on INSERT RETURNING order

On Wed, 12 Apr 2023 at 11:46, Thorsten Glaser <tg@evolvis.org> wrote:

On Tue, 11 Apr 2023, Federico wrote:

My understanding was that they are generated in select order

But are they? (I don’t know, but I’d not assume that.)

That's kind of the point for this question, to see if that's correct or not.

If my understanding is incorrect, would this alternative guarantee the above

INSERT INTO t(id, data)
SELECT nextval(pg_get_serial_sequence('t', 'id')) data
FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num
RETURNING id

Wouldn’t, at that point, it be better to just send multiple
individual INSERT statements? The overhead (on both sides)
for all mentioned… workarounds… surely is larger than that?

No, not by a long shot. Sending thousands of single inserts
sequentially over the network requires a lot more time even when doing
that on localhost.
Using a single statement is many times faster.

Federico

Show quoted text

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#21John Howroyd
jdhowroyd@googlemail.com
In reply to: Federico (#20)
#22Kirk Wolak
wolakk@gmail.com
In reply to: Federico (#6)
#23Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Kirk Wolak (#22)
#24Federico
cfederico87@gmail.com
In reply to: Adrian Klaver (#23)
#25Kirk Wolak
wolakk@gmail.com
In reply to: Adrian Klaver (#23)
#26Thorsten Glaser
tg@evolvis.org
In reply to: Kirk Wolak (#25)
#27Rob Sargent
robjsargent@gmail.com
In reply to: Thorsten Glaser (#26)
#28Mike Bayer
mike_mp@zzzcomputing.com
In reply to: Thorsten Glaser (#26)
#29Thorsten Glaser
tg@evolvis.org
In reply to: Mike Bayer (#28)
#30Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Thorsten Glaser (#29)
#31Mike Bayer
mike_mp@zzzcomputing.com
In reply to: Thorsten Glaser (#29)
#32John Howroyd
jdhowroyd@googlemail.com
In reply to: Mike Bayer (#31)
#33John Howroyd
jdhowroyd@googlemail.com
In reply to: John Howroyd (#32)
#34Mike Bayer
mike_mp@zzzcomputing.com
In reply to: John Howroyd (#33)
#35John Howroyd
jdhowroyd@googlemail.com
In reply to: Mike Bayer (#34)
#36Mike Bayer
mike_mp@zzzcomputing.com
In reply to: John Howroyd (#35)
#37Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Rob Sargent (#15)
#38John Howroyd
jdhowroyd@googlemail.com
In reply to: Peter J. Holzer (#37)
#39David G. Johnston
david.g.johnston@gmail.com
In reply to: John Howroyd (#38)
#40Federico
cfederico87@gmail.com
In reply to: David G. Johnston (#39)
#41John Howroyd
jdhowroyd@googlemail.com
In reply to: David G. Johnston (#39)
#42David G. Johnston
david.g.johnston@gmail.com
In reply to: John Howroyd (#41)
#43David G. Johnston
david.g.johnston@gmail.com
In reply to: Federico (#40)
#44Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: John Howroyd (#41)
#45Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Karsten Hilbert (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Federico (#40)
#47David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#46)
#48Federico
cfederico87@gmail.com
In reply to: Tom Lane (#46)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Federico (#48)
#50Federico
cfederico87@gmail.com
In reply to: Tom Lane (#49)
#51Mike Bayer
mike_mp@zzzcomputing.com
In reply to: Tom Lane (#46)
#52John Howroyd
jdhowroyd@googlemail.com
In reply to: Mike Bayer (#51)
#53John Howroyd
jdhowroyd@googlemail.com
In reply to: John Howroyd (#52)
#54Federico
cfederico87@gmail.com
In reply to: John Howroyd (#52)
#55John Howroyd
jdhowroyd@googlemail.com
In reply to: Federico (#54)
#56Federico
cfederico87@gmail.com
In reply to: John Howroyd (#55)