[PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
It has been a fact that the RETURNING clause on an INSERT will return
multiple rows with the same order as multiple VALUES have been fed.
eg: INSERT INTO tbl1(code) VALUES ('abc'), ('def'), ('agh')
RETURNING id, code;
is expected to yield:
id | code
-----------
1 | abc
2 | def
3 | agh
Clarify that in the documentation, and also write a test case that will
prevent us from breaking the rule in the future.
---
doc/src/sgml/ref/insert.sgml | 17 +++++++++++++++++
src/test/regress/expected/insert.out | 9 +++++++++
src/test/regress/sql/insert.sql | 4 ++++
3 files changed, 30 insertions(+), 0 deletions(-)
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index a3930be..64cb41b 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -213,6 +213,11 @@ INSERT <replaceable>oid</replaceable> <replaceable class="parameter">count</repl
<literal>RETURNING</> list, computed over the row(s) inserted by the
command.
</para>
+ <para>
+ If multiple rows are inserted by an <literal>INSERT ... RETURNING</> commmand,
+ the order of the <literal>RETURNING</> rows is the same as that of the inputs
+ to the <command>INSERT</> command.
+ </para>
</refsect1>
<refsect1>
@@ -268,6 +273,18 @@ INSERT INTO films (code, title, did, date_prod, kind) VALUES
</para>
<para>
+ This example inserts multiple rows and returns the corresponding ids
+ at the same order:
+
+<programlisting>
+INSERT INTO films(code, title) VALUES
+ ('B6717', 'Tampopo'),
+ ('HG120', 'The Dinner Game')
+ RETURNING id, code;
+</programlisting>
+ </para>
+
+ <para>
This example inserts some rows into table
<literal>films</literal> from a table <literal>tmp_films</literal>
with the same column layout as <literal>films</literal>:
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 96c7f9e..081e4b9 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -80,4 +80,13 @@ select col1, col2, char_length(col3) from inserttest;
30 | 50 | 10000
(8 rows)
+--- RETURNING order
+insert into inserttest(col1, col2) values(50, 10), (60, 8), (70, 23) RETURNING col2;
+ col2
+------
+ 10
+ 8
+ 23
+(3 rows)
+
drop table inserttest;
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index a0ae850..c7815dd 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -35,4 +35,8 @@ insert into inserttest values(30, 50, repeat('x', 10000));
select col1, col2, char_length(col3) from inserttest;
+--- RETURNING order
+
+insert into inserttest(col1, col2) values(50, 10), (60, 8), (70, 23) RETURNING col2;
+
drop table inserttest;
--
1.7.4.4
On Wed, Oct 17, 2012 at 7:38 AM, P. Christeas <xrg@linux.gr> wrote:
It has been a fact that the RETURNING clause on an INSERT will return
multiple rows with the same order as multiple VALUES have been fed.
Is that defined in the standard?
merlin
"P. Christeas" <xrg@linux.gr> writes:
It has been a fact that the RETURNING clause on an INSERT will return
multiple rows with the same order as multiple VALUES have been fed.
eg: INSERT INTO tbl1(code) VALUES ('abc'), ('def'), ('agh')
RETURNING id, code;
is expected to yield:
id | code
-----------
1 | abc
2 | def
3 | agh
Clarify that in the documentation, and also write a test case that will
prevent us from breaking the rule in the future.
I don't believe this is a good idea in the slightest. Yeah, the current
implementation happens to act like that, but there is no reason that we
should make it guaranteed behavior. Nor is a regression test case going
to stop someone from changing it, anyway.
regards, tom lane
On 17 October 2012 14:53, Merlin Moncure <mmoncure@gmail.com> wrote:
Is that defined in the standard?
RETURNING isn't even defined in the standard.
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Wednesday 17 October 2012, you wrote:
"P. Christeas" <xrg@linux.gr> writes:
It has been a fact that the RETURNING clause on an INSERT will return
multiple rows with the same order as multiple VALUES have been fed.I don't believe this is a good idea in the slightest. Yeah, the current
implementation happens to act like that, but there is no reason that we
should make it guaranteed behavior.
That's my point, to push you to decide on that "feature" and clarify it in the
documentation.
So far, it's very tempting for me to use this behavior, since I can avoid
multiple INSERTs (=save bandwidth) and also the burden of figuring out which of
the returned ids associates to which inserted row.
Having a discussion (or argument or a vote) like this, I think, is useful.
FYI, there is also a stack overflow question on this:
http://stackoverflow.com/questions/5439293/is-insert-returning-guaranteed-to-
return-things-in-the-right-order
--
Say NO to spam and viruses. Stop using Microsoft Windows!
On Wed, Oct 17, 2012 at 9:29 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
On 17 October 2012 14:53, Merlin Moncure <mmoncure@gmail.com> wrote:
Is that defined in the standard?
RETURNING isn't even defined in the standard.
Right: Point being, assumptions based on implementation ordering are
generally to be avoided unless they are explicitly defined in the
standard or elsewhere.
merlin
A Dimecres, 17 d'octubre de 2012 19:13:47, Merlin Moncure va escriure:
On Wed, Oct 17, 2012 at 9:29 AM, Peter Geoghegan <peter@2ndquadrant.com>
wrote:
On 17 October 2012 14:53, Merlin Moncure <mmoncure@gmail.com> wrote:
Is that defined in the standard?
RETURNING isn't even defined in the standard.
Right: Point being, assumptions based on implementation ordering are
generally to be avoided unless they are explicitly defined in the
standard or elsewhere.
I don't see how one could use RETURNING if result is not ensured to be in the
same order as the tuples supplied. What's the use of RETURNING supplying data
in random order?
--
Albert Cervera i Areny
http://www.NaN-tic.com
Tel: +34 93 553 18 03
2012/10/21 Albert Cervera i Areny <albert@nan-tic.com>:
A Dimecres, 17 d'octubre de 2012 19:13:47, Merlin Moncure va escriure:
On Wed, Oct 17, 2012 at 9:29 AM, Peter Geoghegan <peter@2ndquadrant.com>
wrote:On 17 October 2012 14:53, Merlin Moncure <mmoncure@gmail.com> wrote:
Is that defined in the standard?
RETURNING isn't even defined in the standard.
Right: Point being, assumptions based on implementation ordering are
generally to be avoided unless they are explicitly defined in the
standard or elsewhere.
I don't see how one could use RETURNING if result is not ensured to be in
the same order as the tuples supplied. What's the use of RETURNING supplying
data in random order?
you don't need a ORDER, you need data - and if you need a order, then
you can use CTE and ORDER BY clause.
Proposed feature can be too limited in future - when some better
partitioning can be used or when paralel query processing will be
supported
Pavel
Show quoted text
--
Albert Cervera i Areny
Tel: +34 93 553 18 03
At 2012-10-17 09:56:22 -0400, tgl@sss.pgh.pa.us wrote:
Clarify that in the documentation, and also write a test case
that will prevent us from breaking the rule in the future.I don't believe this is a good idea in the slightest. Yeah, the
current implementation happens to act like that, but there is no
reason that we should make it guaranteed behavior.
I always thought it *was* guaranteed, and I've encountered code written
by other people who were obviously under the same impression: take some
strings (e.g. flag names), use "insert … returning id", map the ids back
to the names, and use the values in further inserts into other tables
("flag_id foreign key references flags").
I know one could say "returning id, name", but there's certainly code
out there that doesn't do this.
I personally think the return order should be guaranteed; and if not,
then the documentation urgently needs some prominent warnings to tell
people that they should not assume this (for any variant of RETURNING).
-- Abhijit
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Abhijit Menon-Sen
Sent: Sunday, October 21, 2012 5:45 AM
To: Tom Lane
Cc: P. Christeas; pgsql-hackers@postgresql.org
Subject: [HACKERS] Re: [PATCH] Enforce that INSERT...RETURNING preserves
the order of multi rowsAt 2012-10-17 09:56:22 -0400, tgl@sss.pgh.pa.us wrote:
Clarify that in the documentation, and also write a test case that
will prevent us from breaking the rule in the future.I don't believe this is a good idea in the slightest. Yeah, the
current implementation happens to act like that, but there is no
reason that we should make it guaranteed behavior.I always thought it *was* guaranteed, and I've encountered code written by
other people who were obviously under the same impression: take some
strings (e.g. flag names), use "insert … returning id", map the ids back to the
names, and use the values in further inserts into other tables ("flag_id
foreign key references flags").I know one could say "returning id, name", but there's certainly code out
there that doesn't do this.I personally think the return order should be guaranteed; and if not, then the
documentation urgently needs some prominent warnings to tell people that
they should not assume this (for any variant of RETURNING).-- Abhijit
Order is never guaranteed unless an ORDER BY clause is involved in processing the data immediately prior to its use.
I could see this being in a "Rules that you must always remember" listing but to include it in every location where people might be inclined to rely upon ordering is just going to clutter the documentation.
That said, I'm not personally opposed to this documentation suggestion. But while the idea is acceptable the actual changes proposed by someone's patch is what needs to be approved and applied.
As to the order of RETURNING I do not see an overly compelling reason to enforce such a limitation; and in general implicit guarantees like this are undesirable since there is no way to turn them off. For sorting in particular the action itself can be expensive and not always needed. While we are not talking strictly sorting here (just maintained order) the concept still applies.
David J.
I agree that it seems inappropriate to preserve order. That seems an
inappropriate imposition, inconsistent with what SQL does elsewhere.
If there is a natural sequence (e.g. - a value assigned by nextval()), that
offers a natural place to apply the usual order-imposing ORDER BY that we
are expected to use elsewhere.
I suppose it is troublesome if there is no such natural sequence, but I
wouldn't think it too meaningful to expect order without some visible
source of order.
At 2012-10-21 11:49:26 -0400, cbbrowne@gmail.com wrote:
If there is a natural sequence (e.g. - a value assigned by nextval()),
that offers a natural place to apply the usual order-imposing ORDER BY
that we are expected to use elsewhere.
Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
-- Abhijit
On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
At 2012-10-21 11:49:26 -0400, cbbrowne@gmail.com wrote:
If there is a natural sequence (e.g. - a value assigned by nextval()),
that offers a natural place to apply the usual order-imposing ORDER BY
that we are expected to use elsewhere.Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
cheers
andrew
On Sunday, October 21, 2012 06:30:14 PM Andrew Dunstan wrote:
On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
At 2012-10-21 11:49:26 -0400, cbbrowne@gmail.com wrote:
If there is a natural sequence (e.g. - a value assigned by nextval()),
that offers a natural place to apply the usual order-imposing ORDER BY
that we are expected to use elsewhere.Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
Personally I find that a not very practical suggestion. It means you need the
ability to sort the data equivalently on the clientside which isn't always
easy if you consider platform/locale and whatever differences.
Suggesting nextval() doesn't strike me as very practical either because it
means that you either need a separate roundtrip to the server to get a bunch
of new ids which you then can assign to the to-be-inserted rows or you need
the ability to match the returned rows to the inserted rows somehow. Thats not
always easy.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sunday 21 October 2012, Abhijit Menon-Sen wrote:
At 2012-10-21 11:49:26 -0400, cbbrowne@gmail.com wrote:
If there is a natural sequence (e.g. - a value assigned by nextval()),
that offers a natural place to apply the usual order-imposing ORDER BY
that we are expected to use elsewhere.Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
Exactly. And IMHO it should never have.
The real trouble is when you insert some arbitrary values, which have no
implicit order or primary key /before/ the insert will assign them one. Then,
you need to map them to the SERIAL they got.
Or else, you can't use the multi-row INSERT and must just do many INSERTs.
--
Say NO to spam and viruses. Stop using Microsoft Windows!
On 10/21/2012 12:36 PM, Andres Freund wrote:
On Sunday, October 21, 2012 06:30:14 PM Andrew Dunstan wrote:
On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
At 2012-10-21 11:49:26 -0400, cbbrowne@gmail.com wrote:
If there is a natural sequence (e.g. - a value assigned by nextval()),
that offers a natural place to apply the usual order-imposing ORDER BY
that we are expected to use elsewhere.Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
Personally I find that a not very practical suggestion. It means you need the
ability to sort the data equivalently on the clientside which isn't always
easy if you consider platform/locale and whatever differences.
Er, what?
with orig_inserts as
(
insert into table_1
...
returning *
),
ordered_inserts as
(
select * from orig_inserts
order by ...
)
insert into table_2
select * from ordered_inserts ...;
why does the client have to be involved, exactly?
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
Note: "INSERT ... RETURNING" doesn't accept an ORDER BY clause.
No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
This is all a lot more dangerous than it looks, though. Whether or not
you believe that a VALUES clause is guaranteed to return its rows in
source order (which in practice it probably is), any such guarantee must
vanish the moment those rows undergo any further processing. For
instance if you join the VALUES with anything else, we are absolutely
not going to promise a thing about the ordering of the join result.
So the question here boils down to whether INSERT...RETURNING represents
sufficient "further processing" to void that guarantee.
In general I've got big reservations about promising anything about
the ordering of DML operations. We have had serious discussions for
instance about trying to do large UPDATE/DELETE operations in ctid
order to reduce buffer thrashing. That argument doesn't apply so much
to INSERTs --- but if the insert is affected by say a rule, it's not
obvious that there might not be good performance reasons for sticking
a sort step in there somewhere.
So, while we could maybe promise something for the *exact* case of
INSERT INTO foo VALUES ... RETURNING, I think it'd be bad policy.
The main practical effect would probably be to encourage people to
make assumptions about related but not in fact guaranteed behaviors.
IMO it'd be far better to maintain the public posture that "row order
is never guaranteed without an ORDER BY", because (a) that rule is
simple enough that people can actually remember it, and (b) it's not
going to constrain future optimization efforts.
(BTW, one reason I find the proposed regression test laughable is that
it's only testing the behavior for a small number of rows. If we ever
did want to mess with the output order of VALUES, it'd likely be because
somebody had found a way to make it a bit faster for many thousands
of rows, by sticking them into a hash table or some such. There is
basically no case where the planner's behavior for a trivial number of
rows is a reliable guide to what it will do for larger problems,
anyway.)
regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
Er, what?
with orig_inserts as
(
insert into table_1
...
returning *
),
ordered_inserts as
(
select * from orig_inserts
order by ...
)
insert into table_2
select * from ordered_inserts ...;
I'm not exactly following what that proves? It seems like this is still
making a not-guaranteed assumption, which is that the outer INSERT isn't
going to choose to rearrange the order of the rows coming from the CTE.
Strictly speaking, even "SELECT * FROM ordered_inserts" isn't promising
anything about row order.
regards, tom lane
On Sunday, October 21, 2012 07:24:52 PM Andrew Dunstan wrote:
On 10/21/2012 12:36 PM, Andres Freund wrote:
On Sunday, October 21, 2012 06:30:14 PM Andrew Dunstan wrote:
On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
At 2012-10-21 11:49:26 -0400, cbbrowne@gmail.com wrote:
If there is a natural sequence (e.g. - a value assigned by nextval()),
that offers a natural place to apply the usual order-imposing ORDER BY
that we are expected to use elsewhere.Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
Personally I find that a not very practical suggestion. It means you need
the ability to sort the data equivalently on the clientside which isn't
always easy if you consider platform/locale and whatever differences.Er, what?
with orig_inserts as
(
insert into table_1
...
returning *
),
ordered_inserts as
(
select * from orig_inserts
order by ...
)
insert into table_2
select * from ordered_inserts ...;
I am not sure I get the point of this.
why does the client have to be involved, exactly?
Suppose you have something like
CREATE TABLE positionlog(
id serial primary key,
timestamp timestamptz DEFAULT NOW(),
position geometry
);
And you want to insert multiple values in one roundtrip *and* know their ids
in your application.
INSERT INTO positionlog(position)
VALUES
('POINT(..., ...)'),
('POINT(..., ...)')
RETURNING id, timestamp, position
;
If you want to correlate re returned ids with data in your application without
relying on the ordering of INSERT ... VALUES... RETURNING you would need to
sort a postgis type in the same way the server does it.
Am I missing something here?
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/21/2012 01:39 PM, Tom Lane wrote:
I'm not exactly following what that proves? It seems like this is still
making a not-guaranteed assumption, which is that the outer INSERT isn't
going to choose to rearrange the order of the rows coming from the CTE.
Strictly speaking, even "SELECT * FROM ordered_inserts" isn't promising
anything about row order.
Hmm. If we do
INSERT INTO foo
SELECT ... ORDER BY
is that not guaranteed to insert in the desired order? We used to
suggest that in the old CLUSTER docs. (I realize that's not what I
suggested, but it seems relevant nevertheless.)
cheers
andrew