Clarify the ordering guarantees in combining queries (or lack thereof)
Greetings.
I was trying to understand what - if any - are the guarantees with regards
to ordering for combining queries (UNION/UNION ALL/...). From this
message[1]/messages/by-id/26825.1064858799@sss.pgh.pa.us, it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).
The documentation[2]https://www.postgresql.org/docs/current/queries-union.html makes no mention of this, I'd suggest adding a note
clarifying this.
Thanks,
Shay
[1]: /messages/by-id/26825.1064858799@sss.pgh.pa.us
[2]: https://www.postgresql.org/docs/current/queries-union.html
On Wed, Jul 13, 2022 at 5:08 PM Shay Rojansky <roji@roji.org> wrote:
Greetings.
I was trying to understand what - if any - are the guarantees with regards
to ordering for combining queries (UNION/UNION ALL/...). From this
message[1], it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).The documentation[2] makes no mention of this, I'd suggest adding a note
clarifying this.
Since the documentation doesn't make a guarantee there is none. If you
want ordered output use ORDER BY.
David J.
I was trying to understand what - if any - are the guarantees with
regards to ordering for combining queries (UNION/UNION ALL/...). From this
message[1]/messages/by-id/26825.1064858799@sss.pgh.pa.us, it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).
The documentation[2] makes no mention of this, I'd suggest adding a note
clarifying this.
If you want ordered output use ORDER BY.
I don't see how that could be done. Consider the following:
(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);
If there's a guarantee that UNION ALL preserves ordering - as Tom seems to
indicate in the thread quoted above - then the above works. If there's no
such guarantee, then AFAIK the above can't be rewritten; putting the ORDER
BY outside - on the results of the UNION ALL - would order all results
rather than preserving each resultset's ordering.
[1]: /messages/by-id/26825.1064858799@sss.pgh.pa.us
[2]: https://www.postgresql.org/docs/current/queries-union.html
On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:
I was trying to understand what - if any - are the guarantees with
regards to ordering for combining queries (UNION/UNION ALL/...). From this
message[1], it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).The documentation[2] makes no mention of this, I'd suggest adding a
note clarifying this.
If you want ordered output use ORDER BY.
I don't see how that could be done. Consider the following:
(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);If there's a guarantee that UNION ALL preserves ordering - as Tom seems to
indicate in the thread quoted above - then the above works. If there's no
such guarantee, then AFAIK the above can't be rewritten; putting the ORDER
BY outside - on the results of the UNION ALL - would order all results
rather than preserving each resultset's ordering.No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes or
partitioning or a parallel plan, etc.
In any case, there is no guarantee that the behaviour will not change in
the future due to planner improvements.
Best regards
Pantelis Theodosiou
Pantelis Theodosiou <ypercube@gmail.com> writes:
On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:
I was trying to understand what - if any - are the guarantees with
regards to ordering for combining queries (UNION/UNION ALL/...).
No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes or
partitioning or a parallel plan, etc.
Yeah, that. You can get a parallelized plan today for UNION ALL:
=# explain analyze select * from foo union all select * from foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
Planning Time: 0.094 ms
Execution Time: 488.352 ms
It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.
If you want ordered output use ORDER BY.
I don't see how that could be done. Consider the following:
(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);
You do it like this:
=# explain analyze (select * from foo union all select * from foo) order by unique1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3)
Sort Key: foo.unique1
Sort Method: external merge Disk: 433552kB
Worker 0: Sort Method: external merge Disk: 423400kB
Worker 1: Sort Method: external merge Disk: 415592kB
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2)
Planning Time: 0.109 ms
Execution Time: 1830.390 ms
(13 rows)
The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.
regards, tom lane
On Thursday, July 14, 2022, Shay Rojansky <roji@roji.org> wrote:
If there's a guarantee that UNION ALL preserves ordering - as Tom seems to
indicate in the thread quoted above - then the above works. If there's no
such guarantee, then AFAIK the above can't be rewritten; putting the ORDER
BY outside - on the results of the UNION ALL - would order all results
rather than preserving each resultset's ordering.
Yes, an order by outside the union will sort the union results as a whole.
You can still write an order by and the union all so you get any
conceivable ordering, though it may possibly require putting the union into
a subquery depending on the order and output column combination desired.
David J.
No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes
or
partitioning or a parallel plan, etc.
Yeah, that. You can get a parallelized plan today for UNION ALL:
...
Since the documentation doesn't make a guarantee there is none.
Thanks all for the confirmation.
I'd still suggest documenting the lack of guarantee; yes, mathematically it
may be correct to not document lack of guarantees, but users can come with
various expectations and misunderstandings (I also wasn't clear on this
specifically for UNION ALL), and it's always good to say this kind of thing
explicitly.
The manual still seems to offer just such a guarantee here:
https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION
Multiple UNION operators in the same SELECT statement are evaluated left
to right, unless otherwise indicated by parentheses.
In the case of UNION ALL, is this supposed to mean ...
a.) Individual legs are evaluated left to right, but sets returned from
each are not guaranteed to be appended in the same order, nor is the order
within each set guaranteed to be preserved.
b.) Individual legs are evaluated left to right, sets returned from each
are appended in order, but the order within each set is not guaranteed to
be preserved.
c.) Individual legs are evaluated left to right, sets returned from each
are appended in order, and the order within each set is guaranteed to be
preserved.
d.) The manual is outdated. Since the advent of "Parallel Append" in
Postgres 11, left to right evaluation is not guaranteed in all cases.
Obviously, the order *within* each leg is not guaranteed without ORDER BY
attached to it, enclosed in parentheses. But that's an orthogonal issue.
The question is, what of the returned order is preserved after UNION ALL?
And what is *guaranteed*?
I guess the term "evaluated" is ambiguous.
I would love the manual to be clear about this.
Related discussion here:
https://dba.stackexchange.com/questions/316818/are-results-from-union-all-clauses-always-appended-in-order
Regards
Erwin
On Mon, Jun 10, 2024 at 3:03 PM Shay Rojansky <roji@roji.org> wrote:
Show quoted text
No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with differentindexes or
partitioning or a parallel plan, etc.
Yeah, that. You can get a parallelized plan today for UNION ALL:
...
Since the documentation doesn't make a guarantee there is none.
Thanks all for the confirmation.
I'd still suggest documenting the lack of guarantee; yes, mathematically
it may be correct to not document lack of guarantees, but users can come
with various expectations and misunderstandings (I also wasn't clear on
this specifically for UNION ALL), and it's always good to say this kind of
thing explicitly.