Clarify the ordering guarantees in combining queries (or lack thereof)

Started by Shay Rojanskyalmost 4 years ago8 messagesdocs
Jump to latest
#1Shay Rojansky
roji@roji.org

Greetings.

I was trying to understand what - if any - are the guarantees with regards
to ordering for combining queries (UNION/UNION ALL/...). From this
message[1]/messages/by-id/26825.1064858799@sss.pgh.pa.us, it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2]https://www.postgresql.org/docs/current/queries-union.html makes no mention of this, I'd suggest adding a note
clarifying this.

Thanks,

Shay

[1]: /messages/by-id/26825.1064858799@sss.pgh.pa.us
[2]: https://www.postgresql.org/docs/current/queries-union.html

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Shay Rojansky (#1)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

On Wed, Jul 13, 2022 at 5:08 PM Shay Rojansky <roji@roji.org> wrote:

Greetings.

I was trying to understand what - if any - are the guarantees with regards
to ordering for combining queries (UNION/UNION ALL/...). From this
message[1], it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2] makes no mention of this, I'd suggest adding a note
clarifying this.

Since the documentation doesn't make a guarantee there is none. If you
want ordered output use ORDER BY.

David J.

#3Shay Rojansky
roji@roji.org
In reply to: David G. Johnston (#2)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

I was trying to understand what - if any - are the guarantees with

regards to ordering for combining queries (UNION/UNION ALL/...). From this
message[1]/messages/by-id/26825.1064858799@sss.pgh.pa.us, it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2] makes no mention of this, I'd suggest adding a note

clarifying this.

If you want ordered output use ORDER BY.

I don't see how that could be done. Consider the following:

(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);

If there's a guarantee that UNION ALL preserves ordering - as Tom seems to
indicate in the thread quoted above - then the above works. If there's no
such guarantee, then AFAIK the above can't be rewritten; putting the ORDER
BY outside - on the results of the UNION ALL - would order all results
rather than preserving each resultset's ordering.

[1]: /messages/by-id/26825.1064858799@sss.pgh.pa.us
[2]: https://www.postgresql.org/docs/current/queries-union.html

#4Pantelis Theodosiou
ypercube@gmail.com
In reply to: Shay Rojansky (#3)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:

I was trying to understand what - if any - are the guarantees with

regards to ordering for combining queries (UNION/UNION ALL/...). From this
message[1], it seems that UNION ALL does preserve the ordering of the
operand queries, whereas UNION does not (presumably neither do INTERSECT,
INTERSECT ALL, EXCEPT and EXCEPT ALL).

The documentation[2] makes no mention of this, I'd suggest adding a

note clarifying this.

If you want ordered output use ORDER BY.

I don't see how that could be done. Consider the following:

(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);

If there's a guarantee that UNION ALL preserves ordering - as Tom seems to
indicate in the thread quoted above - then the above works. If there's no
such guarantee, then AFAIK the above can't be rewritten; putting the ORDER
BY outside - on the results of the UNION ALL - would order all results
rather than preserving each resultset's ordering.

No, there is no guarantee. It's just that UNION ALL works this way today

(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes or
partitioning or a parallel plan, etc.
In any case, there is no guarantee that the behaviour will not change in
the future due to planner improvements.

Best regards
Pantelis Theodosiou

Show quoted text

[1] /messages/by-id/26825.1064858799@sss.pgh.pa.us
[2] https://www.postgresql.org/docs/current/queries-union.html

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pantelis Theodosiou (#4)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

Pantelis Theodosiou <ypercube@gmail.com> writes:

On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:

I was trying to understand what - if any - are the guarantees with
regards to ordering for combining queries (UNION/UNION ALL/...).

No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes or
partitioning or a parallel plan, etc.

Yeah, that. You can get a parallelized plan today for UNION ALL:

=# explain analyze select * from foo union all select * from foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
Planning Time: 0.094 ms
Execution Time: 488.352 ms

It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.

If you want ordered output use ORDER BY.

I don't see how that could be done. Consider the following:
(SELECT id FROM data ORDER BY id)
UNION ALL
(SELECT id FROM data ORDER BY id DESC);

You do it like this:

=# explain analyze (select * from foo union all select * from foo) order by unique1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3)
Sort Key: foo.unique1
Sort Method: external merge Disk: 433552kB
Worker 0: Sort Method: external merge Disk: 423400kB
Worker 1: Sort Method: external merge Disk: 415592kB
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2)
Planning Time: 0.109 ms
Execution Time: 1830.390 ms
(13 rows)

The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.

regards, tom lane

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Shay Rojansky (#3)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

On Thursday, July 14, 2022, Shay Rojansky <roji@roji.org> wrote:

If there's a guarantee that UNION ALL preserves ordering - as Tom seems to
indicate in the thread quoted above - then the above works. If there's no
such guarantee, then AFAIK the above can't be rewritten; putting the ORDER
BY outside - on the results of the UNION ALL - would order all results
rather than preserving each resultset's ordering.

Yes, an order by outside the union will sort the union results as a whole.
You can still write an order by and the union all so you get any
conceivable ordering, though it may possibly require putting the union into
a subquery depending on the order and output column combination desired.

David J.

#7Shay Rojansky
roji@roji.org
In reply to: Tom Lane (#5)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different indexes

or

partitioning or a parallel plan, etc.

Yeah, that. You can get a parallelized plan today for UNION ALL:

...

Since the documentation doesn't make a guarantee there is none.

Thanks all for the confirmation.

I'd still suggest documenting the lack of guarantee; yes, mathematically it
may be correct to not document lack of guarantees, but users can come with
various expectations and misunderstandings (I also wasn't clear on this
specifically for UNION ALL), and it's always good to say this kind of thing
explicitly.

#8Erwin Brandstetter
brsaweda@gmail.com
In reply to: Shay Rojansky (#7)
Re: Clarify the ordering guarantees in combining queries (or lack thereof)

The manual still seems to offer just such a guarantee here:
https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION

Multiple UNION operators in the same SELECT statement are evaluated left

to right, unless otherwise indicated by parentheses.

In the case of UNION ALL, is this supposed to mean ...

a.) Individual legs are evaluated left to right, but sets returned from
each are not guaranteed to be appended in the same order, nor is the order
within each set guaranteed to be preserved.

b.) Individual legs are evaluated left to right, sets returned from each
are appended in order, but the order within each set is not guaranteed to
be preserved.

c.) Individual legs are evaluated left to right, sets returned from each
are appended in order, and the order within each set is guaranteed to be
preserved.

d.) The manual is outdated. Since the advent of "Parallel Append" in
Postgres 11, left to right evaluation is not guaranteed in all cases.

Obviously, the order *within* each leg is not guaranteed without ORDER BY
attached to it, enclosed in parentheses. But that's an orthogonal issue.
The question is, what of the returned order is preserved after UNION ALL?
And what is *guaranteed*?

I guess the term "evaluated" is ambiguous.

I would love the manual to be clear about this.

Related discussion here:
https://dba.stackexchange.com/questions/316818/are-results-from-union-all-clauses-always-appended-in-order

Regards
Erwin

On Mon, Jun 10, 2024 at 3:03 PM Shay Rojansky <roji@roji.org> wrote:

Show quoted text

No, there is no guarantee. It's just that UNION ALL works this way today
(preserving the order of the subselects) - and I'm not even sure about
that, it may not preserve the order in all cases, with different

indexes or

partitioning or a parallel plan, etc.

Yeah, that. You can get a parallelized plan today for UNION ALL:

...

Since the documentation doesn't make a guarantee there is none.

Thanks all for the confirmation.

I'd still suggest documenting the lack of guarantee; yes, mathematically
it may be correct to not document lack of guarantees, but users can come
with various expectations and misunderstandings (I also wasn't clear on
this specifically for UNION ALL), and it's always good to say this kind of
thing explicitly.