Document aggregate functions better w.r.t. ORDER BY

Started by David G. Johnstonover 3 years ago17 messageshackers

david.g.johnston@gmail.com

over 3 years ago

All,

The recent discussion surrounding aggregates and ORDER BY moved me to look
over our existing documentation, especially now that we've reworked the
function tables, to see what improvements can be had by simply documenting
those functions where ORDER BY may change the user-visible output. I
skipped range aggregates for the moment but handled the others on the
aggregates page (not window functions). This includes the float types for
sum and avg.

I added a note just before the table linking back to the syntax chapter and
describing the newly added rules and syntax choice in the table.

The nuances of floating point math suggest to me that specifying order by
for those is in some kind of gray area and so I've marked it optional...any
suggestions for wording (or an xref) to explain those nuances or should it
just be shown non-optional like the others? Or not shown at all?

The novelty of my examples is up for bikeshedding. I didn't want
anything too long so a subquery didn't make sense, and I was trying to
avoid duplication as well as multiple lines - hence creating a CTE that can
be copied onto all of the example queries to produce the noted result.

I added a DISTINCT example to array_agg because it is the first aggregate
on the page and so hopefully will be seen during a cursory reading. Plus,
array_agg is the go-to function for doing this kind of experimentation.

David J.

The patch is attached. A screenshot exemplifying the changes is copied
inline and attached.

[image: image.png]

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: David G. Johnston (#1)

Re: Document aggregate functions better w.r.t. ORDER BY

On Tue, Dec 13, 2022 at 07:38:15PM -0700, David G. Johnston wrote:

All,

The recent discussion surrounding aggregates and ORDER BY moved me to look over
our existing documentation, especially now that we've reworked the function
tables, to see what improvements can be had by simply documenting those
functions where ORDER BY may change the user-visible output. I skipped range
aggregates for the moment but handled the others on the aggregates page (not
window functions). This includes the float types for sum and avg.

I added a note just before the table linking back to the syntax chapter and
describing the newly added rules and syntax choice in the table.

The nuances of floating point math suggest to me that specifying order by for
those is in some kind of gray area and so I've marked it optional...any
suggestions for wording (or an xref) to explain those nuances or should it just
be shown non-optional like the others? Or not shown at all?

The novelty of my examples is up for bikeshedding. I didn't want anything too
long so a subquery didn't make sense, and I was trying to avoid duplication as
well as multiple lines - hence creating a CTE that can be copied onto all of
the example queries to produce the noted result.

I added a DISTINCT example to array_agg because it is the first aggregate on
the page and so hopefully will be seen during a cursory reading. Plus,
array_agg is the go-to function for doing this kind of experimentation.

I like this idea, though the examples seemed too detailed so I skipped
them. Here is the trimmed-down patch I would like to apply.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

David G. Johnston

david.g.johnston@gmail.com

almost 3 years ago

In reply to: Bruce Momjian (#2)

Re: Document aggregate functions better w.r.t. ORDER BY

On Tue, Oct 24, 2023 at 1:39 PM Bruce Momjian <bruce@momjian.us> wrote:

On Tue, Dec 13, 2022 at 07:38:15PM -0700, David G. Johnston wrote:

All,

The recent discussion surrounding aggregates and ORDER BY moved me to

look over

our existing documentation, especially now that we've reworked the

function

tables, to see what improvements can be had by simply documenting those
functions where ORDER BY may change the user-visible output. I skipped

range

aggregates for the moment but handled the others on the aggregates page

(not

window functions). This includes the float types for sum and avg.

I added a note just before the table linking back to the syntax chapter

and

describing the newly added rules and syntax choice in the table.

The nuances of floating point math suggest to me that specifying order

by for

those is in some kind of gray area and so I've marked it optional...any
suggestions for wording (or an xref) to explain those nuances or should

it just

be shown non-optional like the others? Or not shown at all?

The novelty of my examples is up for bikeshedding. I didn't want

anything too

long so a subquery didn't make sense, and I was trying to avoid

duplication as

well as multiple lines - hence creating a CTE that can be copied onto

all of

the example queries to produce the noted result.

I added a DISTINCT example to array_agg because it is the first

aggregate on

the page and so hopefully will be seen during a cursory reading. Plus,
array_agg is the go-to function for doing this kind of experimentation.

I like this idea, though the examples seemed too detailed so I skipped
them. Here is the trimmed-down patch I would like to apply.

I'd prefer to keep pointing out that the ones documented are those whose
outputs will vary due to ordering.

I've been sympathetic to the user comments that we don't have enough
examples. Just using array_agg for that purpose, showing both DISTINCT and
ORDER BY seems like a fair compromise (removes two from my original
proposal). The examples in the section we tell them to go see aren't of
that great quality. If you strongly dislike having the function table
contain the examples we should at least improve the page we are sending
them to. (As an aside to this, I've personally always found the syntax
block with the 5 syntaxes shown there to be intimidating/hard-to-read).

I'd at least suggest you reconsider the commentary and examples surrounding
jsonb_object_agg.

The same goes for the special knowledge of floating point behavior for why
we've chosen to document avg/sum, something that typically doesn't care
about order, as having an optional order by.

David J.

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: David G. Johnston (#3)

Re: Document aggregate functions better w.r.t. ORDER BY

On Tue, Oct 24, 2023 at 06:45:48PM -0700, David G. Johnston wrote:

I'd prefer to keep pointing out that the ones documented are those whose
outputs will vary due to ordering.

Okay, I re-added it in the attached patch, and tightened up the text.

I've been sympathetic to the user comments that we don't have enough examples.

Good point.

Just using array_agg for that purpose, showing both DISTINCT and ORDER BY seems
like a fair compromise (removes two from my original proposal). The examples
in the section we tell them to go see aren't of that great quality. If you
strongly dislike having the function table contain the examples we should at
least improve the page we are sending them to. (As an aside to this, I've
personally always found the syntax block with the 5 syntaxes shown there to be
intimidating/hard-to-read).

I think you are right that it belongs in the syntax section; we cover
ordering extensively there. We already have queries there, but not
output, so I moved the relevant examples to there and replaced the
example that had no output.

I'd at least suggest you reconsider the commentary and examples surrounding
jsonb_object_agg.

I moved that as well, and tightened the example.

The same goes for the special knowledge of floating point behavior for why
we've chosen to document avg/sum, something that typically doesn't care about
order, as having an optional order by.

The floating example seems too obscure to mention in our function docs.
I can put a sentence in the syntax docs, but is there value in
explaining that to users? How it that helpful? Example?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: Bruce Momjian (#4)

Re: Document aggregate functions better w.r.t. ORDER BY

On Wed, Oct 25, 2023 at 8:36 AM Bruce Momjian <bruce@momjian.us> wrote:

On Tue, Oct 24, 2023 at 06:45:48PM -0700, David G. Johnston wrote:

I'd prefer to keep pointing out that the ones documented are those whose
outputs will vary due to ordering.

Okay, I re-added it in the attached patch, and tightened up the text.

Thanks

I think you are right that it belongs in the syntax section; we cover
ordering extensively there. We already have queries there, but not
output, so I moved the relevant examples to there and replaced the
example that had no output.

Thanks

The same goes for the special knowledge of floating point behavior for

why

we've chosen to document avg/sum, something that typically doesn't care

about

order, as having an optional order by.

The floating example seems too obscure to mention in our function docs.
I can put a sentence in the syntax docs, but is there value in
explaining that to users? How it that helpful? Example?

Yeah, we punt on the entire concept in the data type section:

"Managing these errors and how they propagate through calculations is the
subject of an entire branch of mathematics and computer science and will
not be discussed here," ...

Also, I'm now led to believe that the relevant IEEE 754 floating point
addition is indeed commutative. Given that, I am inclined to simply not
add the order by clause at all to those four functions. (actually, you
already got rid of the avg()s but the sum()s are still present, so just
those two).

David J.

Bruce Momjian

bruce@momjian.us

over 2 years ago

In reply to: David G. Johnston (#5)

Re: Document aggregate functions better w.r.t. ORDER BY

On Wed, Oct 25, 2023 at 04:14:11PM -0700, David G. Johnston wrote:

Yeah, we punt on the entire concept in the data type section:

"Managing these errors and how they propagate through calculations is the
subject of an entire branch of mathematics and computer science and will not be
discussed here," ...

Also, I'm now led to believe that the relevant IEEE 754 floating point addition
is indeed commutative. Given that, I am inclined to simply not add the order
by clause at all to those four functions. (actually, you already got rid of the
avg()s but the sum()s are still present, so just those two).

Ah, yes, sum() removed. Updated patch attached.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: Bruce Momjian (#6)

Re: Document aggregate functions better w.r.t. ORDER BY

On Wed, Oct 25, 2023 at 4:22 PM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Oct 25, 2023 at 04:14:11PM -0700, David G. Johnston wrote:

Yeah, we punt on the entire concept in the data type section:

"Managing these errors and how they propagate through calculations is the
subject of an entire branch of mathematics and computer science and will

not be

discussed here," ...

Also, I'm now led to believe that the relevant IEEE 754 floating point

addition

is indeed commutative. Given that, I am inclined to simply not add the

order

by clause at all to those four functions. (actually, you already got rid

of the

avg()s but the sum()s are still present, so just those two).

Ah, yes, sum() removed. Updated patch attached.

The paragraph leading into the last added example needs to be tweaked:

If DISTINCT is specified within an aggregate, the data is sorted in
ascending order while extracting unique values. You can add an ORDER BY
clause, limited to expressions matching the regular arguments of the
aggregate, to sort the output in descending order.

(show existing - DISTINCT only - example here)

<programlisting>
WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
string_agg
-----------
4;3;2;1
</programlisting>

(existing note)

Question: Do you know whether we for certain always sort ascending here to
compute the unique values or whether if, say, there is an index on the
column in descending order (or ascending and traversed backwards) that the
data within the aggregate could, with an order by, be returned in
descending order? If it is ascending, is that part of the SQL Standard
(since it doesn't even allow an order by to give the user the ability the
control the output ordering) or does the SQL Standard expect that even a
random order would be fine since there are algorithms that can be used that
do not involve sorting the input?

It seems redundant to first say "regular arguments" then negate it in order
to say "DISTINCT list". Using the positive form with "DISTINCT list"
should get the point across sufficiently and succinctly. It also avoids me
feeling like there should be an example of what happens when you do "sort
on an expression that is not included in the DISTINCT list".

Interestingly:

WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;

ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in
argument list
LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...

But both expressions in the argument list (el and semicolon) do appear in
the ORDER BY...

David J.

Bruce Momjian

bruce@momjian.us

over 2 years ago

In reply to: David G. Johnston (#7)

Re: Document aggregate functions better w.r.t. ORDER BY

On Wed, Oct 25, 2023 at 05:10:17PM -0700, David G. Johnston wrote:

The paragraph leading into the last added example needs to be tweaked:

If DISTINCT is specified within an aggregate, the data is sorted in ascending
order while extracting unique values. You can add an ORDER BY clause, limited
to expressions matching the regular arguments of the aggregate, to sort the
output in descending order.

(show existing - DISTINCT only - example here)

<programlisting>
WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
string_agg
-----------
4;3;2;1
</programlisting>

(existing note)

I see what you mean. I added an example that doesn't match the existing
paragraph. I have rewritten the paragraph and used a relevant example;
patch attached.

Question: Do you know whether we for certain always sort ascending here to
compute the unique values or whether if, say, there is an index on the column
in descending order (or ascending and traversed backwards) that the data within
the aggregate could, with an order by, be returned in descending order? If it
is ascending, is that part of the SQL Standard (since it doesn't even allow an
order by to give the user the ability the control the output ordering) or does
the SQL Standard expect that even a random order would be fine since there are
algorithms that can be used that do not involve sorting the input?

I don't think order is ever guaranteed in the standard without an ORDER
BY.

It seems redundant to first say "regular arguments" then negate it in order to
say "DISTINCT list". Using the positive form with "DISTINCT list" should get
the point across sufficiently and succinctly. It also avoids me feeling like
there should be an example of what happens when you do "sort on an expression
that is not included in the DISTINCT list".

Agreed, I rewrote that.

Interestingly:

WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;

ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in
argument list
LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...

But both expressions in the argument list (el and semicolon) do appear in the
ORDER BY...

I think ORDER BY has to match DISTINCT columns, while you are using ';'.
I used a simpler example with array_agg() in my patch to avoid the issue.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: David G. Johnston (#7)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, 26 Oct 2023 at 13:10, David G. Johnston
<david.g.johnston@gmail.com> wrote:

Question: Do you know whether we for certain always sort ascending here to compute the unique values or whether if, say, there is an index on the column in descending order (or ascending and traversed backwards) that the data within the aggregate could, with an order by, be returned in descending order?

The way it's currently coded, we seem to always require ascending
order. See addTargetToGroupList(). The call to
get_sort_group_operators() only requests the ltOpr.

A quick test creating an index on a column with DESC shows that we end
up doing a backwards index scan so that we get the requested ascending
order:

create table b (b text);
create index on b (b desc);
explain select string_agg(distinct b,',') from b;
QUERY PLAN
------------------------------------------------------------------------------------------
Aggregate (cost=67.95..67.97 rows=1 width=32)
-> Index Only Scan Backward using b_b_idx on b (cost=0.15..64.55
rows=1360 width=32)
(2 rows)

However, I think we'd best stay clear of offering any guarantees in
the documents about this. If we did that it would be much harder in
the future if we wanted to implement the DISTINCT aggregates by
hashing.

David

#10

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: David Rowley (#9)

Re: Document aggregate functions better w.r.t. ORDER BY

On Wed, Oct 25, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 26 Oct 2023 at 13:10, David G. Johnston
<david.g.johnston@gmail.com> wrote:

Question: Do you know whether we for certain always sort ascending here

to compute the unique values or whether if, say, there is an index on the
column in descending order (or ascending and traversed backwards) that the
data within the aggregate could, with an order by, be returned in
descending order?

The way it's currently coded, we seem to always require ascending
order. See addTargetToGroupList(). The call to
get_sort_group_operators() only requests the ltOpr.

A quick test creating an index on a column with DESC shows that we end
up doing a backwards index scan so that we get the requested ascending
order:

create table b (b text);
create index on b (b desc);
explain select string_agg(distinct b,',') from b;
QUERY PLAN

------------------------------------------------------------------------------------------
Aggregate (cost=67.95..67.97 rows=1 width=32)
-> Index Only Scan Backward using b_b_idx on b (cost=0.15..64.55
rows=1360 width=32)
(2 rows)

However, I think we'd best stay clear of offering any guarantees in
the documents about this. If we did that it would be much harder in
the future if we wanted to implement the DISTINCT aggregates by
hashing.

So, I think we are mischaracterizing the Standard here, if only in the

specific case of array_agg.

SQL Standard: 4.16.4

Every unary aggregate function takes an arbitrary <value expression> as the
argument; most unary aggregate
functions can optionally be qualified with either DISTINCT or ALL.

If ARRAY_AGG is specified, then an array value with one element formed from
the <value expression>
evaluated for each row that qualifies.

Neither DISTINCT nor ALL are allowed to be specified for VAR_POP, VAR_SAMP,
STDDEV_POP, or
STDDEV_SAMP; redundant duplicates are not removed when computing these
functions.

10.9

<array aggregate function> ::=
ARRAY_AGG
<left paren> <value expression> [ ORDER BY <sort specification list> ]
<right paren>

I would reword the existing note to be something like:

The SQL Standard defines specific aggregates and their properties,
including which of DISTINCT and/or ORDER BY is allowed. Due to the
extensible nature of PostgreSQL it accepts either or both clauses for any
aggregate.

From the most recent patch:

    <para>
-    If <literal>DISTINCT</literal> is specified in addition to an
-    <replaceable>order_by_clause</replaceable>, then all the
<literal>ORDER BY</literal>
-    expressions must match regular arguments of the aggregate; that is,
-    you cannot sort on an expression that is not included in the
-    <literal>DISTINCT</literal> list.
+    If <literal>DISTINCT</literal> is specified with an
+    <replaceable>order_by_clause</replaceable>, <literal>ORDER
+    BY</literal> expressions can only reference columns in the
+    <literal>DISTINCT</literal> list.  For example:
+<programlisting>
+WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
+SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals;
+  array_agg
+-------------
+ {Z,T,R,D,A}
+</programlisting>

The change to a two-column vals was mostly to try and find corner-cases
that might need to be addressed. If we don't intend to show the error case
of DISTINCT v1 ORDER BY v2 then we should go back to the original example
and just add ORDER BY v DESC. I'm fine with not using string_agg here.

+    For example:
+<programlisting>
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals;
+ array_agg
+-----------
+ {4,3,2,1}
+</programlisting>

We get enough complaints regarding "apparent ordering" that I would like to
add:

As a reminder, while some DISTINCT processing algorithms produce sorted
output as a side-effect, only by specifying ORDER BY is the output order
guaranteed.

David J.

#11

Bruce Momjian

bruce@momjian.us

over 2 years ago

In reply to: David G. Johnston (#10)

Re: Document aggregate functions better w.r.t. ORDER BY

On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:

I would reword the existing note to be something like:

The SQL Standard defines specific aggregates and their properties, including
which of DISTINCT and/or ORDER BY is allowed. Due to the extensible nature of
PostgreSQL it accepts either or both clauses for any aggregate.

Uh, is this something in my patch or somewhere else? I don't think
PostgreSQL extensible is an example of syntax flexibility.

From the most recent patch:

    <para>
-    If <literal>DISTINCT</literal> is specified in addition to an
-    <replaceable>order_by_clause</replaceable>, then all the <literal>ORDER BY
</literal>
-    expressions must match regular arguments of the aggregate; that is,
-    you cannot sort on an expression that is not included in the
-    <literal>DISTINCT</literal> list.
+    If <literal>DISTINCT</literal> is specified with an
+    <replaceable>order_by_clause</replaceable>, <literal>ORDER
+    BY</literal> expressions can only reference columns in the
+    <literal>DISTINCT</literal> list.  For example:
+<programlisting>
+WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
+SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals;
+  array_agg
+-------------
+ {Z,T,R,D,A}
+</programlisting>

The change to a two-column vals was mostly to try and find corner-cases that
might need to be addressed. If we don't intend to show the error case of
DISTINCT v1 ORDER BY v2 then we should go back to the original example and just
add ORDER BY v DESC. I'm fine with not using string_agg here.

+    For example:
+<programlisting>
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals;
+ array_agg
+-----------
+ {4,3,2,1}
+</programlisting>

Okay, good, switched in the attached patch.

We get enough complaints regarding "apparent ordering" that I would like to
add:

As a reminder, while some DISTINCT processing algorithms produce sorted output
as a side-effect, only by specifying ORDER BY is the output order guaranteed.

Well, we need to create a new email thread for this and look at all the
areas is applies to since this is a much larger issue.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#12

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: Bruce Momjian (#11)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, Oct 26, 2023 at 2:56 PM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:

I would reword the existing note to be something like:

The SQL Standard defines specific aggregates and their properties,

including

which of DISTINCT and/or ORDER BY is allowed. Due to the extensible

nature of

PostgreSQL it accepts either or both clauses for any aggregate.

Uh, is this something in my patch or somewhere else? I don't think
PostgreSQL extensible is an example of syntax flexibility.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

Note
The ability to specify both DISTINCT and ORDER BY in an aggregate function
is a PostgreSQL extension.

I am pointing out that the first sentence of the existing note above seems
to be factually incorrect. I tried to make it correct - while explaining
why we differ. Though in truth I'd probably rather just remove the note.

We get enough complaints regarding "apparent ordering" that I would like
to

add:

As a reminder, while some DISTINCT processing algorithms produce sorted

output

as a side-effect, only by specifying ORDER BY is the output order

guaranteed.

Well, we need to create a new email thread for this and look at all the
areas is applies to since this is a much larger issue.

I was hoping to sneak this one in regardless of the bigger picture issues,
since this specific combination is guaranteed to output ordered presently.

David J.

#13

Bruce Momjian

bruce@momjian.us

over 2 years ago

In reply to: David G. Johnston (#12)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, Oct 26, 2023 at 03:09:26PM -0700, David G. Johnston wrote:

On Thu, Oct 26, 2023 at 2:56 PM Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:

I would reword the existing note to be something like:

The SQL Standard defines specific aggregates and their properties,

including

which of DISTINCT and/or ORDER BY is allowed. Due to the extensible

nature of

PostgreSQL it accepts either or both clauses for any aggregate.

Uh, is this something in my patch or somewhere else? I don't think
PostgreSQL extensible is an example of syntax flexibility.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

Note
The ability to specify both DISTINCT and ORDER BY in an aggregate function is a
PostgreSQL extension.

I am pointing out that the first sentence of the existing note above seems to
be factually incorrect. I tried to make it correct - while explaining why we
differ. Though in truth I'd probably rather just remove the note.

Agreed, removed, patch attached. This is just too complex to specify.

We get enough complaints regarding "apparent ordering" that I would like

to

add:

As a reminder, while some DISTINCT processing algorithms produce sorted

output

as a side-effect, only by specifying ORDER BY is the output order

guaranteed.

Well, we need to create a new email thread for this and look at all the
areas is applies to since this is a much larger issue.

I was hoping to sneak this one in regardless of the bigger picture issues,
since this specific combination is guaranteed to output ordered presently.

No sneaking. ;-) It would be bad to document this unevenly because it
sets expectations in other parts of the system if we don't mention it.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#14

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: Bruce Momjian (#13)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, Oct 26, 2023 at 3:36 PM Bruce Momjian <bruce@momjian.us> wrote:

No sneaking. ;-) It would be bad to document this unevenly because it
sets expectations in other parts of the system if we don't mention it.

Agreed.

Last suggestion, remove the first jsonb_agg example that lacks an order by.

+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v) FROM vals;
+      jsonb_object_agg
+----------------------------
+ {"key0": "1", "key1": "2"}
+

We shouldn't write an example that relies on the rows being evaluated 1-2-3
without specifying an order by clause.

David J.

#15

Bruce Momjian

bruce@momjian.us

over 2 years ago

In reply to: David G. Johnston (#14)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, Oct 26, 2023 at 03:44:14PM -0700, David G. Johnston wrote:

On Thu, Oct 26, 2023 at 3:36 PM Bruce Momjian <bruce@momjian.us> wrote:

No sneaking. ;-) It would be bad to document this unevenly because it
sets expectations in other parts of the system if we don't mention it.

Agreed.

Last suggestion, remove the first jsonb_agg example that lacks an order by.
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v) FROM vals;
+      jsonb_object_agg
+----------------------------
+ {"key0": "1", "key1": "2"}
+
We shouldn't write an example that relies on the rows being evaluated 1-2-3
without specifying an order by clause.

Sure, done in the attached patch.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#16

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: Bruce Momjian (#15)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, Oct 26, 2023 at 4:03 PM Bruce Momjian <bruce@momjian.us> wrote:

Sure, done in the attached patch.

WFM. Thank You!

David J.

#17

Bruce Momjian

bruce@momjian.us

over 2 years ago

In reply to: David G. Johnston (#16)

Re: Document aggregate functions better w.r.t. ORDER BY

On Thu, Oct 26, 2023 at 04:05:12PM -0700, David G. Johnston wrote:

On Thu, Oct 26, 2023 at 4:03 PM Bruce Momjian <bruce@momjian.us> wrote:

Sure, done in the attached patch.

WFM. Thank You!

Patch applied to master.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Document aggregate functions better w.r.t. ORDER BY

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: