Can any_value be used like first_value in an aggregate?

Started by Bruno Wolff IIIalmost 2 years ago5 messagesgeneral

Bruno Wolff III

bruno@wolff.to

almost 2 years ago

For example, is output of 10 guaranteed in the following:
bruno=> select any_value(x order by x desc) from generate_series(1,10) as x;
any_value
-----------
10
(1 row)

The use case is that I want to return a value of one column that is paired
with the maximum value of another column in each group when using GROUP BY.

(There aren't going to be any NULLs in the involved columns.)

Thanks.

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Bruno Wolff III (#1)

Re: Can any_value be used like first_value in an aggregate?

Bruno Wolff III <bruno@wolff.to> writes:

For example, is output of 10 guaranteed in the following:
bruno=> select any_value(x order by x desc) from generate_series(1,10) as x;
any_value
-----------
10
(1 row)

Not really. It will work that way in simple cases, but I think the
behavior stops being predictable if the input gets large enough to
induce the planner to use parallel aggregation. In any case, the
example shown isn't amazingly efficient since it'll still perform
a sort to meet the ORDER BY spec.

The use case is that I want to return a value of one column that is paired
with the maximum value of another column in each group when using GROUP BY.

Use window functions (i.e. first_value). This is what they're for,
and they are smart enough to do just one sort for functions sharing
a common window spec.

regards, tom lane

Bruno Wolff III

bruno@wolff.to

almost 2 years ago

In reply to: Tom Lane (#2)

Re: Can any_value be used like first_value in an aggregate?

On Tue, Jun 25, 2024 at 13:08:45 -0400,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not really. It will work that way in simple cases, but I think the
behavior stops being predictable if the input gets large enough to
induce the planner to use parallel aggregation. In any case, the
example shown isn't amazingly efficient since it'll still perform
a sort to meet the ORDER BY spec.

Thanks.

The use case is that I want to return a value of one column that is paired
with the maximum value of another column in each group when using GROUP BY.

Use window functions (i.e. first_value). This is what they're for,
and they are smart enough to do just one sort for functions sharing
a common window spec.

If I do that, I'd need to do it as a subselect inside of a group by. I'm
thinking distinct on may work and be a better way to do it. The actual
use case is a set of tripplets returned from a query, where I want on
row for each distinct value in the first column, paired with the value
in the second column, for which the third column is the largest. The
second and third columns are effectively dependent on each other, so there
won't be any ambiguity.

Thanks for getting me thinking about some other ways to approach the problem.

laurenz.albe@cybertec.at

almost 2 years ago

In reply to: Bruno Wolff III (#3)

Re: Can any_value be used like first_value in an aggregate?

On Tue, 2024-06-25 at 14:11 -0500, Bruno Wolff III wrote:

The actual
use case is a set of tripplets returned from a query, where I want on
row for each distinct value in the first column, paired with the value
in the second column, for which the third column is the largest. The
second and third columns are effectively dependent on each other, so there
won't be any ambiguity.

Try

SELECT DISTINCT ON (first_column)
first_column,
second_column,
third_column
FROM the_table
ORDER BY first_column, third_column DESC;

Yours,
Laurenz Albe

Bruno Wolff III

bruno@wolff.to

almost 2 years ago

In reply to: Laurenz Albe (#4)

Re: Can any_value be used like first_value in an aggregate?

On Wed, Jun 26, 2024 at 09:50:59 +0200,
Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2024-06-25 at 14:11 -0500, Bruno Wolff III wrote:

The actual
use case is a set of tripplets returned from a query, where I want on
row for each distinct value in the first column, paired with the value
in the second column, for which the third column is the largest. The
second and third columns are effectively dependent on each other, so there
won't be any ambiguity.

Try

SELECT DISTINCT ON (first_column)
first_column,
second_column,
third_column
FROM the_table
ORDER BY first_column, third_column DESC;

That approach worked for that part of the query. Thanks.