ANY_VALUE aggregate

Started by Vik Fearingabout 3 years ago29 messages
#1Vik Fearing
vik@postgresfriends.org
1 attachment(s)

The SQL:2023 Standard defines a new aggregate named ANY_VALUE. It
returns an implementation-dependent (i.e. non-deterministic) value from
the rows in its group.

PFA an implementation of this aggregate.

Ideally, the transition function would stop being called after the first
non-null was found, and then the entire aggregation would stop when all
functions say they are finished[*], but this patch does not go anywhere
near that far.

This patch is based off of commit fb958b5da8.

[*] I can imagine something like array_agg(c ORDER BY x LIMIT 5) to get
the top five of something without going through a LATERAL subquery.
--
Vik Fearing

Attachments:

0001-Implement-ANY_VALUE-aggregate.patchtext/x-patch; charset=UTF-8; name=0001-Implement-ANY_VALUE-aggregate.patchDownload
From 7465fac12fc636ff26088ae31de2937f7c3a459f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 9 Apr 2022 00:07:38 +0200
Subject: [PATCH] Implement ANY_VALUE aggregate

SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.
---
 doc/src/sgml/func.sgml                   | 14 ++++++++++++++
 src/backend/utils/adt/misc.c             | 12 ++++++++++++
 src/include/catalog/pg_aggregate.dat     |  4 ++++
 src/include/catalog/pg_proc.dat          |  8 ++++++++
 src/test/regress/expected/aggregates.out | 18 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql      |  5 +++++
 6 files changed, 61 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2052d3c844..1823ee71d7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19706,16 +19706,30 @@ SELECT NULLIF(value, '(none)') ...
        <para>
         Description
        </para></entry>
        <entry>Partial Mode</entry>
       </row>
      </thead>
 
      <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>any_value</primary>
+        </indexterm>
+        <function>any_value</function> ( <type>"any"</type> )
+        <returnvalue><replaceable>same as input type</replaceable></returnvalue>
+       </para>
+       <para>
+        Chooses a non-deterministic value from the non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>array_agg</primary>
         </indexterm>
         <function>array_agg</function> ( <type>anynonarray</type> )
         <returnvalue>anyarray</returnvalue>
        </para>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 9c13251231..94c92de06d 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -928,8 +928,20 @@ pg_get_replica_identity_index(PG_FUNCTION_ARGS)
 	idxoid = RelationGetReplicaIndex(rel);
 	table_close(rel, AccessShareLock);
 
 	if (OidIsValid(idxoid))
 		PG_RETURN_OID(idxoid);
 	else
 		PG_RETURN_NULL();
 }
+
+Datum
+any_value_trans(PG_FUNCTION_ARGS)
+{
+	/* Return the first non-null argument */
+	if (!PG_ARGISNULL(0))
+		PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+	if (!PG_ARGISNULL(1))
+		PG_RETURN_DATUM(PG_GETARG_DATUM(1));
+	PG_RETURN_NULL();
+}
+
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index b9110a5298..37626d6f0c 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -620,9 +620,13 @@
   aggtransfn => 'ordered_set_transition_multi', aggfinalfn => 'cume_dist_final',
   aggfinalextra => 't', aggfinalmodify => 'w', aggmfinalmodify => 'w',
   aggtranstype => 'internal' },
 { aggfnoid => 'dense_rank(any)', aggkind => 'h', aggnumdirectargs => '1',
   aggtransfn => 'ordered_set_transition_multi',
   aggfinalfn => 'dense_rank_final', aggfinalextra => 't', aggfinalmodify => 'w',
   aggmfinalmodify => 'w', aggtranstype => 'internal' },
 
+# any_value
+{ aggfnoid => 'any_value(anyelement)', aggtransfn => 'any_value_trans',
+  aggcombinefn => 'any_value_trans', aggtranstype => 'anyelement' },
+
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9301b2627..2ee4797559 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11849,9 +11849,17 @@
   proname => 'brin_minmax_multi_summary_recv', provolatile => 's',
   prorettype => 'pg_brin_minmax_multi_summary', proargtypes => 'internal',
   prosrc => 'brin_minmax_multi_summary_recv' },
 { oid => '4641', descr => 'I/O',
   proname => 'brin_minmax_multi_summary_send', provolatile => 's',
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '8981', descr => 'arbitrary value from among input values',
+  proname => 'any_value', prokind => 'a', proisstrict => 'f',
+  prorettype => 'anyelement', proargtypes => 'anyelement',
+  prosrc => 'aggregate_dummy' },
+{ oid => '8982', descr => 'any_value transition function',
+  proname => 'any_value_trans', prorettype => 'anyelement', proargtypes => 'anyelement anyelement',
+  prosrc => 'any_value_trans' },
+
 ]
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index fc2bd40be2..fb87b9abf1 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -20,16 +20,28 @@ SELECT avg(four) AS avg_1 FROM onek;
 (1 row)
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
        avg_32        
 ---------------------
  32.6666666666666667
 (1 row)
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+ any_value 
+-----------
+         1
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+ any_value 
+-----------
+ 
+(1 row)
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
  avg_107_943 
 -------------
      107.943
 (1 row)
 
@@ -1875,16 +1887,22 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
  max 
 -----
  a
 (1 row)
 
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
+ any_value 
+-----------
+         3
+(1 row)
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
 from (values (2),(3)) t1(outer_c); -- inner query is aggregation query
  count 
 -------
      1
      1
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index a4c00ff7a9..7206e475a1 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -19,16 +19,19 @@ COPY aggtest FROM :'filename';
 
 ANALYZE aggtest;
 
 
 SELECT avg(four) AS avg_1 FROM onek;
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
 
 SELECT avg(gpa) AS avg_3_4 FROM ONLY student;
 
 
@@ -711,16 +714,18 @@ group by ten;
 
 select ten, sum(distinct four) filter (where four > 10) from onek a
 group by ten
 having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
 
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
 from (values (2),(3)) t1(outer_c); -- inner query is aggregation query
 select (select count(*) filter (where outer_c <> 0)
         from (values (1)) t0(inner_c))
 from (values (2),(3)) t1(outer_c); -- outer query is aggregation query
 select (select count(inner_c) filter (where outer_c <> 0)
-- 
2.34.1

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Vik Fearing (#1)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 7:57 AM Vik Fearing <vik@postgresfriends.org> wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE. It
returns an implementation-dependent (i.e. non-deterministic) value from
the rows in its group.

PFA an implementation of this aggregate.

Can we please add "first_value" and "last_value" if we are going to add
"some_random_value" to our library of aggregates?

Also, maybe we should have any_value do something like compute a 50/50
chance that any new value seen replaces the existing chosen value, instead
of simply returning the first value all the time. Maybe even prohibit the
first value from being chosen so long as a second value appears.

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: ANY_VALUE aggregate

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Can we please add "first_value" and "last_value" if we are going to add
"some_random_value" to our library of aggregates?

First and last according to what ordering? We have those in the
window-aggregate case, and I don't think we want to encourage people
to believe that "first" and "last" are meaningful otherwise.

ANY_VALUE at least makes it clear that you're getting an unspecified
one of the inputs.

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Can we please add "first_value" and "last_value" if we are going to add
"some_random_value" to our library of aggregates?

First and last according to what ordering? We have those in the
window-aggregate case, and I don't think we want to encourage people
to believe that "first" and "last" are meaningful otherwise.

ANY_VALUE at least makes it clear that you're getting an unspecified
one of the inputs.

I have personally implemented first_value() and last_value() in the
past in cases where I had guaranteed the ordering myself, or didn't
care what ordering was used. I think they're perfectly sensible. But
if we don't add them to core, at least they're easy to add in
user-space.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Corey Huinker
corey.huinker@gmail.com
In reply to: David G. Johnston (#2)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 12:57 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Mon, Dec 5, 2022 at 7:57 AM Vik Fearing <vik@postgresfriends.org>
wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE. It
returns an implementation-dependent (i.e. non-deterministic) value from
the rows in its group.

PFA an implementation of this aggregate.

Can we please add "first_value" and "last_value" if we are going to add
"some_random_value" to our library of aggregates?

Also, maybe we should have any_value do something like compute a 50/50
chance that any new value seen replaces the existing chosen value, instead
of simply returning the first value all the time. Maybe even prohibit the
first value from being chosen so long as a second value appears.

David J.

Adding to the pile of wanted aggregates: in the past I've lobbied for
only_value() which is like first_value() but it raises an error on
encountering a second value.

#6Robert Haas
robertmhaas@gmail.com
In reply to: Corey Huinker (#5)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 2:31 PM Corey Huinker <corey.huinker@gmail.com> wrote:

Adding to the pile of wanted aggregates: in the past I've lobbied for only_value() which is like first_value() but it raises an error on encountering a second value.

Yeah, that's another that I have hand-rolled in the past.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Vik Fearing
vik@postgresfriends.org
In reply to: Vik Fearing (#1)
1 attachment(s)
Re: ANY_VALUE aggregate

On 12/5/22 15:57, Vik Fearing wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
returns an implementation-dependent (i.e. non-deterministic) value from
the rows in its group.

PFA an implementation of this aggregate.

Here is v2 of this patch. I had forgotten to update sql_features.txt.
--
Vik Fearing

Attachments:

0001-Implement-ANY_VALUE-aggregate.v02.patchtext/x-patch; charset=UTF-8; name=0001-Implement-ANY_VALUE-aggregate.v02.patchDownload
From a9bb61aab9788ae25fdcd28f7dcfb54a263665cc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 9 Apr 2022 00:07:38 +0200
Subject: [PATCH] Implement ANY_VALUE aggregate

SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.
---
 doc/src/sgml/func.sgml                   | 14 ++++++++++++++
 src/backend/catalog/sql_features.txt     |  1 +
 src/backend/utils/adt/misc.c             | 12 ++++++++++++
 src/include/catalog/pg_aggregate.dat     |  4 ++++
 src/include/catalog/pg_proc.dat          |  8 ++++++++
 src/test/regress/expected/aggregates.out | 18 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql      |  5 +++++
 7 files changed, 62 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2052d3c844..1823ee71d7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19706,16 +19706,30 @@ SELECT NULLIF(value, '(none)') ...
        <para>
         Description
        </para></entry>
        <entry>Partial Mode</entry>
       </row>
      </thead>
 
      <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>any_value</primary>
+        </indexterm>
+        <function>any_value</function> ( <type>"any"</type> )
+        <returnvalue><replaceable>same as input type</replaceable></returnvalue>
+       </para>
+       <para>
+        Chooses a non-deterministic value from the non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>array_agg</primary>
         </indexterm>
         <function>array_agg</function> ( <type>anynonarray</type> )
         <returnvalue>anyarray</returnvalue>
        </para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 8704a42b60..b7b6ad6334 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -515,16 +515,17 @@ T617	FIRST_VALUE and LAST_VALUE functions			YES
 T618	NTH_VALUE function			NO	function exists, but some options missing
 T619	Nested window functions			NO	
 T620	WINDOW clause: GROUPS option			YES	
 T621	Enhanced numeric functions			YES	
 T622	Trigonometric functions			YES	
 T623	General logarithm functions			YES	
 T624	Common logarithm functions			YES	
 T625	LISTAGG			NO	
+T626	ANY_VALUE			YES	
 T631	IN predicate with one list element			YES	
 T641	Multiple column assignment			NO	only some syntax variants supported
 T651	SQL-schema statements in SQL routines			YES	
 T652	SQL-dynamic statements in SQL routines			NO	
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
 T811	Basic SQL/JSON constructor functions			NO	
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 9c13251231..94c92de06d 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -928,8 +928,20 @@ pg_get_replica_identity_index(PG_FUNCTION_ARGS)
 	idxoid = RelationGetReplicaIndex(rel);
 	table_close(rel, AccessShareLock);
 
 	if (OidIsValid(idxoid))
 		PG_RETURN_OID(idxoid);
 	else
 		PG_RETURN_NULL();
 }
+
+Datum
+any_value_trans(PG_FUNCTION_ARGS)
+{
+	/* Return the first non-null argument */
+	if (!PG_ARGISNULL(0))
+		PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+	if (!PG_ARGISNULL(1))
+		PG_RETURN_DATUM(PG_GETARG_DATUM(1));
+	PG_RETURN_NULL();
+}
+
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index b9110a5298..37626d6f0c 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -620,9 +620,13 @@
   aggtransfn => 'ordered_set_transition_multi', aggfinalfn => 'cume_dist_final',
   aggfinalextra => 't', aggfinalmodify => 'w', aggmfinalmodify => 'w',
   aggtranstype => 'internal' },
 { aggfnoid => 'dense_rank(any)', aggkind => 'h', aggnumdirectargs => '1',
   aggtransfn => 'ordered_set_transition_multi',
   aggfinalfn => 'dense_rank_final', aggfinalextra => 't', aggfinalmodify => 'w',
   aggmfinalmodify => 'w', aggtranstype => 'internal' },
 
+# any_value
+{ aggfnoid => 'any_value(anyelement)', aggtransfn => 'any_value_trans',
+  aggcombinefn => 'any_value_trans', aggtranstype => 'anyelement' },
+
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9301b2627..2ee4797559 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11849,9 +11849,17 @@
   proname => 'brin_minmax_multi_summary_recv', provolatile => 's',
   prorettype => 'pg_brin_minmax_multi_summary', proargtypes => 'internal',
   prosrc => 'brin_minmax_multi_summary_recv' },
 { oid => '4641', descr => 'I/O',
   proname => 'brin_minmax_multi_summary_send', provolatile => 's',
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '8981', descr => 'arbitrary value from among input values',
+  proname => 'any_value', prokind => 'a', proisstrict => 'f',
+  prorettype => 'anyelement', proargtypes => 'anyelement',
+  prosrc => 'aggregate_dummy' },
+{ oid => '8982', descr => 'any_value transition function',
+  proname => 'any_value_trans', prorettype => 'anyelement', proargtypes => 'anyelement anyelement',
+  prosrc => 'any_value_trans' },
+
 ]
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index fc2bd40be2..fb87b9abf1 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -20,16 +20,28 @@ SELECT avg(four) AS avg_1 FROM onek;
 (1 row)
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
        avg_32        
 ---------------------
  32.6666666666666667
 (1 row)
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+ any_value 
+-----------
+         1
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+ any_value 
+-----------
+ 
+(1 row)
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
  avg_107_943 
 -------------
      107.943
 (1 row)
 
@@ -1875,16 +1887,22 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
  max 
 -----
  a
 (1 row)
 
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
+ any_value 
+-----------
+         3
+(1 row)
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
 from (values (2),(3)) t1(outer_c); -- inner query is aggregation query
  count 
 -------
      1
      1
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index a4c00ff7a9..7206e475a1 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -19,16 +19,19 @@ COPY aggtest FROM :'filename';
 
 ANALYZE aggtest;
 
 
 SELECT avg(four) AS avg_1 FROM onek;
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
 
 SELECT avg(gpa) AS avg_3_4 FROM ONLY student;
 
 
@@ -711,16 +714,18 @@ group by ten;
 
 select ten, sum(distinct four) filter (where four > 10) from onek a
 group by ten
 having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
 
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
 from (values (2),(3)) t1(outer_c); -- inner query is aggregation query
 select (select count(*) filter (where outer_c <> 0)
         from (values (1)) t0(inner_c))
 from (values (2),(3)) t1(outer_c); -- outer query is aggregation query
 select (select count(inner_c) filter (where outer_c <> 0)
-- 
2.34.1

#8Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#2)
Re: ANY_VALUE aggregate

On 12/5/22 18:56, David G. Johnston wrote:

Also, maybe we should have any_value do something like compute a 50/50
chance that any new value seen replaces the existing chosen value, instead
of simply returning the first value all the time. Maybe even prohibit the
first value from being chosen so long as a second value appears.

The spec says the result is implementation-dependent meaning we don't
even need to document how it is obtained, but surely behavior like this
would preclude future optimizations like the ones I mentioned?

I once wrote a random_agg() for a training course that used reservoir
sampling to get an evenly distributed value from the inputs. Something
like that seems to be what you are looking for here. I don't see the
use case for adding it to core, though.

The use case for ANY_VALUE is compliance with the standard.
--
Vik Fearing

#9Vik Fearing
vik@postgresfriends.org
In reply to: Corey Huinker (#5)
Re: ANY_VALUE aggregate

On 12/5/22 20:31, Corey Huinker wrote:

Adding to the pile of wanted aggregates: in the past I've lobbied for
only_value() which is like first_value() but it raises an error on
encountering a second value.

I have had use for this in the past, but I can't remember why. What is
your use case for it? I will happily write a patch for it, and also
submit it to the SQL Committee for inclusion in the standard. I need to
justify why it's a good idea, though, and we would need to consider what
to do with nulls now that there is <unique null treatment>.
--
Vik Fearing

#10Isaac Morland
isaac.morland@gmail.com
In reply to: Vik Fearing (#9)
Re: ANY_VALUE aggregate

On Mon, 5 Dec 2022 at 22:52, Vik Fearing <vik@postgresfriends.org> wrote:

On 12/5/22 20:31, Corey Huinker wrote:

Adding to the pile of wanted aggregates: in the past I've lobbied for
only_value() which is like first_value() but it raises an error on
encountering a second value.

I have had use for this in the past, but I can't remember why. What is
your use case for it? I will happily write a patch for it, and also
submit it to the SQL Committee for inclusion in the standard. I need to
justify why it's a good idea, though, and we would need to consider what
to do with nulls now that there is <unique null treatment>.

I have this in my local library of "stuff that I really wish came with
Postgres", although I call it same_agg and it just goes to NULL if there
are more than one distinct value.

I sometimes use it when normalizing non-normalized data, but more commonly
I use it when the query planner isn't capable of figuring out that a column
I want to use in the output depends only on the grouping columns. For
example, something like:

SELECT group_id, group_name, count(*) from group_group as gg natural join
group_member as gm group by group_id

I think that exact example actually does or is supposed to work now, since
it realizes that I'm grouping on the primary key of group_group so the
group_name field in the same table can't differ between rows of a group,
but most of the time when I expect that feature to allow me to use a field
it actually doesn't.

I have a vague notion that part of the issue may be the distinction between
gg.group_id, gm.group_id, and group_id; maybe the above doesn't work but it
does work if I group by gg.group_id instead of by group_id. But obviously
there should be no difference because in this query those 3 values cannot
differ (outer joins are another story).

For reference, here is my definition:

CREATE OR REPLACE FUNCTION same_sfunc (
a anyelement,
b anyelement
) RETURNS anyelement
LANGUAGE SQL IMMUTABLE STRICT
SET search_path FROM CURRENT
AS $$
SELECT CASE WHEN $1 = $2 THEN $1 ELSE NULL END
$$;
COMMENT ON FUNCTION same_sfunc (anyelement, anyelement) IS 'SFUNC for
same_agg aggregate; returns common value of parameters, or NULL if they
differ';

DROP AGGREGATE IF EXISTS same_agg (anyelement);
CREATE AGGREGATE same_agg (anyelement) (
SFUNC = same_sfunc,
STYPE = anyelement
);
COMMENT ON AGGREGATE same_agg (anyelement) IS 'Return the common non-NULL
value of all non-NULL aggregated values, or NULL if some values differ';

You can tell I've had this for a while - there are several newer Postgres
features that could be used to clean this up noticeably.

I also have a repeat_agg which returns the last value (not so interesting)
but which is sometimes useful as a window function (more interesting:
replace NULLs with the previous non-NULL value in the column).

#11David G. Johnston
david.g.johnston@gmail.com
In reply to: Vik Fearing (#8)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 8:46 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/5/22 18:56, David G. Johnston wrote:

Also, maybe we should have any_value do something like compute a 50/50
chance that any new value seen replaces the existing chosen value,

instead

of simply returning the first value all the time. Maybe even prohibit

the

first value from being chosen so long as a second value appears.

The spec says the result is implementation-dependent meaning we don't
even need to document how it is obtained, but surely behavior like this
would preclude future optimizations like the ones I mentioned?

So, given the fact that we don't actually want to name a function
first_value (because some users are readily confused as to when the concept
of first is actually valid or not) but some users do actually wish for this
functionality - and you are proposing to implement it here anyway - how
about we actually do document that we promise to return the first non-null
value encountered by the aggregate. We can then direct people to this
function and just let them know to pretend the function is really named
first_value in the case where they specify an order by. (last_value comes
for basically free with descending sorting).

I once wrote a random_agg() for a training course that used reservoir
sampling to get an evenly distributed value from the inputs. Something
like that seems to be what you are looking for here. I don't see the
use case for adding it to core, though.

The use case was basically what Tom was saying - I don't want our users
that don't understand the necessity of order by, and don't read the
documentation, to observe that we consistently return the first non-null
value and assume that this is what the function promises when we are not
making any such promise to them. As noted above, my preference at this
point would be to just make that promise.

David J.

#12Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#11)
Re: ANY_VALUE aggregate

On 12/6/22 05:22, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 8:46 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/5/22 18:56, David G. Johnston wrote:

Also, maybe we should have any_value do something like compute a 50/50
chance that any new value seen replaces the existing chosen value,

instead

of simply returning the first value all the time. Maybe even prohibit

the

first value from being chosen so long as a second value appears.

The spec says the result is implementation-dependent meaning we don't
even need to document how it is obtained, but surely behavior like this
would preclude future optimizations like the ones I mentioned?

So, given the fact that we don't actually want to name a function
first_value (because some users are readily confused as to when the concept
of first is actually valid or not) but some users do actually wish for this
functionality - and you are proposing to implement it here anyway - how
about we actually do document that we promise to return the first non-null
value encountered by the aggregate. We can then direct people to this
function and just let them know to pretend the function is really named
first_value in the case where they specify an order by. (last_value comes
for basically free with descending sorting).

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate. There is no reason to
cause an extra sort when the user has requested *any value*.

I once wrote a random_agg() for a training course that used reservoir
sampling to get an evenly distributed value from the inputs. Something
like that seems to be what you are looking for here. I don't see the
use case for adding it to core, though.

The use case was basically what Tom was saying - I don't want our users
that don't understand the necessity of order by, and don't read the
documentation, to observe that we consistently return the first non-null
value and assume that this is what the function promises when we are not
making any such promise to them.

Documenting something for the benefit of those who do not read the
documentation is a ridiculous proposal.

As noted above, my preference at this point would be to just make that promise.

I see no reason to paint ourselves into a corner here.
--
Vik Fearing

#13David G. Johnston
david.g.johnston@gmail.com
In reply to: Vik Fearing (#12)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/6/22 05:22, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 8:46 PM Vik Fearing <vik@postgresfriends.org>

wrote:

On 12/5/22 18:56, David G. Johnston wrote:

Also, maybe we should have any_value do something like compute a 50/50
chance that any new value seen replaces the existing chosen value,

instead

of simply returning the first value all the time. Maybe even prohibit

the

first value from being chosen so long as a second value appears.

The spec says the result is implementation-dependent meaning we don't
even need to document how it is obtained, but surely behavior like this
would preclude future optimizations like the ones I mentioned?

So, given the fact that we don't actually want to name a function
first_value (because some users are readily confused as to when the

concept

of first is actually valid or not) but some users do actually wish for

this

functionality - and you are proposing to implement it here anyway - how
about we actually do document that we promise to return the first

non-null

value encountered by the aggregate. We can then direct people to this
function and just let them know to pretend the function is really named
first_value in the case where they specify an order by. (last_value comes
for basically free with descending sorting).

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

I'm assuming you are thinking of the second query form, while the guarantee
only needs to apply to the first.

David J.

#14Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#13)
Re: ANY_VALUE aggregate

On 12/6/22 05:57, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org> wrote:

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Implementation-dependent. It is NOT implementation-defined, per spec.

We often loosen the spec rules when they don't make technical sense to
us, but I don't know of any example of when we have tightened them.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

I'm assuming you are thinking of the second query form, while the guarantee
only needs to apply to the first.

I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
bestow upon ANY_VALUE the ability to make those two queries equivalent.

If you care about which value you get back, use something else.
--
Vik Fearing

#15David G. Johnston
david.g.johnston@gmail.com
In reply to: Vik Fearing (#14)
Re: ANY_VALUE aggregate

On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/6/22 05:57, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org>

wrote:

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Implementation-dependent. It is NOT implementation-defined, per spec.

I really don't care all that much about the spec here given that ORDER BY

in an aggregate call is non-spec.

We often loosen the spec rules when they don't make technical sense to

us, but I don't know of any example of when we have tightened them.

The function has to choose some row from among its inputs, and the system
has to obey an order by specification added to the function call. You are
de-facto creating a first_value aggregate (which is by definition
non-standard) whether you like it or not. I'm just saying to be upfront
and honest about it - our users do want such a capability so maybe accept
that there is a first time for everything. Not that picking an
advantageous "implementation-dependent" implementation should be considered
deviating from the spec.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

I'm assuming you are thinking of the second query form, while the

guarantee

only needs to apply to the first.

I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
bestow upon ANY_VALUE the ability to make those two queries equivalent.

That theoretical idea should not be entertained. Removing a user's
explicitly added ORDER BY should be off-limits. Any approach at
optimization here should simply look at whether an ORDER BY is specified
and pass that information to the function. If the function itself really
believes that ordering matters it can emit its own runtime exception
stating that fact and the user can fix their query.

If you care about which value you get back, use something else.

There isn't a "something else" to use so that isn't presently an option.

I suppose it comes down to what level of belief and care you have that
people will simply mis-use this function if it is added in its current form
to get the desired first_value effect that it produces.

David J.

#16Pantelis Theodosiou
ypercube@gmail.com
In reply to: David G. Johnston (#13)
Re: ANY_VALUE aggregate

On Tue, Dec 6, 2022 at 4:57 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
...

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

Shouldn't the 2nd query be producing an error, as it has an implied
GROUP BY () - so column v cannot appear (unless aggregated) in SELECT
and ORDER BY?

#17David G. Johnston
david.g.johnston@gmail.com
In reply to: Pantelis Theodosiou (#16)
Re: ANY_VALUE aggregate

On Wed, Dec 7, 2022 at 1:58 AM Pantelis Theodosiou <ypercube@gmail.com>
wrote:

On Tue, Dec 6, 2022 at 4:57 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
...

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

Shouldn't the 2nd query be producing an error, as it has an implied
GROUP BY () - so column v cannot appear (unless aggregated) in SELECT
and ORDER BY?

Right, that should be written as:

select any_value(v) from (values (2),(1),(3) order by 1) as vals (v);

(you said SELECT; the discussion here is that any_value is going to be
added as a new aggregate function)

David J.

#18Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#15)
Re: ANY_VALUE aggregate

On 12/7/22 04:22, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/6/22 05:57, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org>

wrote:

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Implementation-dependent. It is NOT implementation-defined, per spec.

I really don't care all that much about the spec here given that ORDER BY
in an aggregate call is non-spec.

Well, this is demonstrably wrong.

<array aggregate function> ::=
ARRAY_AGG <left paren>
<value expression>
[ ORDER BY <sort specification list> ]
<right paren>

We often loosen the spec rules when they don't make technical sense to
us, but I don't know of any example of when we have tightened them.

The function has to choose some row from among its inputs,

True.

and the system has to obey an order by specification added to the function call.

False.

You are de-facto creating a first_value aggregate (which is by definition
non-standard) whether you like it or not.

I am de jure creating an any_value aggregate (which is by definition
standard) whether you like it or not.

I'm just saying to be upfront
and honest about it - our users do want such a capability so maybe accept
that there is a first time for everything. Not that picking an
advantageous "implementation-dependent" implementation should be considered
deviating from the spec.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

I'm assuming you are thinking of the second query form, while the

guarantee

only needs to apply to the first.

I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
bestow upon ANY_VALUE the ability to make those two queries equivalent.

That theoretical idea should not be entertained. Removing a user's
explicitly added ORDER BY should be off-limits. Any approach at
optimization here should simply look at whether an ORDER BY is specified
and pass that information to the function. If the function itself really
believes that ordering matters it can emit its own runtime exception
stating that fact and the user can fix their query.

It absolutely should be entertained, and I plan on doing so in an
upcoming thread. Whether it errors or ignores is something that should
be discussed on that thread.

If you care about which value you get back, use something else.

There isn't a "something else" to use so that isn't presently an option.

The query

SELECT proposed_first_value(x ORDER BY y) FROM ...

is equivalent to

SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...

so I am not very sympathetic to your claim of "no other option".

I suppose it comes down to what level of belief and care you have that
people will simply mis-use this function if it is added in its current form
to get the desired first_value effect that it produces.

People who rely on explicitly undefined behavior get what they deserve
when the implementation changes.
--
Vik Fearing

#19David G. Johnston
david.g.johnston@gmail.com
In reply to: Vik Fearing (#18)
Re: ANY_VALUE aggregate

On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/7/22 04:22, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org>

wrote:

On 12/6/22 05:57, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org>

wrote:

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Implementation-dependent. It is NOT implementation-defined, per spec.

I really don't care all that much about the spec here given that ORDER BY
in an aggregate call is non-spec.

Well, this is demonstrably wrong.

<array aggregate function> ::=
ARRAY_AGG <left paren>
<value expression>
[ ORDER BY <sort specification list> ]
<right paren>

Demoable only by you and a few others...

We should update our documentation - the source of SQL Standard knowledge
for mere mortals.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

"Note: The ability to specify both DISTINCT and ORDER BY in an aggregate
function is a PostgreSQL extension."

Apparently only DISTINCT remains as our extension.

You are de-facto creating a first_value aggregate (which is by definition
non-standard) whether you like it or not.

I am de jure creating an any_value aggregate (which is by definition
standard) whether you like it or not.

Yes, both statements seem true. At least until we decide to start ignoring
a user's explicit order by clause.

If you care about which value you get back, use something else.

There isn't a "something else" to use so that isn't presently an option.

The query

SELECT proposed_first_value(x ORDER BY y) FROM ...

is equivalent to

SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...

so I am not very sympathetic to your claim of "no other option".

Semantically, yes, in terms of performance, not so much, for any
non-trivial sized group.

I'm done, and apologize for getting too emotionally invested in this. I
hope to get others to voice enough +1s to get a first_value function into
core along-side this one (which makes the above discussion either moot or
deferred until there is a concrete use case for ignoring an explicit ORDER
BY). If that doesn't happen, well, it isn't going to make or break us
either way.

David J.

#20Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#19)
Re: ANY_VALUE aggregate

On 12/8/22 06:48, David G. Johnston wrote:

On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/7/22 04:22, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org>

wrote:

On 12/6/22 05:57, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org>

wrote:

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Implementation-dependent. It is NOT implementation-defined, per spec.

I really don't care all that much about the spec here given that ORDER BY
in an aggregate call is non-spec.

Well, this is demonstrably wrong.

<array aggregate function> ::=
ARRAY_AGG <left paren>
<value expression>
[ ORDER BY <sort specification list> ]
<right paren>

Demoable only by you and a few others...

The standard is publicly available. It is strange that we, being so
open, hold ourselves to such a closed standard; but that is what we do.

We should update our documentation - the source of SQL Standard knowledge
for mere mortals.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

"Note: The ability to specify both DISTINCT and ORDER BY in an aggregate
function is a PostgreSQL extension."

Apparently only DISTINCT remains as our extension.

Using DISTINCT in an aggregate is also standard. What that note is
saying is that the standard does not allow *both* to be used at the same
time.

The standard defines these things for specific aggregates whereas we are
much more generic about it and therefore have to deal with the combinations.

I have submitted a doc patch to clarify that.

You are de-facto creating a first_value aggregate (which is by definition
non-standard) whether you like it or not.

I am de jure creating an any_value aggregate (which is by definition
standard) whether you like it or not.

Yes, both statements seem true. At least until we decide to start ignoring
a user's explicit order by clause.

I ran some tests and including an ORDER BY in an aggregate that doesn't
care (like COUNT) is devastating for performance. I will be proposing a
solution to that soon and I invite you to participate in that
conversation when I do.
--
Vik Fearing

#21Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Vik Fearing (#7)
Re: ANY_VALUE aggregate

On 05.12.22 21:18, Vik Fearing wrote:

On 12/5/22 15:57, Vik Fearing wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
returns an implementation-dependent (i.e. non-deterministic) value
from the rows in its group.

PFA an implementation of this aggregate.

Here is v2 of this patch.  I had forgotten to update sql_features.txt.

In your patch, the documentation says the definition is any_value("any")
but the catalog definitions are any_value(anyelement). Please sort that
out.

Since the transition function is declared strict, null values don't need
to be checked. I think the whole function could be reduced to

Datum
any_value_trans(PG_FUNCTION_ARGS)
{
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
}

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#21)
Re: ANY_VALUE aggregate

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 05.12.22 21:18, Vik Fearing wrote:

On 12/5/22 15:57, Vik Fearing wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
returns an implementation-dependent (i.e. non-deterministic) value
from the rows in its group.

Since the transition function is declared strict, null values don't need
to be checked.

Hmm, but should it be strict? That means that what it's returning
is *not* "any value" but "any non-null value". What does the draft
spec have to say about that?

regards, tom lane

#23Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#22)
Re: ANY_VALUE aggregate

On 1/18/23 16:55, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 05.12.22 21:18, Vik Fearing wrote:

On 12/5/22 15:57, Vik Fearing wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
returns an implementation-dependent (i.e. non-deterministic) value
from the rows in its group.

Since the transition function is declared strict, null values don't need
to be checked.

Hmm, but should it be strict? That means that what it's returning
is *not* "any value" but "any non-null value". What does the draft
spec have to say about that?

It falls into the same category as AVG() etc. That is, nulls are
removed before calculation.
--
Vik Fearing

#24Vik Fearing
vik@postgresfriends.org
In reply to: Peter Eisentraut (#21)
1 attachment(s)
Re: ANY_VALUE aggregate

On 1/18/23 16:06, Peter Eisentraut wrote:

On 05.12.22 21:18, Vik Fearing wrote:

On 12/5/22 15:57, Vik Fearing wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
returns an implementation-dependent (i.e. non-deterministic) value
from the rows in its group.

PFA an implementation of this aggregate.

Here is v2 of this patch.  I had forgotten to update sql_features.txt.

In your patch, the documentation says the definition is any_value("any")
but the catalog definitions are any_value(anyelement).  Please sort that
out.

Since the transition function is declared strict, null values don't need
to be checked.

Thank you for the review. Attached is a new version rebased to d540a02a72.
--
Vik Fearing

Attachments:

0001-Implement-ANY_VALUE-aggregate.v03.patchtext/x-patch; charset=UTF-8; name=0001-Implement-ANY_VALUE-aggregate.v03.patchDownload
From 9cf2c5b56ea38d3080c0cb9f8ef9e6229d8696b4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 9 Apr 2022 00:07:38 +0200
Subject: [PATCH] Implement ANY_VALUE aggregate

SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.
---
 doc/src/sgml/func.sgml                   | 14 ++++++++++++++
 src/backend/catalog/sql_features.txt     |  1 +
 src/backend/utils/adt/misc.c             | 13 +++++++++++++
 src/include/catalog/pg_aggregate.dat     |  4 ++++
 src/include/catalog/pg_proc.dat          |  8 ++++++++
 src/test/regress/expected/aggregates.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/aggregates.sql      |  6 ++++++
 7 files changed, 70 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b8dac9ef46..8ff9decfec 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19735,6 +19735,20 @@ SELECT NULLIF(value, '(none)') ...
      </thead>
 
      <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>any_value</primary>
+        </indexterm>
+        <function>any_value</function> ( <type>anyelement</type> )
+        <returnvalue><replaceable>same as input type</replaceable></returnvalue>
+       </para>
+       <para>
+        Chooses a non-deterministic value from the non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index abad216b7e..dfd3882801 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -520,6 +520,7 @@ T622	Trigonometric functions			YES
 T623	General logarithm functions			YES	
 T624	Common logarithm functions			YES	
 T625	LISTAGG			NO	
+T626	ANY_VALUE			YES	
 T631	IN predicate with one list element			YES	
 T641	Multiple column assignment			NO	only some syntax variants supported
 T651	SQL-schema statements in SQL routines			YES	
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 220ddb8c01..a9251f977e 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -1041,3 +1041,16 @@ pg_get_replica_identity_index(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_NULL();
 }
+
+/*
+ * Transition function for the ANY_VALUE aggregate
+ *
+ * Currently this just returns the first value, but in the future it might be
+ * able to signal to the aggregate that it does not need to be called anymore.
+ */
+Datum
+any_value_trans(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+}
+
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 8c957437ea..aac60dee58 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -625,4 +625,8 @@
   aggfinalfn => 'dense_rank_final', aggfinalextra => 't', aggfinalmodify => 'w',
   aggmfinalmodify => 'w', aggtranstype => 'internal' },
 
+# any_value
+{ aggfnoid => 'any_value(anyelement)', aggtransfn => 'any_value_trans',
+  aggcombinefn => 'any_value_trans', aggtranstype => 'anyelement' },
+
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 86eb8e8c58..95e760440e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11891,4 +11891,12 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '8981', descr => 'arbitrary value from among input values',
+  proname => 'any_value', prokind => 'a', proisstrict => 'f',
+  prorettype => 'anyelement', proargtypes => 'anyelement',
+  prosrc => 'aggregate_dummy' },
+{ oid => '8982', descr => 'any_value transition function',
+  proname => 'any_value_trans', prorettype => 'anyelement', proargtypes => 'anyelement anyelement',
+  prosrc => 'any_value_trans' },
+
 ]
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index ae3b905331..b240ef522b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -25,6 +25,24 @@ SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
  32.6666666666666667
 (1 row)
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+ any_value 
+-----------
+         1
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+ any_value 
+-----------
+ 
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+   any_value   
+---------------
+ {hello,world}
+(1 row)
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
@@ -1910,6 +1928,12 @@ from (values ('a', 'b')) AS v(foo,bar);
  a
 (1 row)
 
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
+ any_value 
+-----------
+         3
+(1 row)
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 514e3b2b39..4c0fd0d452 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -24,6 +24,10 @@ SELECT avg(four) AS avg_1 FROM onek;
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 
@@ -733,6 +737,8 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
 
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
-- 
2.34.1

#25Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Vik Fearing (#24)
Re: ANY_VALUE aggregate

On 18.01.23 18:01, Vik Fearing wrote:

On 1/18/23 16:06, Peter Eisentraut wrote:

On 05.12.22 21:18, Vik Fearing wrote:

On 12/5/22 15:57, Vik Fearing wrote:

The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
returns an implementation-dependent (i.e. non-deterministic) value
from the rows in its group.

PFA an implementation of this aggregate.

Here is v2 of this patch.  I had forgotten to update sql_features.txt.

In your patch, the documentation says the definition is
any_value("any") but the catalog definitions are
any_value(anyelement).  Please sort that out.

Since the transition function is declared strict, null values don't
need to be checked.

Thank you for the review.  Attached is a new version rebased to d540a02a72.

This looks good to me now.

#26David Rowley
dgrowleyml@gmail.com
In reply to: Vik Fearing (#24)
Re: ANY_VALUE aggregate

On Thu, 19 Jan 2023 at 06:01, Vik Fearing <vik@postgresfriends.org> wrote:

Thank you for the review. Attached is a new version rebased to d540a02a72.

I've only a bunch of nit-picks, personal preferences and random
thoughts to offer as a review:

1. I'd be inclined *not* to mention the possible future optimisation in:

+ * Currently this just returns the first value, but in the future it might be
+ * able to signal to the aggregate that it does not need to be called anymore.

I think it's unlikely that the transfn would "signal" such a thing. It
seems more likely if we did anything about it that nodeAgg.c would
maybe have some additional knowledge not to call that function if the
agg state already has a value. Just so we're not preempting how we
might do such a thing in the future, it seems best just to remove the
mention of it. I don't really think it serves as a good reminder that
we might want to do this one day anyway.

2. +any_value_trans(PG_FUNCTION_ARGS)

Many of transition function names end in "transfn", not "trans". I
think it's better to follow the existing (loosely followed) naming
pattern that a few aggregates seem to follow rather than invent a new
one.

3. I tend to try to copy the capitalisation of keywords from the
surrounding regression tests. I see the following breaks that.

+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);

(obviously, ideally, we'd always just follow the same capitalisation
of keywords everywhere in each .sql file, but we've long broken that
and the best way can do is be consistent with surrounding tests)

4. I think I'd use the word "Returns" instead of "Chooses" in:

+ Chooses a non-deterministic value from the non-null input values.

5. I've not managed to find a copy of the 2023 draft, so I'm assuming
you've got the ignoring of NULLs correct. I tried to see what other
databases do using https://www.db-fiddle.com/ . I was surprised to see
MySQL 8.0 returning NULL with:

create table a (a int, b int);
insert into a values(1,null),(1,2),(1,null);

select any_value(b) from a group by a;

I'd have expected "2" to be returned. (It gets even weirder without
the GROUP BY clause, so I'm not too hopeful any useful information can
be obtained from looking here)

I know MySQL doesn't follow the spec quite as closely as we do, so I
might not be that surprised if they didn't pay attention to the
wording when implementing this, however, I've not seen the spec, so I
can only speculate what value should be returned. Certainly not doing
any aggregation for any_value() when there is no GROUP BY seems
strange. I see they don't do the same with sum(). Perhaps this is just
a side effect of their loose standards when it came to columns in the
SELECT clause that are not in the GROUP BY clause.

6. Is it worth adding a WindowFunc test somewhere in window.sql with
an any_value(...) over (...)? Is what any_value() returns as a
WindowFunc equally as non-deterministic as when it's used as an
Aggref? Can we assume there's no guarantee that it'll return the same
value for each partition in each row? Does the spec mention anything
about that?

7. I wondered if it's worth adding a
SupportRequestOptimizeWindowClause support function for this
aggregate. I'm thinking that it might not be as likely people would
use something more specific like first_value/nth_value/last_value
instead of using any_value as a WindowFunc. Also, I'm currently
thinking that a SupportRequestWFuncMonotonic for any_value() is not
worth the dozen or so lines of code it would take to write it. I'm
assuming it would always be a MONOTONICFUNC_BOTH function. It seems
unlikely that someone would have a subquery with a WHERE clause in the
upper-level query referencing the any_value() aggregate. Thought I'd
mention both of these things anyway as someone else might think of
some good reason we should add them that I didn't think of.

David

#27Vik Fearing
vik@postgresfriends.org
In reply to: David Rowley (#26)
1 attachment(s)
Re: ANY_VALUE aggregate

On 1/23/23 08:50, David Rowley wrote:

On Thu, 19 Jan 2023 at 06:01, Vik Fearing <vik@postgresfriends.org> wrote:

Thank you for the review. Attached is a new version rebased to d540a02a72.

I've only a bunch of nit-picks, personal preferences and random
thoughts to offer as a review:

1. I'd be inclined *not* to mention the possible future optimisation in:

+ * Currently this just returns the first value, but in the future it might be
+ * able to signal to the aggregate that it does not need to be called anymore.

I think it's unlikely that the transfn would "signal" such a thing. It
seems more likely if we did anything about it that nodeAgg.c would
maybe have some additional knowledge not to call that function if the
agg state already has a value. Just so we're not preempting how we
might do such a thing in the future, it seems best just to remove the
mention of it. I don't really think it serves as a good reminder that
we might want to do this one day anyway.

Modified. My logic in having the transition function signal that it is
finished is to one day allow something like:

array_agg(x order by y limit z)

2. +any_value_trans(PG_FUNCTION_ARGS)

Many of transition function names end in "transfn", not "trans". I
think it's better to follow the existing (loosely followed) naming
pattern that a few aggregates seem to follow rather than invent a new
one.

Renamed.

3. I tend to try to copy the capitalisation of keywords from the
surrounding regression tests. I see the following breaks that.

+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);

(obviously, ideally, we'd always just follow the same capitalisation
of keywords everywhere in each .sql file, but we've long broken that
and the best way can do is be consistent with surrounding tests)

Downcased.

4. I think I'd use the word "Returns" instead of "Chooses" in:

+ Chooses a non-deterministic value from the non-null input values.

Done.

5. I've not managed to find a copy of the 2023 draft, so I'm assuming
you've got the ignoring of NULLs correct.

Yes, I do. This is part of <computational operation>, so SQL:2016 10.9
GR 7.a applies.

6. Is it worth adding a WindowFunc test somewhere in window.sql with
an any_value(...) over (...)? Is what any_value() returns as a
WindowFunc equally as non-deterministic as when it's used as an
Aggref? Can we assume there's no guarantee that it'll return the same
value for each partition in each row? Does the spec mention anything
about that?

This is governed by SQL:2016 10.9 GR 1.d and 1.e which defines the
source rows for the aggregate: either a group or a window frame. There
is no difference in behavior. I don't think a windowed test is useful
here unless I were to implement moving transitions. I think that might
be overkill for this function.

7. I wondered if it's worth adding a
SupportRequestOptimizeWindowClause support function for this
aggregate. I'm thinking that it might not be as likely people would
use something more specific like first_value/nth_value/last_value
instead of using any_value as a WindowFunc. Also, I'm currently
thinking that a SupportRequestWFuncMonotonic for any_value() is not
worth the dozen or so lines of code it would take to write it. I'm
assuming it would always be a MONOTONICFUNC_BOTH function. It seems
unlikely that someone would have a subquery with a WHERE clause in the
upper-level query referencing the any_value() aggregate. Thought I'd
mention both of these things anyway as someone else might think of
some good reason we should add them that I didn't think of.

I thought about this for a while and decided that it was not worthwhile.

v4 attached. I am putting this back to Needs Review in the commitfest
app, but these changes were editorial so it is probably RfC like Peter
had set it. I will let you be the judge of that.
--
Vik Fearing

Attachments:

v4-0001-Implement-ANY_VALUE-aggregate.patchtext/x-patch; charset=UTF-8; name=v4-0001-Implement-ANY_VALUE-aggregate.patchDownload
From 410751cfa6367e5436e20011f3f47f37888190a1 Mon Sep 17 00:00:00 2001
From: Vik Fearing <vik@postgresfriends.org>
Date: Thu, 9 Feb 2023 10:37:10 +0100
Subject: [PATCH v4] Implement ANY_VALUE aggregate

SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.
---
 doc/src/sgml/func.sgml                   | 14 ++++++++++++++
 src/backend/catalog/sql_features.txt     |  1 +
 src/backend/utils/adt/misc.c             | 10 ++++++++++
 src/include/catalog/pg_aggregate.dat     |  4 ++++
 src/include/catalog/pg_proc.dat          |  8 ++++++++
 src/test/regress/expected/aggregates.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/aggregates.sql      |  6 ++++++
 7 files changed, 67 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..8bdef6eb32 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19735,6 +19735,20 @@ SELECT NULLIF(value, '(none)') ...
      </thead>
 
      <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>any_value</primary>
+        </indexterm>
+        <function>any_value</function> ( <type>anyelement</type> )
+        <returnvalue><replaceable>same as input type</replaceable></returnvalue>
+       </para>
+       <para>
+        Returns a non-deterministic value from the non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..0eb905c177 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -520,6 +520,7 @@ T622	Trigonometric functions			YES
 T623	General logarithm functions			YES	
 T624	Common logarithm functions			YES	
 T625	LISTAGG			NO	
+T626	ANY_VALUE			YES	
 T631	IN predicate with one list element			YES	
 T641	Multiple column assignment			NO	only some syntax variants supported
 T651	SQL-schema statements in SQL routines			YES	
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 220ddb8c01..0f0010b06d 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -1041,3 +1041,13 @@ pg_get_replica_identity_index(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_NULL();
 }
+
+/*
+ * Transition function for the ANY_VALUE aggregate
+ */
+Datum
+any_value_transfn(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+}
+
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 4fea7d8dc1..d7895cd676 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -634,4 +634,8 @@
   aggfinalfn => 'dense_rank_final', aggfinalextra => 't', aggfinalmodify => 'w',
   aggmfinalmodify => 'w', aggtranstype => 'internal' },
 
+# any_value
+{ aggfnoid => 'any_value(anyelement)', aggtransfn => 'any_value_transfn',
+  aggcombinefn => 'any_value_transfn', aggtranstype => 'anyelement' },
+
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..d03a72daf0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11918,4 +11918,12 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '8981', descr => 'arbitrary value from among input values',
+  proname => 'any_value', prokind => 'a', proisstrict => 'f',
+  prorettype => 'anyelement', proargtypes => 'anyelement',
+  prosrc => 'aggregate_dummy' },
+{ oid => '8982', descr => 'any_value transition function',
+  proname => 'any_value_transfn', prorettype => 'anyelement', proargtypes => 'anyelement anyelement',
+  prosrc => 'any_value_transfn' },
+
 ]
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 82d0961524..95bf49f3d9 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -25,6 +25,24 @@ SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
  32.6666666666666667
 (1 row)
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+ any_value 
+-----------
+         1
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+ any_value 
+-----------
+ 
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+   any_value   
+---------------
+ {hello,world}
+(1 row)
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
@@ -2025,6 +2043,12 @@ from (values ('a', 'b')) AS v(foo,bar);
  a
 (1 row)
 
+select any_value(v) filter (where v > 2) from (values (1), (2), (3)) as v (v);
+ any_value 
+-----------
+         3
+(1 row)
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index e81a22465b..e83c984ed0 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -24,6 +24,10 @@ SELECT avg(four) AS avg_1 FROM onek;
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 
@@ -806,6 +810,8 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
 
+select any_value(v) filter (where v > 2) from (values (1), (2), (3)) as v (v);
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))

base-commit: ef7002dbe0b06e4e5b42c89becd4eb9be2e9aa89
-- 
2.34.1

#28Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Vik Fearing (#27)
Re: ANY_VALUE aggregate

I could have used such an aggregate in the past, so +1.

This is maybe getting into nit-picking, but perhaps it should be
documented as returning an "arbitrary" value instead of a
"non-deterministic" one? Technically the value is deterministic:
there's a concrete algorithm specifying how it's selected. However,
the algorithm is reserved as an implementation detail, since the
function is designed for cases in which the caller should not care
which value is returned.

Thanks,
Maciek

#29Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Vik Fearing (#27)
Re: ANY_VALUE aggregate

On 09.02.23 10:42, Vik Fearing wrote:

v4 attached.  I am putting this back to Needs Review in the commitfest
app, but these changes were editorial so it is probably RfC like Peter
had set it.  I will let you be the judge of that.

I have committed this.

I made a few small last-minute tweaks:

- Changed "non-deterministic" to "arbitrary", as suggested by Maciek
Sakrejda nearby. This seemed like a handier and less jargony term.

- Removed trailing whitespace in misc.c.

- Changed the function description in pg_proc.dat. Apparently, we are
using 'aggregate transition function' there for all aggregate functions
(instead of 'any_value transition function' etc.).

- Made the tests a bit more interested by feeding in more rows and a mix
of null and nonnull values.