[patch] bit XOR aggregate functions

Started by Alexey Bashtanovalmost 5 years ago24 messages
#1Alexey Bashtanov
bashtanov@imap.cc
1 attachment(s)

Hi,

I personally use it as a checksum for a large unordered set, where
performance and simplicity is prioritized over collision resilience.
Maybe there are other ways to use them.

Best, Alex

Attachments:

bit-xor-agg-v001.difftext/x-patch; charset=UTF-8; name=bit-xor-agg-v001.diffDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9056..f33358f8db 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19192,6 +19192,32 @@ SELECT NULLIF(value, '(none)') ...
        <entry>Yes</entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>bit_xor</primary>
+        </indexterm>
+        <function>bit_xor</function> ( <type>smallint</type> )
+        <returnvalue>smallint</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>integer</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>bigint</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>bit</type> )
+        <returnvalue>bit</returnvalue>
+       </para>
+       <para>
+        Computes the bitwise exclusive OR of all non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 5c1f962251..0d8c5a922a 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -505,18 +505,26 @@
   aggcombinefn => 'int2and', aggtranstype => 'int2' },
 { aggfnoid => 'bit_or(int2)', aggtransfn => 'int2or', aggcombinefn => 'int2or',
   aggtranstype => 'int2' },
+{ aggfnoid => 'bit_xor(int2)', aggtransfn => 'int2xor', aggcombinefn => 'int2xor',
+  aggtranstype => 'int2' },
 { aggfnoid => 'bit_and(int4)', aggtransfn => 'int4and',
   aggcombinefn => 'int4and', aggtranstype => 'int4' },
 { aggfnoid => 'bit_or(int4)', aggtransfn => 'int4or', aggcombinefn => 'int4or',
   aggtranstype => 'int4' },
+{ aggfnoid => 'bit_xor(int4)', aggtransfn => 'int4xor', aggcombinefn => 'int4xor',
+  aggtranstype => 'int4' },
 { aggfnoid => 'bit_and(int8)', aggtransfn => 'int8and',
   aggcombinefn => 'int8and', aggtranstype => 'int8' },
 { aggfnoid => 'bit_or(int8)', aggtransfn => 'int8or', aggcombinefn => 'int8or',
   aggtranstype => 'int8' },
+{ aggfnoid => 'bit_xor(int8)', aggtransfn => 'int8xor', aggcombinefn => 'int8xor',
+  aggtranstype => 'int8' },
 { aggfnoid => 'bit_and(bit)', aggtransfn => 'bitand', aggcombinefn => 'bitand',
   aggtranstype => 'bit' },
 { aggfnoid => 'bit_or(bit)', aggtransfn => 'bitor', aggcombinefn => 'bitor',
   aggtranstype => 'bit' },
+{ aggfnoid => 'bit_xor(bit)', aggtransfn => 'bitxor', aggcombinefn => 'bitxor',
+  aggtranstype => 'bit' },
 
 # xml
 { aggfnoid => 'xmlagg', aggtransfn => 'xmlconcat2', aggtranstype => 'xml' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..95577094bb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7992,24 +7992,36 @@
 { oid => '2237', descr => 'bitwise-or smallint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
+{ oid => '8452', descr => 'bitwise-xor smallint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2238', descr => 'bitwise-and integer aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int4', proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2239', descr => 'bitwise-or integer aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
+{ oid => '8453', descr => 'bitwise-xor integer aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2240', descr => 'bitwise-and bigint aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int8', proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2241', descr => 'bitwise-or bigint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
+{ oid => '8454', descr => 'bitwise-xor bigint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
+  proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2242', descr => 'bitwise-and bit aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
   proargtypes => 'bit', prosrc => 'aggregate_dummy' },
 { oid => '2243', descr => 'bitwise-or bit aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
   proargtypes => 'bit', prosrc => 'aggregate_dummy' },
+{ oid => '8455', descr => 'bitwise-xor bit aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
+  proargtypes => 'bit', prosrc => 'aggregate_dummy' },
 
 # formerly-missing interval + datetime operators
 { oid => '2546',
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 55cab4d2bf..806eabac81 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -899,7 +899,7 @@
 /* Define to select named POSIX semaphores. */
 #undef USE_NAMED_POSIX_SEMAPHORES
 
-/* Define to build with OpenSSL support. (--with-ssl=openssl) */
+/* Define to 1 if you have OpenSSL support. */
 #undef USE_OPENSSL
 
 /* Define to 1 to build with PAM support. (--with-pam) */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..2c818d9253 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -742,11 +742,12 @@ CREATE TEMPORARY TABLE bitwise_test(
 -- empty case
 SELECT
   BIT_AND(i2) AS "?",
-  BIT_OR(i4)  AS "?"
+  BIT_OR(i4)  AS "?",
+  BIT_XOR(i8) AS "?"
 FROM bitwise_test;
- ? | ? 
----+---
-   |  
+ ? | ? | ? 
+---+---+---
+   |   |  
 (1 row)
 
 COPY bitwise_test FROM STDIN NULL 'null';
@@ -762,11 +763,17 @@ SELECT
   BIT_OR(i8)  AS "7",
   BIT_OR(i)   AS "?",
   BIT_OR(x)   AS "7",
-  BIT_OR(y)   AS "1101"
+  BIT_OR(y)   AS "1101",
+  BIT_XOR(i2) AS "5",
+  BIT_XOR(i4) AS "5",
+  BIT_XOR(i8) AS "5",
+  BIT_XOR(i)  AS "?",
+  BIT_XOR(x)  AS "7",
+  BIT_XOR(y)  AS "1101"
 FROM bitwise_test;
- 1 | 1 | 1 | ? | 0 | 0100 | 7 | 7 | 7 | ? | 7 | 1101 
----+---+---+---+---+------+---+---+---+---+---+------
- 1 | 1 | 1 | 1 | 0 | 0100 | 7 | 7 | 7 | 3 | 7 | 1101
+ 1 | 1 | 1 | ? | 0 | 0100 | 7 | 7 | 7 | ? | 7 | 1101 | 5 | 5 | 5 | ? | 7 | 1101 
+---+---+---+---+---+------+---+---+---+---+---+------+---+---+---+---+---+------
+ 1 | 1 | 1 | 1 | 0 | 0100 | 7 | 7 | 7 | 3 | 7 | 1101 | 5 | 5 | 5 | 2 | 7 | 1101
 (1 row)
 
 --
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..f9579af19a 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -212,7 +212,8 @@ CREATE TEMPORARY TABLE bitwise_test(
 -- empty case
 SELECT
   BIT_AND(i2) AS "?",
-  BIT_OR(i4)  AS "?"
+  BIT_OR(i4)  AS "?",
+  BIT_XOR(i8) AS "?"
 FROM bitwise_test;
 
 COPY bitwise_test FROM STDIN NULL 'null';
@@ -234,7 +235,14 @@ SELECT
   BIT_OR(i8)  AS "7",
   BIT_OR(i)   AS "?",
   BIT_OR(x)   AS "7",
-  BIT_OR(y)   AS "1101"
+  BIT_OR(y)   AS "1101",
+
+  BIT_XOR(i2) AS "5",
+  BIT_XOR(i4) AS "5",
+  BIT_XOR(i8) AS "5",
+  BIT_XOR(i)  AS "?",
+  BIT_XOR(x)  AS "7",
+  BIT_XOR(y)  AS "1101"
 FROM bitwise_test;
 
 --
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alexey Bashtanov (#1)
Re: [patch] bit XOR aggregate functions

At Tue, 9 Feb 2021 15:25:19 +0000, Alexey Bashtanov <bashtanov@imap.cc> wrote in

I personally use it as a checksum for a large unordered set, where
performance and simplicity is prioritized over collision resilience.
Maybe there are other ways to use them.

FWIW the BIT_XOR can be created using CREATE AGGREGATE.

CREATE OR REPLACE AGGREGATE BIT_XOR(IN v smallint) (SFUNC = int2xor, STYPE = smallint);
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v int4) (SFUNC = int4xor, STYPE = int4);
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v bigint) (SFUNC = int8xor, STYPE = bigint);
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v bit) (SFUNC = bitxor, STYPE = bit);

The bit_and/bit_or aggregates are back to 2004, that commit says that:

commit 8096fe45cee42ce02e602cbea08e969139a77455
Author: Bruce Momjian <bruce@momjian.us>
Date: Wed May 26 15:26:28 2004 +0000

...

(2) bitwise integer aggregates named bit_and and bit_or for
int2, int4, int8 and bit types. They are not standard, but I find
them useful. I needed them once.

We already had CREATE AGGREATE at the time, so BIT_XOR can be thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Kyotaro Horiguchi (#2)
Re: [patch] bit XOR aggregate functions

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set. BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself. But it can be used as a checksum, so it seems
pretty reasonable to me to add it. Perhaps the use case could be
pointed out in the documentation.

#4Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Peter Eisentraut (#3)
1 attachment(s)
Re: [patch] bit XOR aggregate functions

On Wed, Mar 3, 2021 at 7:30 PM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set. BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself. But it can be used as a checksum, so it seems
pretty reasonable to me to add it. Perhaps the use case could be
pointed out in the documentation.

Hi Alex,

The patch is failing just because of a comment, which is already changed by
another patch

-/* Define to build with OpenSSL support. (--with-ssl=openssl) */

+/* Define to 1 if you have OpenSSL support. */

Do you mind sending an updated patch?

http://cfbot.cputube.org/patch_32_2980.log.

I am changing the status to "Waiting for Author"

In my opinion that change no more requires so I removed that and attached
the patch.

--
Ibrar Ahmed

Attachments:

bit-xor-agg-v002.diffapplication/octet-stream; name=bit-xor-agg-v002.diffDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9056..f33358f8db 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19192,6 +19192,32 @@ SELECT NULLIF(value, '(none)') ...
        <entry>Yes</entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>bit_xor</primary>
+        </indexterm>
+        <function>bit_xor</function> ( <type>smallint</type> )
+        <returnvalue>smallint</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>integer</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>bigint</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>bit</type> )
+        <returnvalue>bit</returnvalue>
+       </para>
+       <para>
+        Computes the bitwise exclusive OR of all non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 5c1f962251..0d8c5a922a 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -505,18 +505,26 @@
   aggcombinefn => 'int2and', aggtranstype => 'int2' },
 { aggfnoid => 'bit_or(int2)', aggtransfn => 'int2or', aggcombinefn => 'int2or',
   aggtranstype => 'int2' },
+{ aggfnoid => 'bit_xor(int2)', aggtransfn => 'int2xor', aggcombinefn => 'int2xor',
+  aggtranstype => 'int2' },
 { aggfnoid => 'bit_and(int4)', aggtransfn => 'int4and',
   aggcombinefn => 'int4and', aggtranstype => 'int4' },
 { aggfnoid => 'bit_or(int4)', aggtransfn => 'int4or', aggcombinefn => 'int4or',
   aggtranstype => 'int4' },
+{ aggfnoid => 'bit_xor(int4)', aggtransfn => 'int4xor', aggcombinefn => 'int4xor',
+  aggtranstype => 'int4' },
 { aggfnoid => 'bit_and(int8)', aggtransfn => 'int8and',
   aggcombinefn => 'int8and', aggtranstype => 'int8' },
 { aggfnoid => 'bit_or(int8)', aggtransfn => 'int8or', aggcombinefn => 'int8or',
   aggtranstype => 'int8' },
+{ aggfnoid => 'bit_xor(int8)', aggtransfn => 'int8xor', aggcombinefn => 'int8xor',
+  aggtranstype => 'int8' },
 { aggfnoid => 'bit_and(bit)', aggtransfn => 'bitand', aggcombinefn => 'bitand',
   aggtranstype => 'bit' },
 { aggfnoid => 'bit_or(bit)', aggtransfn => 'bitor', aggcombinefn => 'bitor',
   aggtranstype => 'bit' },
+{ aggfnoid => 'bit_xor(bit)', aggtransfn => 'bitxor', aggcombinefn => 'bitxor',
+  aggtranstype => 'bit' },
 
 # xml
 { aggfnoid => 'xmlagg', aggtransfn => 'xmlconcat2', aggtranstype => 'xml' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..95577094bb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7992,24 +7992,36 @@
 { oid => '2237', descr => 'bitwise-or smallint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
+{ oid => '8452', descr => 'bitwise-xor smallint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2238', descr => 'bitwise-and integer aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int4', proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2239', descr => 'bitwise-or integer aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
+{ oid => '8453', descr => 'bitwise-xor integer aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2240', descr => 'bitwise-and bigint aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int8', proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2241', descr => 'bitwise-or bigint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
+{ oid => '8454', descr => 'bitwise-xor bigint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
+  proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2242', descr => 'bitwise-and bit aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
   proargtypes => 'bit', prosrc => 'aggregate_dummy' },
 { oid => '2243', descr => 'bitwise-or bit aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
   proargtypes => 'bit', prosrc => 'aggregate_dummy' },
+{ oid => '8455', descr => 'bitwise-xor bit aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
+  proargtypes => 'bit', prosrc => 'aggregate_dummy' },
 
 # formerly-missing interval + datetime operators
 { oid => '2546',
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..2c818d9253 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -742,11 +742,12 @@ CREATE TEMPORARY TABLE bitwise_test(
 -- empty case
 SELECT
   BIT_AND(i2) AS "?",
-  BIT_OR(i4)  AS "?"
+  BIT_OR(i4)  AS "?",
+  BIT_XOR(i8) AS "?"
 FROM bitwise_test;
- ? | ? 
----+---
-   |  
+ ? | ? | ? 
+---+---+---
+   |   |  
 (1 row)
 
 COPY bitwise_test FROM STDIN NULL 'null';
@@ -762,11 +763,17 @@ SELECT
   BIT_OR(i8)  AS "7",
   BIT_OR(i)   AS "?",
   BIT_OR(x)   AS "7",
-  BIT_OR(y)   AS "1101"
+  BIT_OR(y)   AS "1101",
+  BIT_XOR(i2) AS "5",
+  BIT_XOR(i4) AS "5",
+  BIT_XOR(i8) AS "5",
+  BIT_XOR(i)  AS "?",
+  BIT_XOR(x)  AS "7",
+  BIT_XOR(y)  AS "1101"
 FROM bitwise_test;
- 1 | 1 | 1 | ? | 0 | 0100 | 7 | 7 | 7 | ? | 7 | 1101 
----+---+---+---+---+------+---+---+---+---+---+------
- 1 | 1 | 1 | 1 | 0 | 0100 | 7 | 7 | 7 | 3 | 7 | 1101
+ 1 | 1 | 1 | ? | 0 | 0100 | 7 | 7 | 7 | ? | 7 | 1101 | 5 | 5 | 5 | ? | 7 | 1101 
+---+---+---+---+---+------+---+---+---+---+---+------+---+---+---+---+---+------
+ 1 | 1 | 1 | 1 | 0 | 0100 | 7 | 7 | 7 | 3 | 7 | 1101 | 5 | 5 | 5 | 2 | 7 | 1101
 (1 row)
 
 --
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..f9579af19a 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -212,7 +212,8 @@ CREATE TEMPORARY TABLE bitwise_test(
 -- empty case
 SELECT
   BIT_AND(i2) AS "?",
-  BIT_OR(i4)  AS "?"
+  BIT_OR(i4)  AS "?",
+  BIT_XOR(i8) AS "?"
 FROM bitwise_test;
 
 COPY bitwise_test FROM STDIN NULL 'null';
@@ -234,7 +235,14 @@ SELECT
   BIT_OR(i8)  AS "7",
   BIT_OR(i)   AS "?",
   BIT_OR(x)   AS "7",
-  BIT_OR(y)   AS "1101"
+  BIT_OR(y)   AS "1101",
+
+  BIT_XOR(i2) AS "5",
+  BIT_XOR(i4) AS "5",
+  BIT_XOR(i8) AS "5",
+  BIT_XOR(i)  AS "?",
+  BIT_XOR(x)  AS "7",
+  BIT_XOR(y)  AS "1101"
 FROM bitwise_test;
 
 --
#5Alexey Bashtanov
bashtanov@imap.cc
In reply to: Ibrar Ahmed (#4)
1 attachment(s)
Re: [patch] bit XOR aggregate functions

Hi all,

Thanks for your reviews.
I've updated my patch to the current master and added a documentation
line suggesting using the new function as a checksum.

Best regards, Alex

Show quoted text

On 04/03/2021 17:14, Ibrar Ahmed wrote:

On Wed, Mar 3, 2021 at 7:30 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com
<mailto:peter.eisentraut@enterprisedb.com>> wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be

thought

as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set. 
BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself.  But it can be used as a checksum, so it
seems
pretty reasonable to me to add it.  Perhaps the use case could be
pointed out in the documentation.

Hi Alex,

The patch is failing just because of a comment, which is already
changed by another patch

-/* Define to build with OpenSSL support. (--with-ssl=openssl) */

+/* Define to 1 if you have OpenSSL support. */

Do you mind sending an updated patch?

http://cfbot.cputube.org/patch_32_2980.log.

I am changing the status to "Waiting for Author"

In my opinion that change no more requires so I removed that and
attached the patch.

--
Ibrar Ahmed

Attachments:

bit-xor-agg-v003.difftext/x-patch; charset=UTF-8; name=bit-xor-agg-v003.diffDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..dff158e99a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19198,6 +19198,33 @@ SELECT NULLIF(value, '(none)') ...
        <entry>Yes</entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>bit_xor</primary>
+        </indexterm>
+        <function>bit_xor</function> ( <type>smallint</type> )
+        <returnvalue>smallint</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>integer</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>bigint</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para role="func_signature">
+        <function>bit_xor</function> ( <type>bit</type> )
+        <returnvalue>bit</returnvalue>
+       </para>
+       <para>
+        Computes the bitwise exclusive OR of all non-null input values.
+        May be useful as a checksum for an unordered set of values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 5c1f962251..0d8c5a922a 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -505,18 +505,26 @@
   aggcombinefn => 'int2and', aggtranstype => 'int2' },
 { aggfnoid => 'bit_or(int2)', aggtransfn => 'int2or', aggcombinefn => 'int2or',
   aggtranstype => 'int2' },
+{ aggfnoid => 'bit_xor(int2)', aggtransfn => 'int2xor', aggcombinefn => 'int2xor',
+  aggtranstype => 'int2' },
 { aggfnoid => 'bit_and(int4)', aggtransfn => 'int4and',
   aggcombinefn => 'int4and', aggtranstype => 'int4' },
 { aggfnoid => 'bit_or(int4)', aggtransfn => 'int4or', aggcombinefn => 'int4or',
   aggtranstype => 'int4' },
+{ aggfnoid => 'bit_xor(int4)', aggtransfn => 'int4xor', aggcombinefn => 'int4xor',
+  aggtranstype => 'int4' },
 { aggfnoid => 'bit_and(int8)', aggtransfn => 'int8and',
   aggcombinefn => 'int8and', aggtranstype => 'int8' },
 { aggfnoid => 'bit_or(int8)', aggtransfn => 'int8or', aggcombinefn => 'int8or',
   aggtranstype => 'int8' },
+{ aggfnoid => 'bit_xor(int8)', aggtransfn => 'int8xor', aggcombinefn => 'int8xor',
+  aggtranstype => 'int8' },
 { aggfnoid => 'bit_and(bit)', aggtransfn => 'bitand', aggcombinefn => 'bitand',
   aggtranstype => 'bit' },
 { aggfnoid => 'bit_or(bit)', aggtransfn => 'bitor', aggcombinefn => 'bitor',
   aggtranstype => 'bit' },
+{ aggfnoid => 'bit_xor(bit)', aggtransfn => 'bitxor', aggcombinefn => 'bitxor',
+  aggtranstype => 'bit' },
 
 # xml
 { aggfnoid => 'xmlagg', aggtransfn => 'xmlconcat2', aggtranstype => 'xml' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 59d2b71ca9..506689d8ac 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7995,24 +7995,36 @@
 { oid => '2237', descr => 'bitwise-or smallint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
+{ oid => '8452', descr => 'bitwise-xor smallint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2238', descr => 'bitwise-and integer aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int4', proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2239', descr => 'bitwise-or integer aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
+{ oid => '8453', descr => 'bitwise-xor integer aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2240', descr => 'bitwise-and bigint aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f',
   prorettype => 'int8', proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2241', descr => 'bitwise-or bigint aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
+{ oid => '8454', descr => 'bitwise-xor bigint aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
+  proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2242', descr => 'bitwise-and bit aggregate',
   proname => 'bit_and', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
   proargtypes => 'bit', prosrc => 'aggregate_dummy' },
 { oid => '2243', descr => 'bitwise-or bit aggregate',
   proname => 'bit_or', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
   proargtypes => 'bit', prosrc => 'aggregate_dummy' },
+{ oid => '8455', descr => 'bitwise-xor bit aggregate',
+  proname => 'bit_xor', prokind => 'a', proisstrict => 'f', prorettype => 'bit',
+  proargtypes => 'bit', prosrc => 'aggregate_dummy' },
 
 # formerly-missing interval + datetime operators
 { oid => '2546',
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..2c818d9253 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -742,11 +742,12 @@ CREATE TEMPORARY TABLE bitwise_test(
 -- empty case
 SELECT
   BIT_AND(i2) AS "?",
-  BIT_OR(i4)  AS "?"
+  BIT_OR(i4)  AS "?",
+  BIT_XOR(i8) AS "?"
 FROM bitwise_test;
- ? | ? 
----+---
-   |  
+ ? | ? | ? 
+---+---+---
+   |   |  
 (1 row)
 
 COPY bitwise_test FROM STDIN NULL 'null';
@@ -762,11 +763,17 @@ SELECT
   BIT_OR(i8)  AS "7",
   BIT_OR(i)   AS "?",
   BIT_OR(x)   AS "7",
-  BIT_OR(y)   AS "1101"
+  BIT_OR(y)   AS "1101",
+  BIT_XOR(i2) AS "5",
+  BIT_XOR(i4) AS "5",
+  BIT_XOR(i8) AS "5",
+  BIT_XOR(i)  AS "?",
+  BIT_XOR(x)  AS "7",
+  BIT_XOR(y)  AS "1101"
 FROM bitwise_test;
- 1 | 1 | 1 | ? | 0 | 0100 | 7 | 7 | 7 | ? | 7 | 1101 
----+---+---+---+---+------+---+---+---+---+---+------
- 1 | 1 | 1 | 1 | 0 | 0100 | 7 | 7 | 7 | 3 | 7 | 1101
+ 1 | 1 | 1 | ? | 0 | 0100 | 7 | 7 | 7 | ? | 7 | 1101 | 5 | 5 | 5 | ? | 7 | 1101 
+---+---+---+---+---+------+---+---+---+---+---+------+---+---+---+---+---+------
+ 1 | 1 | 1 | 1 | 0 | 0100 | 7 | 7 | 7 | 3 | 7 | 1101 | 5 | 5 | 5 | 2 | 7 | 1101
 (1 row)
 
 --
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..f9579af19a 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -212,7 +212,8 @@ CREATE TEMPORARY TABLE bitwise_test(
 -- empty case
 SELECT
   BIT_AND(i2) AS "?",
-  BIT_OR(i4)  AS "?"
+  BIT_OR(i4)  AS "?",
+  BIT_XOR(i8) AS "?"
 FROM bitwise_test;
 
 COPY bitwise_test FROM STDIN NULL 'null';
@@ -234,7 +235,14 @@ SELECT
   BIT_OR(i8)  AS "7",
   BIT_OR(i)   AS "?",
   BIT_OR(x)   AS "7",
-  BIT_OR(y)   AS "1101"
+  BIT_OR(y)   AS "1101",
+
+  BIT_XOR(i2) AS "5",
+  BIT_XOR(i4) AS "5",
+  BIT_XOR(i8) AS "5",
+  BIT_XOR(i)  AS "?",
+  BIT_XOR(x)  AS "7",
+  BIT_XOR(y)  AS "1101"
 FROM bitwise_test;
 
 --
#6Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Alexey Bashtanov (#5)
Re: [patch] bit XOR aggregate functions

On 05.03.21 13:42, Alexey Bashtanov wrote:

Thanks for your reviews.
I've updated my patch to the current master and added a documentation
line suggesting using the new function as a checksum.

committed

#7David Fetter
david@fetter.org
In reply to: Peter Eisentraut (#3)
Re: [patch] bit XOR aggregate functions

On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8Vik Fearing
vik@postgresfriends.org
In reply to: David Fetter (#7)
Re: [patch] bit XOR aggregate functions

On 3/6/21 8:55 PM, David Fetter wrote:

On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?
--
Vik Fearing

#9David Fetter
david@fetter.org
In reply to: Vik Fearing (#8)
Re: [patch] bit XOR aggregate functions

On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:

On 3/6/21 8:55 PM, David Fetter wrote:

On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#10Vik Fearing
vik@postgresfriends.org
In reply to: David Fetter (#9)
Re: [patch] bit XOR aggregate functions

On 3/6/21 9:00 PM, David Fetter wrote:

On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:

On 3/6/21 8:55 PM, David Fetter wrote:

On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

I think one of us is forgetting how XOR works.
--
Vik Fearing

#11David Fetter
david@fetter.org
In reply to: Vik Fearing (#10)
Re: [patch] bit XOR aggregate functions

On Sat, Mar 06, 2021 at 09:03:25PM +0100, Vik Fearing wrote:

On 3/6/21 9:00 PM, David Fetter wrote:

On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:

On 3/6/21 8:55 PM, David Fetter wrote:

On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:

On 10.02.21 06:42, Kyotaro Horiguchi wrote:

We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?

I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

I think one of us is forgetting how XOR works.

Oops. You're right.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: David Fetter (#9)
Re: [patch] bit XOR aggregate functions

On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we want
to do (and also maybe make doing so the rule going forward)?

David J.

#13Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#12)
Re: [patch] bit XOR aggregate functions

On 3/6/21 9:06 PM, David G. Johnston wrote:

On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we want
to do (and also maybe make doing so the rule going forward)?

Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.
--
Vik Fearing

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vik Fearing (#13)
Re: [patch] bit XOR aggregate functions

ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/6/21 9:06 PM, David G. Johnston wrote:

On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we

want

to do (and also maybe make doing so the rule going forward)?

Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.

I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does some stupid
things.

It is important at this time, because complexity in IT is pretty high, and
a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is about costs
and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when it has
sense.

Regards

Pavel

--

Show quoted text

Vik Fearing

#15Vik Fearing
vik@postgresfriends.org
In reply to: Pavel Stehule (#14)
Re: [patch] bit XOR aggregate functions

On 3/7/21 10:53 AM, Pavel Stehule wrote:

ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/6/21 9:06 PM, David G. Johnston wrote:

On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we

want

to do (and also maybe make doing so the rule going forward)?

Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.

I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does some stupid
things.

It is important at this time, because complexity in IT is pretty high, and
a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is about costs
and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when it has
sense.

How many cycles do you recommend we spend on determining whether ORDER
BY a, b is sufficient but ORDER BY a is not?

If we had an optimization_effort_level guc (I have often wanted that),
then I agree that this could be added to a very high level. But we
don't, so I don't want any of it.
--
Vik Fearing

#16Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vik Fearing (#15)
Re: [patch] bit XOR aggregate functions

ne 7. 3. 2021 v 11:02 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/7/21 10:53 AM, Pavel Stehule wrote:

ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/6/21 9:06 PM, David G. Johnston wrote:

On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we

want

to do (and also maybe make doing so the rule going forward)?

Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.

I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does some

stupid

things.

It is important at this time, because complexity in IT is pretty high,

and

a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is about

costs

and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when it

has

sense.

How many cycles do you recommend we spend on determining whether ORDER
BY a, b is sufficient but ORDER BY a is not?

If we had an optimization_effort_level guc (I have often wanted that),
then I agree that this could be added to a very high level. But we
don't, so I don't want any of it.

The safeguard is mandatory ORDER BY clause.

--

Show quoted text

Vik Fearing

#17Vik Fearing
vik@postgresfriends.org
In reply to: Pavel Stehule (#16)
Re: [patch] bit XOR aggregate functions

On 3/7/21 11:05 AM, Pavel Stehule wrote:

ne 7. 3. 2021 v 11:02 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/7/21 10:53 AM, Pavel Stehule wrote:

ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/6/21 9:06 PM, David G. Johnston wrote:

On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:

SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */

Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we

want

to do (and also maybe make doing so the rule going forward)?

Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.

I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does some

stupid

things.

It is important at this time, because complexity in IT is pretty high,

and

a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is about

costs

and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when it

has

sense.

How many cycles do you recommend we spend on determining whether ORDER
BY a, b is sufficient but ORDER BY a is not?

If we had an optimization_effort_level guc (I have often wanted that),
then I agree that this could be added to a very high level. But we
don't, so I don't want any of it.

The safeguard is mandatory ORDER BY clause.

And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.
--
Vik Fearing

#18Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vik Fearing (#17)
Re: [patch] bit XOR aggregate functions

And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.

The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result is one
number and is not possible to do a visual check of correctness (like
median).

--

Show quoted text

Vik Fearing

#19Vik Fearing
vik@postgresfriends.org
In reply to: Pavel Stehule (#18)
Re: [patch] bit XOR aggregate functions

On 3/7/21 11:24 AM, Pavel Stehule wrote:

And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.

The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result is one
number and is not possible to do a visual check of correctness (like
median).

The syntax for median (percentile_cont(0.5)) already requires an order
by clause. You are now requiring one on array_agg().
--
Vik Fearing

#20Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vik Fearing (#19)
Re: [patch] bit XOR aggregate functions

ne 7. 3. 2021 v 11:28 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/7/21 11:24 AM, Pavel Stehule wrote:

And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.

The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result is

one

number and is not possible to do a visual check of correctness (like
median).

The syntax for median (percentile_cont(0.5)) already requires an order
by clause. You are now requiring one on array_agg().

array_agg is discuttable, because PostgreSQL arrays are ordered set type.
But very common usage is using arrays instead and unordered sets (because
ANSI/SQL sets) are not supported. But anyway - for arrays I can do visual
check if it is ordered well or not.

--

Show quoted text

Vik Fearing

#21Vik Fearing
vik@postgresfriends.org
In reply to: Pavel Stehule (#20)
Re: [patch] bit XOR aggregate functions

On 3/7/21 11:37 AM, Pavel Stehule wrote:

ne 7. 3. 2021 v 11:28 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/7/21 11:24 AM, Pavel Stehule wrote:

And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.

The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result is

one

number and is not possible to do a visual check of correctness (like
median).

The syntax for median (percentile_cont(0.5)) already requires an order
by clause. You are now requiring one on array_agg().

array_agg is discuttable, because PostgreSQL arrays are ordered set type.
But very common usage is using arrays instead and unordered sets (because
ANSI/SQL sets) are not supported. But anyway - for arrays I can do visual
check if it is ordered well or not.

If by "visual check" you mean "with my human eyeballs" then I would
argue that that is always the case and we don't need nannying for other
aggregates either.
--
Vik Fearing

#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vik Fearing (#21)
Re: [patch] bit XOR aggregate functions

ne 7. 3. 2021 v 12:39 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/7/21 11:37 AM, Pavel Stehule wrote:

ne 7. 3. 2021 v 11:28 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:

On 3/7/21 11:24 AM, Pavel Stehule wrote:

And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.

The mandatory ORDER BY clause should be necessary for operations when

the

result depends on the order. You need an order for calculation of

median.

And you don't need to know an order for average. More if the result is

one

number and is not possible to do a visual check of correctness (like
median).

The syntax for median (percentile_cont(0.5)) already requires an order
by clause. You are now requiring one on array_agg().

array_agg is discuttable, because PostgreSQL arrays are ordered set type.
But very common usage is using arrays instead and unordered sets (because
ANSI/SQL sets) are not supported. But anyway - for arrays I can do visual
check if it is ordered well or not.

If by "visual check" you mean "with my human eyeballs" then I would
argue that that is always the case and we don't need nannying for other
aggregates either.

The correct solution is using arrays like arrays and sets like sets. When
you mix two different features to one, then you will have problems.

But if I see {{1,2,3},{3,4,5}} I have some knowledge - it is not 100%, but
it is. If I have 27373 as a result of median, I have nothing other
information.

The design of arrays (in pg) was incremental - it is older than Postgres
supported ordered aggregates, and probably older than ANSI/SQL introduced
sets. So the implementation of strong safeguards is not possible for
compatibility reasons. If I designed array_agg or string_agg today, then I
prefer to design it like ordered aggregates.

Sure - it is about life philosophy, and it is about projects where you are,
and about risks, .. some people prefer risks, some people prefer
safeguards. I see a complexity boom as a very big issue - I remember good
books about programming on 50 pagers, and then now we should start from
green or zero again or we have to implement most safeguards that are
possible to hold systems workable. But anyway - a good system is robust,
and robust systems try to reduce possible errors how it is possible (human
errors are most common).

But this is offtopic in this discussion :)

--

Show quoted text

Vik Fearing

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#22)
Re: [patch] bit XOR aggregate functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

But this is offtopic in this discussion :)

The whole topic is off-topic. As a general rule, things that depend on
input order shouldn't be declared as aggregates --- they should be window
functions or ordered-set aggregates, for which the syntax forces you to
specify input order. All of the standard aggregates, and most of our
custom ones (including BIT_XOR) do not depend on input order (... mumble
floating-point roundoff error mumble ...), so forcing users to write an
ordering clause would be useless, not to mention being a SQL spec
violation.

There are a small minority like array_agg that do have such a dependency,
but as far as I recall our docs for each of those warn about the need to
sort the input for reproducible results. I think that's sufficient.
Who's to say whether a particular query actually requires reproducible
results? Seeing that we don't provide reproducible row ordering
without an ORDER BY, I'm not sure why we should apply a different
standard to array_agg.

regards, tom lane

#24David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#18)
Re: [patch] bit XOR aggregate functions

On Sun, 7 Mar 2021 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:
The mandatory ORDER BY clause should be necessary for operations when
the result depends on the order. You need an order for calculation of
median. And you don't need to know an order for average. More if the
result is one number and is not possible to do a visual check of
correctness (like median).

I really don't think so.

# create table f (f float not null);
# insert into f values(1e100),(-1e100),(1.5);
# select sum(f order by f) from f;
sum
-----
0
(1 row)

# select sum(f) from f;
sum
-----
1.5
(1 row)

Users are going to be pretty annoyed with us if we demanded that they
include an ORDER BY for that query. Especially so since our ORDER BY
aggregate implementation still has no planner support.

David