[patch] bit XOR aggregate functions
Hi,
I personally use it as a checksum for a large unordered set, where
performance and simplicity is prioritized over collision resilience.
Maybe there are other ways to use them.
Best, Alex
Attachments:
bit-xor-agg-v001.difftext/x-patch; charset=UTF-8; name=bit-xor-agg-v001.diffDownload+72-9
At Tue, 9 Feb 2021 15:25:19 +0000, Alexey Bashtanov <bashtanov@imap.cc> wrote in
I personally use it as a checksum for a large unordered set, where
performance and simplicity is prioritized over collision resilience.
Maybe there are other ways to use them.
FWIW the BIT_XOR can be created using CREATE AGGREGATE.
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v smallint) (SFUNC = int2xor, STYPE = smallint);
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v int4) (SFUNC = int4xor, STYPE = int4);
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v bigint) (SFUNC = int8xor, STYPE = bigint);
CREATE OR REPLACE AGGREGATE BIT_XOR(IN v bit) (SFUNC = bitxor, STYPE = bit);
The bit_and/bit_or aggregates are back to 2004, that commit says that:
commit 8096fe45cee42ce02e602cbea08e969139a77455
Author: Bruce Momjian <bruce@momjian.us>
Date: Wed May 26 15:26:28 2004 +0000
...
(2) bitwise integer aggregates named bit_and and bit_or for
int2, int4, int8 and bit types. They are not standard, but I find
them useful. I needed them once.
We already had CREATE AGGREATE at the time, so BIT_XOR can be thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?
I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set. BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself. But it can be used as a checksum, so it seems
pretty reasonable to me to add it. Perhaps the use case could be
pointed out in the documentation.
On Wed, Mar 3, 2021 at 7:30 PM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set. BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself. But it can be used as a checksum, so it seems
pretty reasonable to me to add it. Perhaps the use case could be
pointed out in the documentation.
Hi Alex,
The patch is failing just because of a comment, which is already changed by
another patch
-/* Define to build with OpenSSL support. (--with-ssl=openssl) */
+/* Define to 1 if you have OpenSSL support. */
Do you mind sending an updated patch?
http://cfbot.cputube.org/patch_32_2980.log.
I am changing the status to "Waiting for Author"
In my opinion that change no more requires so I removed that and attached
the patch.
--
Ibrar Ahmed
Attachments:
bit-xor-agg-v002.diffapplication/octet-stream; name=bit-xor-agg-v002.diffDownload+71-8
Hi all,
Thanks for your reviews.
I've updated my patch to the current master and added a documentation
line suggesting using the new function as a checksum.
Best regards, Alex
Show quoted text
On 04/03/2021 17:14, Ibrar Ahmed wrote:
On Wed, Mar 3, 2021 at 7:30 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com
<mailto:peter.eisentraut@enterprisedb.com>> wrote:On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought
as it falls into the same category with BIT_AND and BIT_OR, that is,
we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and BIT_OR.
The latter give you an "all" or "any" result of the bits set.
BIT_XOR
will return 1 or true if an odd number of inputs are 1 or true, which
isn't useful by itself. But it can be used as a checksum, so it
seems
pretty reasonable to me to add it. Perhaps the use case could be
pointed out in the documentation.Hi Alex,
The patch is failing just because of a comment, which is already
changed by another patch-/* Define to build with OpenSSL support. (--with-ssl=openssl) */
+/* Define to 1 if you have OpenSSL support. */
Do you mind sending an updated patch?
http://cfbot.cputube.org/patch_32_2980.log.
I am changing the status to "Waiting for Author"
In my opinion that change no more requires so I removed that and
attached the patch.--
Ibrar Ahmed
Attachments:
bit-xor-agg-v003.difftext/x-patch; charset=UTF-8; name=bit-xor-agg-v003.diffDownload+72-8
On 05.03.21 13:42, Alexey Bashtanov wrote:
Thanks for your reviews.
I've updated my patch to the current master and added a documentation
line suggesting using the new function as a checksum.
committed
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.
If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 3/6/21 8:55 PM, David Fetter wrote:
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */
Why would such an error be necessary, or even desirable?
--
Vik Fearing
On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:
On 3/6/21 8:55 PM, David Fetter wrote:
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 3/6/21 9:00 PM, David Fetter wrote:
On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:
On 3/6/21 8:55 PM, David Fetter wrote:
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.
I think one of us is forgetting how XOR works.
--
Vik Fearing
On Sat, Mar 06, 2021 at 09:03:25PM +0100, Vik Fearing wrote:
On 3/6/21 9:00 PM, David Fetter wrote:
On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:
On 3/6/21 8:55 PM, David Fetter wrote:
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
On 10.02.21 06:42, Kyotaro Horiguchi wrote:
We already had CREATE AGGREATE at the time, so BIT_XOR can be
thought as it falls into the same category with BIT_AND and
BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?I think the use of BIT_XOR is quite separate from BIT_AND and
BIT_OR. The latter give you an "all" or "any" result of the bits
set. BIT_XOR will return 1 or true if an odd number of inputs are 1
or true, which isn't useful by itself. But it can be used as a
checksum, so it seems pretty reasonable to me to add it. Perhaps
the use case could be pointed out in the documentation.If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.I think one of us is forgetting how XOR works.
Oops. You're right.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.
Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we want
to do (and also maybe make doing so the rule going forward)?
David J.
On 3/6/21 9:06 PM, David G. Johnston wrote:
On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something we want
to do (and also maybe make doing so the rule going forward)?
Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.
--
Vik Fearing
ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:
On 3/6/21 9:06 PM, David G. Johnston wrote:
On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something wewant
to do (and also maybe make doing so the rule going forward)?
Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.
I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does some stupid
things.
It is important at this time, because complexity in IT is pretty high, and
a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is about costs
and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when it has
sense.
Regards
Pavel
--
Show quoted text
Vik Fearing
On 3/7/21 10:53 AM, Pavel Stehule wrote:
ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 3/6/21 9:06 PM, David G. Johnston wrote:
On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something wewant
to do (and also maybe make doing so the rule going forward)?
Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does some stupid
things.It is important at this time, because complexity in IT is pretty high, and
a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is about costs
and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when it has
sense.
How many cycles do you recommend we spend on determining whether ORDER
BY a, b is sufficient but ORDER BY a is not?
If we had an optimization_effort_level guc (I have often wanted that),
then I agree that this could be added to a very high level. But we
don't, so I don't want any of it.
--
Vik Fearing
ne 7. 3. 2021 v 11:02 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:
On 3/7/21 10:53 AM, Pavel Stehule wrote:
ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 3/6/21 9:06 PM, David G. Johnston wrote:
On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something wewant
to do (and also maybe make doing so the rule going forward)?
Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does somestupid
things.
It is important at this time, because complexity in IT is pretty high,
and
a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is aboutcosts
and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when ithas
sense.
How many cycles do you recommend we spend on determining whether ORDER
BY a, b is sufficient but ORDER BY a is not?If we had an optimization_effort_level guc (I have often wanted that),
then I agree that this could be added to a very high level. But we
don't, so I don't want any of it.
The safeguard is mandatory ORDER BY clause.
--
Show quoted text
Vik Fearing
On 3/7/21 11:05 AM, Pavel Stehule wrote:
ne 7. 3. 2021 v 11:02 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 3/7/21 10:53 AM, Pavel Stehule wrote:
ne 7. 3. 2021 v 10:36 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 3/6/21 9:06 PM, David G. Johnston wrote:
On Saturday, March 6, 2021, David Fetter <david@fetter.org> wrote:
SELECT BIT_XOR(b ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM... /* errors out */Why would such an error be necessary, or even desirable?
Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.Numerous existing aggregate functions have this behavior. Making those
error isn’t an option. So is making this a special case something wewant
to do (and also maybe make doing so the rule going forward)?
Aside from the fact that bit_xor() does not need this, I am opposed to
it in general. It is not our job to make people write correct queries.I cannot agree with the last sentence. It is questions about costs and
benefits, but good tool should to make warnings when users does somestupid
things.
It is important at this time, because complexity in IT is pretty high,
and
a lot of users are not well trained (but well trained people can make
errors too). And a lot of users have zero knowledge about technology, So
when it is possible, and when it makes sense, then Postgres should be
simple and safe. I think it is important for renome too. It is aboutcosts
and benefits. Good reputation is a good benefit for us too. Ordered
aggregation was designed for some purposes, and should be used, when ithas
sense.
How many cycles do you recommend we spend on determining whether ORDER
BY a, b is sufficient but ORDER BY a is not?If we had an optimization_effort_level guc (I have often wanted that),
then I agree that this could be added to a very high level. But we
don't, so I don't want any of it.The safeguard is mandatory ORDER BY clause.
And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.
--
Vik Fearing
And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.
The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result is one
number and is not possible to do a visual check of correctness (like
median).
--
Show quoted text
Vik Fearing
On 3/7/21 11:24 AM, Pavel Stehule wrote:
And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result is one
number and is not possible to do a visual check of correctness (like
median).
The syntax for median (percentile_cont(0.5)) already requires an order
by clause. You are now requiring one on array_agg().
--
Vik Fearing
ne 7. 3. 2021 v 11:28 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:
On 3/7/21 11:24 AM, Pavel Stehule wrote:
And so you are now mandating an ORDER BY on every query and in every
aggregate and/or window function. Users will not like that at all. I
certainly shan't.The mandatory ORDER BY clause should be necessary for operations when the
result depends on the order. You need an order for calculation of median.
And you don't need to know an order for average. More if the result isone
number and is not possible to do a visual check of correctness (like
median).The syntax for median (percentile_cont(0.5)) already requires an order
by clause. You are now requiring one on array_agg().
array_agg is discuttable, because PostgreSQL arrays are ordered set type.
But very common usage is using arrays instead and unordered sets (because
ANSI/SQL sets) are not supported. But anyway - for arrays I can do visual
check if it is ordered well or not.
--
Show quoted text
Vik Fearing