pg_stat_statements and "IN" conditions

Started by Dmitry Dolgovover 5 years ago155 messages
Jump to latest
#1Dmitry Dolgov
9erthalion6@gmail.com

Hi

I would like to start another thread to follow up on [1]/messages/by-id/CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com, mostly to bump up the
topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces different jumble hash for every different
number of arguments for essentially the same query. Unfortunately a lot of ORMs
like to generate these types of queries, which in turn leads to
pg_stat_statements pollution. Ideally we want to prevent this and have only one
record for such a query.

As the result of [1]/messages/by-id/CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com I've identified two highlighted approaches to improve this
situation:

* Reduce the generated ArrayExpr to an array Const immediately, in cases where
all the inputs are Consts.

* Make repeating Const to contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out pros/cons and be more
specific. Attached patches could not be considered a completed piece of work,
but they seem to work, mostly pass the tests and demonstrate the point. I would
like to get some high level input about them and ideally make it clear what is
the preferred solution to continue with.

# Reducing ArrayExpr to an array Const

IIUC this requires producing a Const with ArrayType constvalue in
transformAExprIn for ScalarArrayOpExpr. This could be a general improvement,
since apparently it's being done later anyway. But it deals only with Const,
which leaves more on the table, e.g. Params and other similar types of
duplication we observe when repeating constants are wrapped into VALUES.

Another point here is that it's quite possible this approach will still require
corresponding changes in pg_stat_statements, since just preventing duplicates
to show also loses the information. Ideally we also need to have some
understanding how many elements are actually there and display it, e.g. in
cases when there is just one outlier query that contains a huge IN list.

# Contribute nothing to the hash

I guess there could be multiple ways of doing this, but the first idea I had in
mind is to skip jumbling when necessary. At the same time it can be implemented
more centralized for different types of queries (although in the attached patch
there are only Const & Values). In the simplest case we just identify sequence
of constants of the same type, which just ignores any other cases when stuff is
mixed. But I believe it's something that could be considered a rare corner case
and it's better to start with the simplest solution.

Having said that I believe the second approach of contributing nothing to the
hash sounds more appealing, but would love to hear other opinions.

[1]: /messages/by-id/CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com

Attachments:

0001-Reduce-ArrayExpr-into-const-array.patchapplication/octet-stream; name=0001-Reduce-ArrayExpr-into-const-array.patchDownload+59-7
0001-Limit-jumbling-for-repeating-constants.patchapplication/octet-stream; name=0001-Limit-jumbling-for-repeating-constants.patchDownload+108-3
#2Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#1)
Re: pg_stat_statements and "IN" conditions

On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:

I would like to start another thread to follow up on [1], mostly to bump up the
topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces different jumble hash for every different
number of arguments for essentially the same query. Unfortunately a lot of ORMs
like to generate these types of queries, which in turn leads to
pg_stat_statements pollution. Ideally we want to prevent this and have only one
record for such a query.

As the result of [1] I've identified two highlighted approaches to improve this
situation:

* Reduce the generated ArrayExpr to an array Const immediately, in cases where
all the inputs are Consts.

* Make repeating Const to contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out pros/cons and be more
specific. Attached patches could not be considered a completed piece of work,
but they seem to work, mostly pass the tests and demonstrate the point. I would
like to get some high level input about them and ideally make it clear what is
the preferred solution to continue with.

I've implemented the second approach mentioned above, this version was
tested on our test clusters for some time without visible issues. Will
create a CF item and would appreciate any feedback.

Attachments:

v1-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patchtext/x-diff; charset=us-asciiDownload+925-13
#3Chengxi Sun
sunchengxi@highgo.com
In reply to: Dmitry Dolgov (#2)
Re: pg_stat_statements and "IN" conditions

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi, I did some test and it works well

#4Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Chengxi Sun (#3)
Re: pg_stat_statements and "IN" conditions

On Wed, Dec 09, 2020 at 03:37:40AM +0000, Chengxi Sun wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi, I did some test and it works well

Thanks for testing!

#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#2)
Re: pg_stat_statements and "IN" conditions

On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:

On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:

I would like to start another thread to follow up on [1], mostly to bump up the
topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces different jumble hash for every different
number of arguments for essentially the same query. Unfortunately a lot of ORMs
like to generate these types of queries, which in turn leads to
pg_stat_statements pollution. Ideally we want to prevent this and have only one
record for such a query.

As the result of [1] I've identified two highlighted approaches to improve this
situation:

* Reduce the generated ArrayExpr to an array Const immediately, in cases where
all the inputs are Consts.

* Make repeating Const to contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out pros/cons and be more
specific. Attached patches could not be considered a completed piece of work,
but they seem to work, mostly pass the tests and demonstrate the point. I would
like to get some high level input about them and ideally make it clear what is
the preferred solution to continue with.

I've implemented the second approach mentioned above, this version was
tested on our test clusters for some time without visible issues. Will
create a CF item and would appreciate any feedback.

After more testing I found couple of things that could be improved,
namely in the presence of non-reducible constants one part of the query
was not copied into the normalized version, and this approach also could
be extended for Params. These are incorporated in the attached patch.

Attachments:

v2-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patchtext/x-diff; charset=us-asciiDownload+1034-15
#6Zhihong Yu
zyu@yugabyte.com
In reply to: Dmitry Dolgov (#5)
Re: pg_stat_statements and "IN" conditions

Hi,
A few comments.

+ "After this number of duplicating constants
start to merge them.",

duplicating -> duplicate

+               foreach(lc, (List *) expr)
+               {
+                   Node * subExpr = (Node *) lfirst(lc);
+
+                   if (!IsA(subExpr, Const))
+                   {
+                       allConst = false;
+                       break;
+                   }
+               }

It seems the above foreach loop (within foreach(temp, (List *) node)) can
be preceded with a check that allConst is true. Otherwise the loop can be
skipped.

+                   if (currentExprIdx == pgss_merge_threshold - 1)
+                   {
+                       JumbleExpr(jstate, expr);
+
+                       /*
+                        * A const expr is already found, so JumbleExpr must
+                        * record it. Mark it as merged, it will be the
first
+                        * merged but still present in the statement query.
+                        */
+                       Assert(jstate->clocations_count > 0);
+                       jstate->clocations[jstate->clocations_count -
1].merged = true;
+                       currentExprIdx++;
+                   }

The above snippet occurs a few times. Maybe extract into a helper method.

Cheers

On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Show quoted text

On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:

On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:

I would like to start another thread to follow up on [1], mostly to

bump up the

topic. Just to remind, it's about how pg_stat_statements jumbling

ArrayExpr in

queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces different jumble hash for every

different

number of arguments for essentially the same query. Unfortunately a

lot of ORMs

like to generate these types of queries, which in turn leads to
pg_stat_statements pollution. Ideally we want to prevent this and have

only one

record for such a query.

As the result of [1] I've identified two highlighted approaches to

improve this

situation:

* Reduce the generated ArrayExpr to an array Const immediately, in

cases where

all the inputs are Consts.

* Make repeating Const to contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out pros/cons and be

more

specific. Attached patches could not be considered a completed piece

of work,

but they seem to work, mostly pass the tests and demonstrate the

point. I would

like to get some high level input about them and ideally make it clear

what is

the preferred solution to continue with.

I've implemented the second approach mentioned above, this version was
tested on our test clusters for some time without visible issues. Will
create a CF item and would appreciate any feedback.

After more testing I found couple of things that could be improved,
namely in the presence of non-reducible constants one part of the query
was not copied into the normalized version, and this approach also could
be extended for Params. These are incorporated in the attached patch.

#7Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Zhihong Yu (#6)
Re: pg_stat_statements and "IN" conditions

On Sat, Dec 26, 2020 at 08:53:28AM -0800, Zhihong Yu wrote:
Hi,
A few comments.

+               foreach(lc, (List *) expr)
+               {
+                   Node * subExpr = (Node *) lfirst(lc);
+
+                   if (!IsA(subExpr, Const))
+                   {
+                       allConst = false;
+                       break;
+                   }
+               }

It seems the above foreach loop (within foreach(temp, (List *) node)) can
be preceded with a check that allConst is true. Otherwise the loop can be
skipped.

Thanks for noticing. Now that I look at it closer I think it's the other
way around, the loop above checking constants for the first expression
is not really necessary.

+                   if (currentExprIdx == pgss_merge_threshold - 1)
+                   {
+                       JumbleExpr(jstate, expr);
+
+                       /*
+                        * A const expr is already found, so JumbleExpr must
+                        * record it. Mark it as merged, it will be the
first
+                        * merged but still present in the statement query.
+                        */
+                       Assert(jstate->clocations_count > 0);
+                       jstate->clocations[jstate->clocations_count -
1].merged = true;
+                       currentExprIdx++;
+                   }

The above snippet occurs a few times. Maybe extract into a helper method.

Originally I was hesitant to extract it was because it's quite small
part of the code. But now I've realized that the part relevant to lists
is not really correct, which makes those bits even more different, so I
think it makes sense to leave it like that. What do you think?

Attachments:

v3-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patchtext/x-diff; charset=us-asciiDownload+1137-15
#8Zhihong Yu
zyu@yugabyte.com
In reply to: Dmitry Dolgov (#7)
Re: pg_stat_statements and "IN" conditions

Hi, Dmitry:

+ int lastExprLenght = 0;

Did you mean to name the variable lastExprLenghth ?

w.r.t. extracting to helper method, the second and third if (currentExprIdx
== pgss_merge_threshold - 1) blocks are similar.
It is up to you whether to create the helper method.
I am fine with the current formation.

Cheers

On Tue, Jan 5, 2021 at 4:51 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Show quoted text

On Sat, Dec 26, 2020 at 08:53:28AM -0800, Zhihong Yu wrote:
Hi,
A few comments.

+               foreach(lc, (List *) expr)
+               {
+                   Node * subExpr = (Node *) lfirst(lc);
+
+                   if (!IsA(subExpr, Const))
+                   {
+                       allConst = false;
+                       break;
+                   }
+               }

It seems the above foreach loop (within foreach(temp, (List *) node)) can
be preceded with a check that allConst is true. Otherwise the loop can be
skipped.

Thanks for noticing. Now that I look at it closer I think it's the other
way around, the loop above checking constants for the first expression
is not really necessary.

+                   if (currentExprIdx == pgss_merge_threshold - 1)
+                   {
+                       JumbleExpr(jstate, expr);
+
+                       /*
+                        * A const expr is already found, so JumbleExpr

must

+                        * record it. Mark it as merged, it will be the
first
+                        * merged but still present in the statement

query.

+                        */
+                       Assert(jstate->clocations_count > 0);
+                       jstate->clocations[jstate->clocations_count -
1].merged = true;
+                       currentExprIdx++;
+                   }

The above snippet occurs a few times. Maybe extract into a helper method.

Originally I was hesitant to extract it was because it's quite small
part of the code. But now I've realized that the part relevant to lists
is not really correct, which makes those bits even more different, so I
think it makes sense to leave it like that. What do you think?

#9David Steele
david@pgmasters.net
In reply to: Zhihong Yu (#8)
Re: pg_stat_statements and "IN" conditions

On 1/5/21 10:51 AM, Zhihong Yu wrote:

+   int         lastExprLenght = 0;

Did you mean to name the variable lastExprLenghth ?

w.r.t. extracting to helper method, the second and third
if (currentExprIdx == pgss_merge_threshold - 1) blocks are similar.
It is up to you whether to create the helper method.
I am fine with the current formation.

Dmitry, thoughts on this review?

Regards,
--
-David
david@pgmasters.net

#10Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Steele (#9)
Re: pg_stat_statements and "IN" conditions

On Thu, Mar 18, 2021 at 09:38:09AM -0400, David Steele wrote:
On 1/5/21 10:51 AM, Zhihong Yu wrote:

+ � int � � � � lastExprLenght = 0;

Did you mean to name the variable�lastExprLenghth�?

w.r.t. extracting to helper method, the second and third
if�(currentExprIdx == pgss_merge_threshold - 1) blocks are similar.
It is up to you whether to create the helper method.
I am fine with the current formation.

Dmitry, thoughts on this review?

Oh, right. lastExprLenghth is obviously a typo, and as we agreed that
the helper is not strictly necessary I wanted to wait a bit hoping for
more feedback and eventually to post an accumulated patch. Doesn't make
sense to post another version only to fix one typo :)

#11Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#10)
Re: pg_stat_statements and "IN" conditions

On Thu, Mar 18, 2021 at 04:50:02PM +0100, Dmitry Dolgov wrote:

On Thu, Mar 18, 2021 at 09:38:09AM -0400, David Steele wrote:
On 1/5/21 10:51 AM, Zhihong Yu wrote:

+   int         lastExprLenght = 0;

Did you mean to name the variable lastExprLenghth ?

w.r.t. extracting to helper method, the second and third
if (currentExprIdx == pgss_merge_threshold - 1) blocks are similar.
It is up to you whether to create the helper method.
I am fine with the current formation.

Dmitry, thoughts on this review?

Oh, right. lastExprLenghth is obviously a typo, and as we agreed that
the helper is not strictly necessary I wanted to wait a bit hoping for
more feedback and eventually to post an accumulated patch. Doesn't make
sense to post another version only to fix one typo :)

Hi,

I've prepared a new rebased version to deal with the new way of
computing query id, but as always there is one tricky part. From what I
understand, now an external module can provide custom implementation for
query id computation algorithm. It seems natural to think this machinery
could be used instead of patch in the thread, i.e. one could create a
custom logic that will enable constants collapsing as needed, so that
same queries with different number of constants in an array will be
hashed into the same record.

But there is a limitation in how such queries will be normalized
afterwards — to reduce level of surprise it's necessary to display the
fact that a certain query in fact had more constants that are showed in
pgss record. Ideally LocationLen needs to carry some bits of information
on what exactly could be skipped, and generate_normalized_query needs to
understand that, both are not reachable for an external module with
custom query id logic (without replicating significant part of the
existing code). Hence, a new version of the patch.

Attachments:

v4-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patchtext/x-diff; charset=us-asciiDownload+1323-16
#12Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#11)
Re: pg_stat_statements and "IN" conditions

On Tue, Jun 15, 2021 at 05:18:50PM +0200, Dmitry Dolgov wrote:

On Thu, Mar 18, 2021 at 04:50:02PM +0100, Dmitry Dolgov wrote:

On Thu, Mar 18, 2021 at 09:38:09AM -0400, David Steele wrote:
On 1/5/21 10:51 AM, Zhihong Yu wrote:

+   int         lastExprLenght = 0;

Did you mean to name the variable lastExprLenghth ?

w.r.t. extracting to helper method, the second and third
if (currentExprIdx == pgss_merge_threshold - 1) blocks are similar.
It is up to you whether to create the helper method.
I am fine with the current formation.

Dmitry, thoughts on this review?

Oh, right. lastExprLenghth is obviously a typo, and as we agreed that
the helper is not strictly necessary I wanted to wait a bit hoping for
more feedback and eventually to post an accumulated patch. Doesn't make
sense to post another version only to fix one typo :)

Hi,

I've prepared a new rebased version to deal with the new way of
computing query id, but as always there is one tricky part. From what I
understand, now an external module can provide custom implementation for
query id computation algorithm. It seems natural to think this machinery
could be used instead of patch in the thread, i.e. one could create a
custom logic that will enable constants collapsing as needed, so that
same queries with different number of constants in an array will be
hashed into the same record.

But there is a limitation in how such queries will be normalized
afterwards — to reduce level of surprise it's necessary to display the
fact that a certain query in fact had more constants that are showed in
pgss record. Ideally LocationLen needs to carry some bits of information
on what exactly could be skipped, and generate_normalized_query needs to
understand that, both are not reachable for an external module with
custom query id logic (without replicating significant part of the
existing code). Hence, a new version of the patch.

Forgot to mention a couple of people who already reviewed the patch.

Attachments:

v4-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patchtext/x-diff; charset=us-asciiDownload+1323-16
#13Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#12)
Re: pg_stat_statements and "IN" conditions

On Wed, Jun 16, 2021 at 04:02:12PM +0200, Dmitry Dolgov wrote:

I've prepared a new rebased version to deal with the new way of
computing query id, but as always there is one tricky part. From what I
understand, now an external module can provide custom implementation for
query id computation algorithm. It seems natural to think this machinery
could be used instead of patch in the thread, i.e. one could create a
custom logic that will enable constants collapsing as needed, so that
same queries with different number of constants in an array will be
hashed into the same record.

But there is a limitation in how such queries will be normalized
afterwards — to reduce level of surprise it's necessary to display the
fact that a certain query in fact had more constants that are showed in
pgss record. Ideally LocationLen needs to carry some bits of information
on what exactly could be skipped, and generate_normalized_query needs to
understand that, both are not reachable for an external module with
custom query id logic (without replicating significant part of the
existing code). Hence, a new version of the patch.

Forgot to mention a couple of people who already reviewed the patch.

And now for something completely different, here is a new patch version.
It contains a small fix for one problem we've found during testing (one
path code was incorrectly assuming find_const_walker results).

Attachments:

v5-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patchtext/x-diff; charset=us-asciiDownload+1353-16
#14Zhihong Yu
zyu@yugabyte.com
In reply to: Dmitry Dolgov (#13)
Re: pg_stat_statements and "IN" conditions

On Thu, Sep 30, 2021 at 6:49 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, Jun 16, 2021 at 04:02:12PM +0200, Dmitry Dolgov wrote:

I've prepared a new rebased version to deal with the new way of
computing query id, but as always there is one tricky part. From what I
understand, now an external module can provide custom implementation

for

query id computation algorithm. It seems natural to think this

machinery

could be used instead of patch in the thread, i.e. one could create a
custom logic that will enable constants collapsing as needed, so that
same queries with different number of constants in an array will be
hashed into the same record.

But there is a limitation in how such queries will be normalized
afterwards — to reduce level of surprise it's necessary to display the
fact that a certain query in fact had more constants that are showed in
pgss record. Ideally LocationLen needs to carry some bits of

information

on what exactly could be skipped, and generate_normalized_query needs

to

understand that, both are not reachable for an external module with
custom query id logic (without replicating significant part of the
existing code). Hence, a new version of the patch.

Forgot to mention a couple of people who already reviewed the patch.

And now for something completely different, here is a new patch version.
It contains a small fix for one problem we've found during testing (one
path code was incorrectly assuming find_const_walker results).

Hi,

bq. and at position further that specified threshold.

that specified threshold -> than specified threshold

Cheers

#15Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Zhihong Yu (#14)
Re: pg_stat_statements and "IN" conditions

On Thu, Sep 30, 2021 at 08:03:16AM -0700, Zhihong Yu wrote:
On Thu, Sep 30, 2021 at 6:49 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, Jun 16, 2021 at 04:02:12PM +0200, Dmitry Dolgov wrote:

I've prepared a new rebased version to deal with the new way of
computing query id, but as always there is one tricky part. From what I
understand, now an external module can provide custom implementation

for

query id computation algorithm. It seems natural to think this

machinery

could be used instead of patch in the thread, i.e. one could create a
custom logic that will enable constants collapsing as needed, so that
same queries with different number of constants in an array will be
hashed into the same record.

But there is a limitation in how such queries will be normalized
afterwards — to reduce level of surprise it's necessary to display the
fact that a certain query in fact had more constants that are showed in
pgss record. Ideally LocationLen needs to carry some bits of

information

on what exactly could be skipped, and generate_normalized_query needs

to

understand that, both are not reachable for an external module with
custom query id logic (without replicating significant part of the
existing code). Hence, a new version of the patch.

Forgot to mention a couple of people who already reviewed the patch.

And now for something completely different, here is a new patch version.
It contains a small fix for one problem we've found during testing (one
path code was incorrectly assuming find_const_walker results).

Hi,

bq. and at position further that specified threshold.

that specified threshold -> than specified threshold

You mean in the patch commit message, nowhere else, right? Yep, my spell
checker didn't catch that, thanks for noticing!

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Dolgov (#15)
Re: pg_stat_statements and "IN" conditions

Dmitry Dolgov <9erthalion6@gmail.com> writes:

And now for something completely different, here is a new patch version.
It contains a small fix for one problem we've found during testing (one
path code was incorrectly assuming find_const_walker results).

I've been saying from day one that pushing the query-hashing code into the
core was a bad idea, and I think this patch perfectly illustrates why.
We can debate whether the rules proposed here are good for
pg_stat_statements or not, but it seems inevitable that they will be a
disaster for some other consumers of the query hash. In particular,
dropping external parameters from the hash seems certain to break
something for somebody --- do you really think that a query with two int
parameters is equivalent to one with five float parameters for all
query-identifying purposes?

I can see the merits of allowing different numbers of IN elements
to be considered equivalent for pg_stat_statements, but this patch
seems to go far beyond that basic idea, and I fear the side-effects
will be very bad.

Also, calling eval_const_expressions in the query jumbler is flat
out unacceptable. There is way too much code that could be reached
that way (more or less the entire executor, to start with). I
don't have a lot of faith that it'd never modify the input tree,
either.

regards, tom lane

#17Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#16)
Re: pg_stat_statements and "IN" conditions

On 1/5/22 4:02 AM, Tom Lane wrote:

Dmitry Dolgov <9erthalion6@gmail.com> writes:

And now for something completely different, here is a new patch version.
It contains a small fix for one problem we've found during testing (one
path code was incorrectly assuming find_const_walker results).

I've been saying from day one that pushing the query-hashing code into the
core was a bad idea, and I think this patch perfectly illustrates why.
We can debate whether the rules proposed here are good for
pg_stat_statements or not, but it seems inevitable that they will be a
disaster for some other consumers of the query hash. In particular,
dropping external parameters from the hash seems certain to break
something for somebody

+1.

In a couple of extensions I use different logic of query jumbling - hash
value is more stable in some cases than in default implementation. For
example, it should be stable to permutations in 'FROM' section of a query.
And If anyone subtly changes jumbling logic when the extension is
active, the instance could get huge performance issues.

Let me suggest, that the core should allow an extension at least to
detect such interference between extensions. Maybe hook could be
replaced with callback to allow extension see an queryid with underlying
generation logic what it expects.

--
regards,
Andrey Lepikhov
Postgres Professional

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#17)
Re: pg_stat_statements and "IN" conditions

"Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru> writes:

On 1/5/22 4:02 AM, Tom Lane wrote:

I've been saying from day one that pushing the query-hashing code into the
core was a bad idea, and I think this patch perfectly illustrates why.

+1.

Let me suggest, that the core should allow an extension at least to
detect such interference between extensions. Maybe hook could be
replaced with callback to allow extension see an queryid with underlying
generation logic what it expects.

I feel like we need to get away from the idea that there is just
one query hash, and somehow let different extensions attach
differently-calculated hashes to a query. I don't have any immediate
ideas about how to do that in a reasonably inexpensive way.

regards, tom lane

#19Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#16)
Re: pg_stat_statements and "IN" conditions

On Tue, Jan 04, 2022 at 06:02:43PM -0500, Tom Lane wrote:
We can debate whether the rules proposed here are good for
pg_stat_statements or not, but it seems inevitable that they will be a
disaster for some other consumers of the query hash.

Hm, which consumers do you mean here, potential extension? Isn't the
ability to use an external module to compute queryid make this situation
possible anyway?

do you really think that a query with two int
parameters is equivalent to one with five float parameters for all
query-identifying purposes?

Nope, and it will be hard to figure this out no matter which approach
we're talking about, because it mostly depends on the context and type
of queries I guess. Instead, such functionality should allow some
reasonable configuration. To be clear, the use case I have in mind here
is not four or five, but rather a couple of hundreds constants where
chances that the whole construction was generated automatically by ORM
is higher than normal.

I can see the merits of allowing different numbers of IN elements
to be considered equivalent for pg_stat_statements, but this patch
seems to go far beyond that basic idea, and I fear the side-effects
will be very bad.

Not sure why it goes far beyond, but then there were two approaches
under consideration, as I've stated in the first message. I already
don't remember all the details, but another one was evolving around
doing similar things in a more limited fashion in transformAExprIn. The
problem would be then to carry the information, necessary to represent
the act of "merging" some number of queryids together. Any thoughts
here?

The idea of keeping the original queryid untouched and add another type
of id instead sounds interesting, but it will add too much overhead for
a quite small use case I guess.

#20Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#19)
Re: pg_stat_statements and "IN" conditions

On Wed, Jan 05, 2022 at 10:11:11PM +0100, Dmitry Dolgov wrote:

On Tue, Jan 04, 2022 at 06:02:43PM -0500, Tom Lane wrote:
We can debate whether the rules proposed here are good for
pg_stat_statements or not, but it seems inevitable that they will be a
disaster for some other consumers of the query hash.

Hm, which consumers do you mean here, potential extension? Isn't the
ability to use an external module to compute queryid make this situation
possible anyway?

do you really think that a query with two int
parameters is equivalent to one with five float parameters for all
query-identifying purposes?

Nope, and it will be hard to figure this out no matter which approach
we're talking about, because it mostly depends on the context and type
of queries I guess. Instead, such functionality should allow some
reasonable configuration. To be clear, the use case I have in mind here
is not four or five, but rather a couple of hundreds constants where
chances that the whole construction was generated automatically by ORM
is higher than normal.

I can see the merits of allowing different numbers of IN elements
to be considered equivalent for pg_stat_statements, but this patch
seems to go far beyond that basic idea, and I fear the side-effects
will be very bad.

Not sure why it goes far beyond, but then there were two approaches
under consideration, as I've stated in the first message. I already
don't remember all the details, but another one was evolving around
doing similar things in a more limited fashion in transformAExprIn. The
problem would be then to carry the information, necessary to represent
the act of "merging" some number of queryids together. Any thoughts
here?

The idea of keeping the original queryid untouched and add another type
of id instead sounds interesting, but it will add too much overhead for
a quite small use case I guess.

```
Thu, 10 Mar 2022
New status: Waiting on Author
```

This seems incorrect, as the only feedback I've got was "this is a bad
idea", and no reaction on follow-up questions.

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Dolgov (#20)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
#23Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Robert Haas (#22)
#24Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#21)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Dmitry Dolgov (#24)
#26Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Dmitry Dolgov (#26)
#28Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Robert Haas (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#27)
#30Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Dolgov (#30)
#32Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#31)
#33Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#32)
#34Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#33)
In reply to: Dmitry Dolgov (#34)
#36Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sergei Kornilov (#35)
#37Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#36)
#38vignesh C
vignesh21@gmail.com
In reply to: Dmitry Dolgov (#37)
#39Dmitry Dolgov
9erthalion6@gmail.com
In reply to: vignesh C (#38)
#40Marcos Pegoraro
marcos@f10.com.br
In reply to: Dmitry Dolgov (#39)
#41Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Marcos Pegoraro (#40)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#41)
#43Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#42)
#44Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#43)
#45Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#44)
#46Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#45)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Dolgov (#46)
#48Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#47)
In reply to: Dmitry Dolgov (#48)
#50Peter Eisentraut
peter_e@gmx.net
In reply to: Sergei Kornilov (#49)
#51Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sergei Kornilov (#49)
#52Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Peter Eisentraut (#50)
#53Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#52)
#54Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#53)
#55David Geier
geidav.pg@gmail.com
In reply to: Dmitry Dolgov (#51)
#56Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Geier (#55)
#57Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#56)
#58David Geier
geidav.pg@gmail.com
In reply to: Dmitry Dolgov (#57)
#59Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Geier (#58)
#60Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#54)
#61David Geier
geidav.pg@gmail.com
In reply to: Dmitry Dolgov (#59)
#62Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Geier (#61)
#63Gregory Stark (as CFM)
stark.cfm@gmail.com
In reply to: Dmitry Dolgov (#62)
#64Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Gregory Stark (as CFM) (#63)
#65Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#64)
#66Nathan Bossart
nathandbossart@gmail.com
In reply to: Dmitry Dolgov (#65)
#67Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Nathan Bossart (#66)
#68Jakub Wartak
jakub.wartak@enterprisedb.com
In reply to: Dmitry Dolgov (#67)
#69Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Jakub Wartak (#68)
#70Yasuo Honda
yasuo.honda@gmail.com
In reply to: Maciek Sakrejda (#69)
#71Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#67)
#72Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#71)
#73Nathan Bossart
nathandbossart@gmail.com
In reply to: Dmitry Dolgov (#67)
#74Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#72)
#75Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#74)
#76Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#75)
#77Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#76)
#78vignesh C
vignesh21@gmail.com
In reply to: Dmitry Dolgov (#77)
#79Dmitry Dolgov
9erthalion6@gmail.com
In reply to: vignesh C (#78)
#80Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#79)
#81Peter Smith
smithpb2250@gmail.com
In reply to: Dmitry Dolgov (#80)
#82Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Peter Smith (#81)
#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Dolgov (#82)
#84Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#83)
#85Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#84)
#86Yasuo Honda
yasuo.honda@gmail.com
In reply to: Dmitry Dolgov (#85)
#87Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Yasuo Honda (#86)
#88Yasuo Honda
yasuo.honda@gmail.com
In reply to: Dmitry Dolgov (#87)
#89Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Yasuo Honda (#88)
#90Yasuo Honda
yasuo.honda@gmail.com
In reply to: Dmitry Dolgov (#89)
#91Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Yasuo Honda (#90)
#92Yasuo Honda
yasuo.honda@gmail.com
In reply to: Dmitry Dolgov (#91)
#93Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Yasuo Honda (#92)
#94Sutou Kouhei
kou@clear-code.com
In reply to: Dmitry Dolgov (#93)
#95Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sutou Kouhei (#94)
#96Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#95)
In reply to: Dmitry Dolgov (#96)
#98Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sergei Kornilov (#97)
#99Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#98)
#100Kirill Reshke
reshkekirill@gmail.com
In reply to: Dmitry Dolgov (#99)
#101Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Kirill Reshke (#100)
#102Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#101)
#103Sami Imseih
samimseih@gmail.com
In reply to: Alvaro Herrera (#102)
#104Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#103)
#105Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Sami Imseih (#104)
#106Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#105)
#107Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#106)
#108Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Sami Imseih (#103)
#109Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#107)
#110Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#109)
#111Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#110)
#112Sami Imseih
samimseih@gmail.com
In reply to: Alvaro Herrera (#111)
#113Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#111)
#114Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#113)
#115Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#114)
#116Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#115)
#117Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#116)
#118Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#117)
#119Julien Rouhaud
rjuju123@gmail.com
In reply to: Dmitry Dolgov (#118)
#120Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Julien Rouhaud (#119)
#121Julien Rouhaud
rjuju123@gmail.com
In reply to: Alvaro Herrera (#120)
#122Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Julien Rouhaud (#121)
#123Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#122)
#124Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#123)
#125Julien Rouhaud
rjuju123@gmail.com
In reply to: Dmitry Dolgov (#124)
#126Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Julien Rouhaud (#125)
#127Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#124)
#128Julien Rouhaud
rjuju123@gmail.com
In reply to: Dmitry Dolgov (#126)
#129Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Julien Rouhaud (#128)
#130Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#127)
#131Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#130)
#132Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#131)
#133Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#130)
#134Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#133)
#135Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#134)
#136Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#135)
#137Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#136)
#138Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#137)
#139Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#138)
#140Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Sami Imseih (#138)
#141Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#140)
#142Sami Imseih
samimseih@gmail.com
In reply to: Alvaro Herrera (#140)
#143vignesh C
vignesh21@gmail.com
In reply to: Alvaro Herrera (#140)
#144Dmitry Dolgov
9erthalion6@gmail.com
In reply to: vignesh C (#143)
#145vignesh C
vignesh21@gmail.com
In reply to: Dmitry Dolgov (#144)
#146Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#144)
#147Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#129)
#148Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#147)
#149Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#148)
#150Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#149)
#151Sami Imseih
samimseih@gmail.com
In reply to: Alvaro Herrera (#150)
#152Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Sami Imseih (#151)
#153Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#150)
#154Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#151)
#155Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#154)