queryId constant squashing does not support prepared statements
62d712ec added the ability to squash constants from an IN list
for queryId computation purposes. This means that the same
queryId is generated for queries that differ only in
the number of values in the IN list.
The patch lacks the ability to apply this optimization to values
passed in as parameters (i.e. parameter kind = PARAM_EXTERN),
which will be the case for SQL prepared statements and protocol-level
prepared statements, e.g.
```
select from t where id in ($1, $2, $3) \bind 1 2 3 \g
```
or
```
prepare prp(int, int, int) as select from t where id in ($1, $2, $3);
```
Here is the current state:
```
postgres=# create table t (id int);
CREATE TABLE
postgres=# prepare prp(int, int, int) as select from t where id in ($1, $2, $3);
PREPARE
postgres=# execute prp(1, 2, 3);
postgres=# select from t where id in (1, 2, 3);
--
(0 rows)
postgres=# SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
                   query                   | calls
-------------------------------------------+-------
 ...
 ....
 select from t where id in ($1 /*, ... */) |     1
 select from t where id in ($1, $2, $3)    |     1 <<- prepared statement
(6 rows)
```
but with the attached patch, the optimization applies.
```
 create table t (id int)                   |     1
 select from t where id in ($1 /*, ... */) |     2
(3 rows)
```
I think this is a pretty big gap, as many of the common drivers such as JDBC,
which use the extended query protocol, will not be able to take advantage of
the optimization in 18, which will be very disappointing.
Thoughts?
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v1-0001-Allow-query-jumble-to-squash-a-list-external-para.patch (application/octet-stream)
From f4715162978951eb4513b6963b5cc7cd24d5a5d9 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Wed, 30 Apr 2025 15:46:58 -0500
Subject: [PATCH v1 1/1] Allow query jumble to squash a list external
parameters
62d712ecf now allows query jumbling to squash a list of constants,
but not constants that are passed as external parameters. This patch
now allows the squashing of constant values supplied as external parameters
(e.g., $1, $2), as is the case with prepared statements.
---
.../pg_stat_statements/expected/squashing.out | 33 +++++++++++++++++++
contrib/pg_stat_statements/sql/squashing.sql | 11 +++++++
src/backend/nodes/queryjumblefuncs.c | 20 ++++++++---
3 files changed, 60 insertions(+), 4 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..8dc98bad6d5 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -301,6 +301,39 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Test bind parameters
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash_bigint WHERE data IN ($1, $2, $3) \bind 1 2 3
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_bigint WHERE data IN ($1, $2, $3, $4) \bind 1 2 3 4
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_bigint WHERE data IN
+ ($1::bigint, $2::bigint, $3::bigint, $4::bigint) \bind 1 2 3 4
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 3
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
-- CoerceViaIO
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 908be81ff2b..ce0bcbc4121 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -97,6 +97,17 @@ SELECT * FROM test_squash_jsonb WHERE data IN
(SELECT '"10"')::jsonb);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- Test bind parameters
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_bigint WHERE data IN ($1, $2, $3) \bind 1 2 3
+;
+SELECT * FROM test_squash_bigint WHERE data IN ($1, $2, $3, $4) \bind 1 2 3 4
+;
+SELECT * FROM test_squash_bigint WHERE data IN
+ ($1::bigint, $2::bigint, $3::bigint, $4::bigint) \bind 1 2 3 4
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- CoerceViaIO
-- Create some dummy type to force CoerceViaIO
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..7468583edc8 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -410,7 +410,8 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Ignore a possible wrapping RelabelType and CoerceViaIO.
* - If it's a FuncExpr, check that the function is an implicit
* cast and its arguments are Const.
- * - Otherwise test if the expression is a simple Const.
+ * - Otherwise test if the expression is a simple Const or an
+ * external parameter.
*/
static bool
IsSquashableConst(Node *element)
@@ -444,10 +445,21 @@ IsSquashableConst(Node *element)
return true;
}
- if (!IsA(element, Const))
- return false;
+ switch (nodeTag(element))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) element;
- return true;
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
+
+ return false;
}
/*
--
2.39.5 (Apple Git-154)
On Wed, Apr 30, 2025 at 04:52:06PM -0500, Sami Imseih wrote:
62d712ec added the ability to squash constants from an IN list
for queryId computation purposes. This means that the same
queryId is generated for queries that differ only in
the number of values in the IN list.
The patch lacks the ability to apply this optimization to values
passed in as parameters (i.e. parameter kind = PARAM_EXTERN),
which will be the case for SQL prepared statements and protocol-level
prepared statements, e.g.
[...]
I think this is a pretty big gap, as many of the common drivers such as JDBC,
which use the extended query protocol, will not be able to take advantage of
the optimization in 18, which will be very disappointing.
Thoughts?
Yes. Long IN/ANY clauses are, as far as I know, a pattern more commonly caused
by ORMs, and I'd like to think that application developers would not
hardcode such clauses in their right minds (well, okay, I'm likely
wrong about this assumption, feel free to counter-argue). ORMs also
like relying on the extended query protocol. Not taking JDBC into
account here is a bummer, as it is very popular.
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kind of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.
Now, I am not actually objecting to including PARAM_EXTERN if
there's a consensus behind it and my arguments are considered not
relevant. The patch is written as if a PARAM_EXTERN implies that the
expression is a Const, but that may not be so, depending on the
execution path taken for the parameter. At the very least, the patch
could be clearer and rename the "Const"-squashable APIs in
queryjumblefuncs.c.
To be honest, the situation on HEAD makes me question whether we are
using the right approach for this facility. I did mention an
alternative a couple of months ago, but it comes down to accepting that
any expression would be normalized. Unfortunately, I never got around to
studying it in detail, as it touches expr_list in the parser: we
could detect in the parser the start and end locations of an
expression list in a query string, then group all of its elements together
based on their location in the string. This would also be cheaper
than going through all the elements in the list, tweaking things when
dealing with a subquery.
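To make the location-based idea concrete, here is a minimal sketch in plain C. It is not PostgreSQL code: it assumes the parser has already recorded the byte offsets of the opening and closing parentheses of an expression list, and the helper name squash_list_by_location() and its signature are hypothetical. It only shows that, once both offsets are known, the whole list can be replaced without visiting its elements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): copy "query" into "out",
 * replacing everything between the parentheses found at byte offsets
 * "start" and "end" with a single numbered placeholder and an ellipsis
 * marker. Returns 0 on success, -1 on bad offsets or a too-small buffer.
 */
static int
squash_list_by_location(const char *query, size_t start, size_t end,
                        int param_no, char *out, size_t outsz)
{
    size_t      qlen;
    int         n;

    assert(query != NULL && out != NULL);
    qlen = strlen(query);

    /* The offsets must point at a '(' ... ')' pair inside the string. */
    if (start >= end || end >= qlen ||
        query[start] != '(' || query[end] != ')')
        return -1;

    /* Text before the list, the squashed list, then the rest of the query. */
    n = snprintf(out, outsz, "%.*s($%d /*, ... */)%s",
                 (int) start, query, param_no, query + end + 1);

    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

With offsets pointing at the parentheses of `in (1, 2, 3)`, the helper produces the same normalized text that pg_stat_statements shows today for squashed constants, regardless of what the list elements are.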
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
--
Michael
On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kind of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.
Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.
[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
In fact, this was discussed much earlier in the thread above, as
essentially the same implementation with T_Params that is submitted
here was part of the original patch. The concern was always whether or
not it would break any assumption about query identification, because
this way a much broader scope of expressions will be considered equivalent
for query id computation purposes.
At the same time, after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
the parameters are different in some way.
To be honest, the situation of HEAD makes me question whether we are
using the right approach for this facility. I did mention a couple of
months ago about an alternative, but it comes down to accept that any
expressions would be normalized, unfortunately I never got down to
study it in details as this touches around expr_list in the parser: we
could detect in the parser the start and end locations of an
expression list in a query string, then group all of them together
based on their location in the string. This would be also cheaper
than going through all the elements in the list, tweaking things when
dealing with a subquery..
Not entirely sure how that would work exactly, but after my experiments
with the squashing patch I found it could be very useful to be able to
identify the end location of an expression list in the parser.
I spent a few hours looking into this today and to your points below:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kind of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.
Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.
[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
In fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.
At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.
I don't think limiting this feature to Const only will suffice.
I think we should really allow the broader scope of expressions that
are allowed via prepared statements, which will make this implementation
consistent between prepared and non-prepared statements. I don't see why
not. In fact, when we are examining the ArrayExpr, I think the only
thing we should not squash is a SubLink (i.e. a SELECT statement inside
the array).
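As a rough illustration of that rule, the following self-contained sketch walks a toy expression tree and refuses to squash a list only when some element contains a sublink. The ToyNode type, its tags, and both function names are hypothetical stand-ins invented for this example; they are not PostgreSQL's Node, ArrayExpr, or SubLink definitions, and the real jumbling code would need to handle many more node types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node tags; stand-ins for the real parse-tree node types. */
typedef enum ToyTag
{
    TOY_CONST,
    TOY_PARAM,
    TOY_FUNC,
    TOY_SUBLINK
} ToyTag;

typedef struct ToyNode
{
    ToyTag      tag;
    const struct ToyNode *const *args;  /* child expressions, may be NULL */
    size_t      nargs;
} ToyNode;

/* Return true if the expression or any of its children is a sublink. */
static bool
toy_contains_sublink(const ToyNode *node)
{
    assert(node != NULL);

    if (node->tag == TOY_SUBLINK)
        return true;

    for (size_t i = 0; i < node->nargs; i++)
    {
        if (toy_contains_sublink(node->args[i]))
            return true;
    }
    return false;
}

/* A non-empty list is squashable unless one element hides a sublink. */
static bool
toy_list_is_squashable(const ToyNode *const *elems, size_t nelems)
{
    for (size_t i = 0; i < nelems; i++)
    {
        if (toy_contains_sublink(elems[i]))
            return false;
    }
    return nelems > 0;
}
```

Under this rule a list mixing constants and parameters is squashed, while a list containing a function call over a subquery is left alone.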
To be honest, the situation of HEAD makes me question whether we are
using the right approach for this facility. I did mention a couple of
months ago about an alternative, but it comes down to accept that any
expressions would be normalized, unfortunately I never got down to
study it in details as this touches around expr_list in the parser: we
could detect in the parser the start and end locations of an
expression list in a query string, then group all of them together
based on their location in the string. This would be also cheaper
than going through all the elements in the list, tweaking things when
dealing with a subquery..
Not entirely sure how that would work exactly, but after my experiments
with the squashing patch I found it could be very useful to be able to
identify the end location of an expression list in the parser.
I also came to the same conclusion, that we should track the start '('
and end ')' location of an expression list to allow us to hide the fields.
But I will look into other approaches as well.
I am really leaning towards that we should revert this feature as the
limitation we have now with parameters is a rather large one and I think
we need to go back and address this issue.
--
Sami
On Thu, May 01, 2025 at 03:57:16PM -0500, Sami Imseih wrote:
I think what we should really allow the broader scope of expressions that
are allowed via prepared statements, and this will make this implementation
consistent between prepared vs non-prepared statements. I don't see why
not. In fact, when we are examining the ArrayExpr, I think the only
thing we should
not squash is if we find a Sublink ( i.e. SELECT statement inside the array ).
Likely so. I don't have anything else than Sublink in mind that would
be worth a special case..
I am really leaning towards that we should revert this feature as the
limitation we have now with parameters is a rather large one and I think
we need to go back and address this issue.
I am wondering if this would not be the best move to do on HEAD.
Let's see where the discussion drives us.
--
Michael
On Fri, May 02, 2025 at 07:10:19AM GMT, Michael Paquier wrote:
I am really leaning towards that we should revert this feature as the
limitation we have now with parameters is a rather large one and I think
we need to go back and address this issue.
I am wondering if this would not be the best move to do on HEAD.
Let's see where the discussion drives us.
Squashing constants was meant to be a first step towards doing the same
for other types of queries (params, rte_values), reverting it to
implement everything at once makes very little sense to me.
On Fri, May 02, 2025 at 09:13:39AM +0200, Dmitry Dolgov wrote:
Squashing constants was meant to be a first step towards doing the same
for other types of queries (params, rte_values), reverting it to
implement everything at once makes very little sense to me.
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning the
facility from the ground, so the old approach would not be really
relevant..
--
Michael
On Fri, May 02, 2025 at 04:18:37PM GMT, Michael Paquier wrote:
On Fri, May 02, 2025 at 09:13:39AM +0200, Dmitry Dolgov wrote:
Squashing constants was meant to be a first step towards doing the same
for other types of queries (params, rte_values), reverting it to
implement everything at once makes very little sense to me.
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning the
facility from the ground, so the old approach would not be really
relevant..
If I understand you correctly, changing how the element list is
identified is not going to address the question of whether or not to squash
parameters, right?
On 2025-May-02, Michael Paquier wrote:
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning the
facility from the ground, so the old approach would not be really
relevant..
I disagree that a revert is warranted for this reason. If you want to
change the implementation later, that's fine, as long as the user
interface doesn't change.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)
On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote:
On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kind of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.
Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.
[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
In fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.
At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.
Returning to the topic of whether to squash a list of Params.
Originally, squashing of Params wasn't included in the squashing patch due to
concerns from reviewers about treating quite different queries as the same for
the purposes of query identification. E.g., there could be some assumption
somewhere that would be broken if we treat a query with a list of integer
parameters the same as a query with a list of float parameters. For the sake of
making progress, I decided to postpone answering this question and concentrate
on the simpler scenario. Now that the patch has been applied, I think it's a
good moment to reflect on those concerns. It's not enough to say that we don't
see any problems with squashing Params; sounder argumentation is needed. So,
what will happen if parameters are squashed like constants?
1. One obvious impact is that more queries that were considered distinct
before will have the same normalized query and hence the same entry in
pg_stat_statements. Since a Param could be pretty much anything, this can lead
to a situation where two queries with quite different performance profiles (e.g.
one contains a parameter value that is a heavy function, the other doesn't)
are matched to one entry, making it less useful.
But at the same time, this can already happen if those two queries have the same
number of parameters, since query parametrization is intrinsically lossy in this
sense. The only thing we do by squashing such queries is lose information
about the number of parameters, not about properties of the parameters themselves.
2. Another tricky scenario is when queryId is used by some extension, which in
turn makes assumptions about it that are going to be rendered incorrect by
squashing. The only type of assumption I can imagine falling into this
category is anything about equivalence of queries. For example, an extension
can capture two queries that have the same normalized entry in pgss and
assume all properties of those queries are the same.
It's worth noting that sharing a normalized query does not imply equivalence,
i.e. if query1 has the normalized version query_norm, and query2 has the same
normalized version query_norm, it doesn't mean query1 is equivalent to query2
in all senses (e.g. they could have lists of parameter values of different
types and the same size). That means that such assumptions are already faulty,
and work most of the time only because it takes queries with lists of the same
size to break them. Squashing such queries will make them wrong more often.
One can argue that we might want to be friendly to such extensions and not
"break" them even further. But I don't think it's worth it, as the number of
such extensions is most likely low, if any exist. A more extreme case would be
an extension assuming that queries with the same entry in pgss have the same
number of parameters, but I don't see how such an assumption could be useful.
3. More annoying is the consequence that parameters are going to be treated as
constants in pg_stat_statements. While mostly harmless, that would mean they're
going to be replaced in the same way as constants. This means that the
parameter order is going to be lost, e.g.:
SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4)
AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
AND id IN ($2 /*, ... */)
This representation could be confusing, of course. Either it could be explained
in the documentation, or LocationLen has to be extended to carry information
about whether it's a constant or a parameter, so that the latter is not
replaced. In any case, anything beyond the first parameter number will be lost,
but that's probably not so dramatic.
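The renumbering can be modeled with a small standalone sketch that follows the arithmetic in the generate_normalized_query() hunk of the attached patch: each recorded location gets the number i + 1 + highest_extern_param_id minus the count of squashed runs seen so far, and a squashed run occupies two locations (its start and its end). The function name number_placeholders() and the flat array interface are hypothetical simplifications, and highest_extern_param_id is taken as 0 in the examples to match the output shown above; this is not the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the placeholder numbering. squashed[i] plays the role of
 * clocations[i].squashed; out[i] receives the number printed after '$'
 * for location i, or 0 when the location prints nothing because it only
 * marks the end of a squashed run.
 */
static void
number_placeholders(const bool *squashed, size_t n,
                    int highest_extern_param_id, int *out)
{
    bool        in_squashed = false;
    int         skipped = 0;    /* squashed runs started so far */

    assert(squashed != NULL && out != NULL);

    for (size_t i = 0; i < n; i++)
    {
        int         num = (int) i + 1 + highest_extern_param_id - skipped;

        if (!squashed[i])
        {
            /* Ordinary location: gets its own placeholder. */
            out[i] = num;
            in_squashed = false;
        }
        else if (!in_squashed)
        {
            /* Start of a squashed run: one placeholder for the whole list. */
            out[i] = num;
            in_squashed = true;
            skipped++;
        }
        else
        {
            /* End of the run: nothing is printed. */
            out[i] = 0;
            in_squashed = false;
        }
    }
}
```

For two squashed lists (four locations, all flagged as squashed) this yields $1 and $2, which is why the original numbers $5 through $8 in the second example do not survive normalization.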
At the end of the day, I think the value of squashing parameters outweighs
the problems described above. As long as there is agreement about that, it's
fine by me. I've attached a more complete version of the patch (but without
modifying LocationLen to not replace Params yet) in case such agreement is
reached.
Attachments:
v2-0001-Allow-query-jumble-to-squash-a-list-of-external-p.patch (text/plain; charset=us-ascii)
From 60739b6a458115fa571777281bacbfc057da0589 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Wed, 30 Apr 2025 15:46:58 -0500
Subject: [PATCH v2] Allow query jumble to squash a list of external parameters
62d712ecf allows query jumbling to squash a list of constants, but not
external parameters. The latter is important in practice, as such
queries are often generated by ORMs. Allow to squash external parameters
as well.
---
.../pg_stat_statements/expected/squashing.out | 92 +++++++++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 40 ++++----
contrib/pg_stat_statements/sql/squashing.sql | 39 ++++++++
doc/src/sgml/pgstatstatements.sgml | 4 +-
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 50 ++++++----
src/include/nodes/queryjumble.h | 7 +-
7 files changed, 190 insertions(+), 44 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..477dbb8bf02 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -301,6 +301,98 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Test bind parameters
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)
+ \bind 1 2 3 4 5 6 7 8 9
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
+ \bind 1 2 3 4 5 6 7 8 9 10
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
+ \bind 1 2 3 4 5 6 7 8 9 10 11
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_bigint WHERE data IN
+ ($1::bigint, $2::bigint, $3::bigint, $4::bigint, $5::bigint, $6::bigint,
+ $7::bigint, $8::bigint, $9::bigint, $10::bigint) \bind 1 2 3 4 5 6 7 8 9 10
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint +| 4
+ WHERE data IN ($1 /*, ... */) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Test parameters order
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
+ AND id IN ($11, $12, $13, $14, $15, $16, $17, $18, $19, $20)
+ \bind 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
+;
+ id | data
+----+------
+(0 rows)
+
+-- No new pgss entry
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($10, $9, $8, $7, $6, $5, $4, $3, $2, $1)
+ \bind 1 2 3 4 5 6 7 8 9 10
+;
+ id | data
+----+------
+(0 rows)
+
+-- Test combination with constants, no new pgss entry
+SELECT * FROM test_squash_bigint
+ WHERE data IN (1, 2, 3, 4, 5, $1, $2, $3, $4, $5)
+ \bind 1 2 3 4 5
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint +| 2
+ WHERE data IN ($1 /*, ... */) |
+ SELECT * FROM test_squash_bigint +| 1
+ WHERE data IN ($1 /*, ... */) +|
+ AND id IN ($2 /*, ... */) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
-- CoerceViaIO
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..68bce2b0146 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2825,9 +2825,11 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
+ bool in_squashed = false; /* in a run of squashed constants
+ * or parameters? */
+ int skipped_expressions = 0; /* Position adjustment of later
+ * constants or parameters after
+ * squashed ones */
/*
@@ -2867,14 +2869,14 @@ generate_normalized_query(JumbleState *jstate, const char *query,
continue; /* ignore any duplicates */
/*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
+ * What to do next depends on whether we're squashing lists,
+ * and whether we're already in a run of such squashed expressions.
*/
if (!jstate->clocations[i].squashed)
{
/*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
+ * This location corresponds to an expression not to be squashed.
+ * Print what comes before the expression ...
*/
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
@@ -2884,21 +2886,21 @@ generate_normalized_query(JumbleState *jstate, const char *query,
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
n_quer_loc += len_to_wrt;
- /* ... and then a param symbol replacing the constant itself */
+ /* ... and then a param symbol replacing the expression itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ i + 1 + jstate->highest_extern_param_id - skipped_expressions);
- /* In case previous constants were merged away, stop doing that */
+ /* In case previous expressions were merged away, stop doing that */
in_squashed = false;
}
else if (!in_squashed)
{
/*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
+ * This location is the start position of a run of expressions to
+ * be squashed, so we need to print the representation of starting
+ * a group of stashed expressions.
*
- * Print what comes before the constant ...
+ * Print what comes before the expression ...
*/
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
@@ -2908,25 +2910,25 @@ generate_normalized_query(JumbleState *jstate, const char *query,
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
n_quer_loc += len_to_wrt;
- /* ... and then start a run of squashed constants */
+ /* ... and then start a run of squashed expressions */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ i + 1 + jstate->highest_extern_param_id - skipped_expressions);
/* The next location will match the block below, to end the run */
in_squashed = true;
- skipped_constants++;
+ skipped_expressions++;
}
else
{
/*
- * The second location of a run of squashable elements; this
+ * The second location of a run of squashable expressions; this
* indicates its end.
*/
in_squashed = false;
}
- /* Otherwise the constant is squashed away -- move forward */
+ /* Otherwise the expression is squashed away -- move forward */
quer_loc = off + tok_len;
last_off = off;
last_tok_len = tok_len;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 908be81ff2b..7d6f920f047 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -97,6 +97,45 @@ SELECT * FROM test_squash_jsonb WHERE data IN
(SELECT '"10"')::jsonb);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- Test bind parameters
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)
+ \bind 1 2 3 4 5 6 7 8 9
+;
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
+ \bind 1 2 3 4 5 6 7 8 9 10
+;
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
+ \bind 1 2 3 4 5 6 7 8 9 10 11
+;
+SELECT * FROM test_squash_bigint WHERE data IN
+ ($1::bigint, $2::bigint, $3::bigint, $4::bigint, $5::bigint, $6::bigint,
+ $7::bigint, $8::bigint, $9::bigint, $10::bigint) \bind 1 2 3 4 5 6 7 8 9 10
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Test parameters order
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
+ AND id IN ($11, $12, $13, $14, $15, $16, $17, $18, $19, $20)
+ \bind 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
+;
+-- No new pgss entry
+SELECT * FROM test_squash_bigint
+ WHERE data IN ($10, $9, $8, $7, $6, $5, $4, $3, $2, $1)
+ \bind 1 2 3 4 5 6 7 8 9 10
+;
+-- Test combination with constants, no new pgss entry
+SELECT * FROM test_squash_bigint
+ WHERE data IN (1, 2, 3, 4, 5, $1, $2, $3, $4, $5)
+ \bind 1 2 3 4 5
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- CoerceViaIO
-- Create some dummy type to force CoerceViaIO
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 7baa07dcdbf..b9c8a917280 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -633,8 +633,8 @@
single <structname>pg_stat_statements</structname> entry; as explained above,
this is expected to happen for semantically equivalent queries.
In addition, if the only difference between queries is the number of elements
- in a list of constants, the list will get squashed down to a single element but shown
- with a commented-out list indicator:
+ in a list of constants or parameters, the list will get squashed down to a
+ single element but shown with a commented-out list indicator:
<screen>
=# SELECT pg_stat_statements_reset();
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 77659b0f760..ddda924a275 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1321,7 +1321,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
elsif (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
and elem $1, @node_types)
{
- # Node type. Squash constants if requested.
+ # Node type. Squash list of expressions if requested.
if ($query_jumble_squash)
{
print $jff "\tJUMBLE_ELEMENTS($f);\n"
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..92485e511c3 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -373,12 +373,12 @@ FlushPendingNulls(JumbleState *jstate)
/*
- * Record location of constant within query string of query tree that is
- * currently being walked.
+ * Record location of a constant or a parameter within query string of query
+ * tree that is currently being walked.
*
- * 'squashed' signals that the constant represents the first or the last
- * element in a series of merged constants, and everything but the first/last
- * element contributes nothing to the jumble hash.
+ * 'squashed' signals that the expression represents the first or the last
+ * element in a series of squashed expressions, and everything but the
+ * first/last element contributes nothing to the jumble hash.
*/
static void
RecordConstLocation(JumbleState *jstate, int location, bool squashed)
@@ -405,15 +405,16 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
- * deduce that the expression is a constant:
+ * deduce that the expression is squashable:
*
* - Ignore a possible wrapping RelabelType and CoerceViaIO.
* - If it's a FuncExpr, check that the function is an implicit
* cast and its arguments are Const.
- * - Otherwise test if the expression is a simple Const.
+ * - Otherwise test if the expression is a simple Const or an
+ * external parameter.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashable(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -444,15 +445,26 @@ IsSquashableConst(Node *element)
return true;
}
- if (!IsA(element, Const))
- return false;
+ switch (nodeTag(element))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) element;
- return true;
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
+
+ return false;
}
/*
* Subroutine for _jumbleElements: Verify whether the provided list
- * can be squashed, meaning it contains only constant expressions.
+ * can be squashed, meaning it contains only constant expressions or params.
*
* Return value indicates if squashing is possible.
*
@@ -461,7 +473,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableList(List *elements, Node **firstExpr, Node **lastExpr)
{
ListCell *temp;
@@ -474,7 +486,7 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashable(lfirst(temp)))
return false;
}
@@ -517,9 +529,9 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
+ * We jumble lists of constant elements or parameters as one individual item
+ * regardless of how many elements are in the list. This means different
+ * queries jumble to the same query_id, if the only difference is the number of
* elements in the list.
*/
static void
@@ -528,7 +540,7 @@ _jumbleElements(JumbleState *jstate, List *elements)
Node *first,
*last;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableList(elements, &first, &last))
{
/*
* If this list of elements is squashable, keep track of the location
@@ -538,7 +550,7 @@ _jumbleElements(JumbleState *jstate, List *elements)
* list.
*
* For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
+ * FuncExpr, Const, Param) it's fine to use exprLocation of the 'last'
* expression, but if more complex composite expressions are to be
* supported (e.g., OpExpr or FuncExpr as an explicit call), more
* sophisticated tracking will be needed.
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e..3237455150a 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -17,7 +17,8 @@
#include "nodes/parsenodes.h"
/*
- * Struct for tracking locations/lengths of constants during normalization
+ * Struct for tracking locations/lengths of constants and parameters during
+ * normalization
*/
typedef struct LocationLen
{
@@ -26,7 +27,7 @@ typedef struct LocationLen
/*
* Indicates that this location represents the beginning or end of a run
- * of squashed constants.
+ * of squashed expressions.
*/
bool squashed;
} LocationLen;
@@ -43,7 +44,7 @@ typedef struct JumbleState
/* Number of bytes used in jumble[] */
Size jumble_len;
- /* Array of locations of constants that should be removed */
+ /* Array of locations of constants that should be removed and parameters */
LocationLen *clocations;
/* Allocated length of clocations array */
base-commit: b0635bfda0535a7fc36cd11d10eecec4e2a96330
--
2.45.1
On Fri, 2 May 2025 14:56:56 +0200
Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-May-02, Michael Paquier wrote:
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning
the facility from the ground, so the old approach would not be
really relevant..
I disagree that a revert is warranted for this reason. If you want to
change the implementation later, that's fine, as long as the user
interface doesn't change.
FWIW, I'm +1 on leaving it in pg18. Prepared statements often look a
little different in other ways, and there are a bunch of other quirks
in how queryid's are calculated too. Didn't there used to be something
with CALL being handled as a utility statement making stored procs look
different from functions?
Michael Paquier <michael@paquier.xyz> writes:
On Thu, May 01, 2025 at 03:57:16PM -0500, Sami Imseih wrote:
I think we should really allow the broader scope of expressions that
are allowed via prepared statements, and this will make this implementation
consistent between prepared vs non-prepared statements. I don't see why
not. In fact, when we are examining the ArrayExpr, I think the only
thing we should not squash is if we find a Sublink (i.e. SELECT statement
inside the array).
Likely so. I don't have anything other than Sublink in mind that would
be worth a special case..
I think this is completely wrong. As simple examples, there is
nothing even a little bit comparable between the behaviors of
t1.x IN (1, 2, 3)
t1.x IN (1, 2, t2.y)
t1.x IN (1, 2, random())
Squashing these to look the same would be doing nobody any favors.
I do agree that treating PARAM_EXTERN Params the same as constants
for this purpose is a reasonable thing to do, on three arguments:
1. A PARAM_EXTERN Param actually behaves largely the same as a Const
so far as a query is concerned: it does not change value across
the execution of the query. (This is not true of other kinds of
Params.)
2. It's very much dependent on the client-side stack whether a given
value that is constant in the mind of the application will be passed
to the backend as a Const or a Param. (This is okay because #1.)
3. Even if the value is passed as a Param, the planner might replace
it by a Const by means of generating a custom query plan.
So the boundary between PARAM_EXTERN Params and Consts is actually
mighty squishy, and thus I think it makes sense for pg_stat_statements
to mash them together. But this logic does not extend to Vars or
function calls or much of anything else.
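As a rough illustration of the rule described above (this is not code from any patch in this thread), the check can be sketched as a predicate over list elements. The `Mock*` types and function names below are hypothetical stand-ins for the real node tags and for `paramkind`; the sketch only shows that a list qualifies when every element is a Const or an external Param, and that a single Var or function call disqualifies it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for PostgreSQL node tags. */
typedef enum { MOCK_CONST, MOCK_PARAM, MOCK_VAR, MOCK_FUNCEXPR } MockTag;
typedef enum { MOCK_PARAM_EXTERN, MOCK_PARAM_EXEC } MockParamKind;

typedef struct MockNode
{
    MockTag       tag;
    MockParamKind paramkind;    /* only meaningful when tag == MOCK_PARAM */
} MockNode;

/* A single element qualifies if it is a Const or an external Param. */
static bool
mock_is_squashable(const MockNode *node)
{
    switch (node->tag)
    {
        case MOCK_CONST:
            return true;
        case MOCK_PARAM:
            return node->paramkind == MOCK_PARAM_EXTERN;
        default:
            return false;       /* Vars, function calls, anything else */
    }
}

/* A list qualifies only if every one of its elements qualifies. */
static bool
mock_list_is_squashable(const MockNode *elems, size_t n)
{
    assert(elems != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (!mock_is_squashable(&elems[i]))
            return false;
    }
    return true;
}
```

Under this sketch, a list modelling `(1, 2, 3)` or `($1, $2, 3)` would be squashable, while lists modelling `(1, 2, t2.y)` or `(1, 2, random())` would not.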
Maybe in the future we could have a discussion about whether
expressions involving only Params, Consts, and immutable functions
(say, "$1 + 1") could be mashed as though they were constants, on
the grounds that they'd have been reduced to a single constant if the
planner had chosen to generate a custom plan. But I think it's too
late to consider that for v18. I'd be okay with the rule "treat any
list of Consts and PARAM_EXTERN Params the same as any other" for v18.
I also agree with Alvaro that this discussion doesn't justify a
revert. If the pre-v18 behavior wasn't chiseled on stone tablets,
the new behavior isn't either. We can improve it some more later.
regards, tom lane
Hi Dmitry,
On Sun, May 4, 2025 at 6:19 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote:
On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kind of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.
Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.
[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
In fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.
At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.
Returning to the topic of whether to squash list of Params.
Originally squashing of Params wasn't included into the squashing patch due to
concerns from reviewers about treating quite different queries as the same for
the purposes of query identification. E.g. there is some assumption somewhere,
which will be broken if we treat query with a list of integer parameters same
as a query with a list of float parameters. For the sake of making progress
I've decided to postpone answering this question and concentrate on more simple
scenario. Now, as the patch was applied, I think it's a good moment to reflect
on those concerns. It's not enough to say that we don't see any problems with
squashing of Param, some more sound argumentation is needed. So, what will
happen if parameters are squashed as constants?
1. One obvious impact is that more queries that were considered distinct
before will have the same normalized query and hence the same entry in
pg_stat_statements. Since a Param could be pretty much anything, this can lead
to a situation when two queries with quite different performance profiles (e.g.
one contains a parameter value, which is a heavy function, another one doesn't)
are matched to one entry, making it less useful.
But at the same time this already can happen if those two queries have the same
number of parameters, since query parametrizing is intrinsically lossy in this
sense. The only thing we do by squashing such queries is we lose information
about the number of parameters, not properties of the parameters themselves.
2. Another tricky scenario is when queryId is used by some extension, which in
turn makes assumptions about it that are going to be rendered incorrect by
squashing. The only type of assumptions I can imagine falling into this
category is anything about equivalence of queries. For example, an extension
can capture two queries, which have the same normalized entry in pgss, and
assume all properties of those queries are the same.
It's worth noting that normalized query is not transitive, i.e. if a query1 has
the normalized version query_norm, and a query2 has the same normalized version
query_norm, it doesn't mean query1 is equivalent to query2 in all senses (e.g.
they could have lists of parameter values with different types and the same
size). That means that such assumptions are already faulty, and could work most
of the time only because it takes queries with a list of the same size to break
the assumption. Squashing such queries will make them wrong more often.
One can argue that we might want to be friendly to such extensions, and not
"break" them even further. But I don't think it's worth it, as the number of such
extensions is most likely low, if any. One more extreme case would be when an
extension assumes that queries with the same entry in pgss have the same number
of parameters, but I don't see how such an assumption could be useful.
3. More annoying is the consequence that parameters are going to be treated as
constants in pg_stat_statements. While mostly harmless, that would mean they're
going to be replaced in the same way as constants. This means that the
parameter order is going to be lost, e.g.:
SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4)
AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
AND id IN ($2 /*, ... */)
This representation could be confusing of course. It could be either explained
in the documentation, or LocationLen has to be extended to carry information
about whether it's a constant or a parameter, and do not replace the latter. In
any case, anything more than the first parameter number will be lost, but it's
probably not so dramatic.
At the end of the day, I think the value of squashing for parameters outweighs
the problems described above. As long as there is an agreement about that, it's
fine by me. I've attached the more complete version of the patch (but without
modifying LocationLen to not replace Param yet) in case such an agreement is
reached.
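To make the renumbering in point 3 concrete, here is a minimal sketch of how the placeholder numbers come out. It mirrors the branch structure of the generate_normalized_query() loop shown in the diff, but `MockLocation` and `mock_assign_param_numbers` are hypothetical names, and it assumes each squashed run is recorded as exactly two locations (first and last element) and that the `highest_extern_param_id` argument is 0 when the list parameters themselves are being replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of LocationLen: only the squashed flag. */
typedef struct MockLocation
{
    bool squashed;
} MockLocation;

/*
 * For each recorded location, compute the placeholder number that would be
 * printed, or 0 if the location only closes a squashed run and prints
 * nothing.  Returns how many placeholders were printed in total.
 */
static int
mock_assign_param_numbers(const MockLocation *locs, size_t n,
                          int highest_extern_param_id, int *out)
{
    bool in_squashed = false;
    int  skipped = 0;
    int  printed = 0;

    assert(locs != NULL && out != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (!locs[i].squashed)
        {
            /* Ordinary expression: gets its own placeholder. */
            out[i] = (int) i + 1 + highest_extern_param_id - skipped;
            in_squashed = false;
            printed++;
        }
        else if (!in_squashed)
        {
            /* Start of a squashed run: one placeholder for the whole list. */
            out[i] = (int) i + 1 + highest_extern_param_id - skipped;
            in_squashed = true;
            skipped++;
            printed++;
        }
        else
        {
            /* End of the run: nothing is printed for this location. */
            out[i] = 0;
            in_squashed = false;
        }
    }
    return printed;
}
```

With two squashed lists (four locations, all flagged), the sketch yields placeholders 1 and 2, which matches the `$1 /*, ... */` and `$2 /*, ... */` output above regardless of the original parameter numbers.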
Would it make sense to rename `RecordConstLocation` to something like
`RecordExpressionLocation` instead?
- /* Array of locations of constants that should be removed */
+ /* Array of locations of constants that should be removed and parameters */
LocationLen *clocations;
should be
+ /* Array of locations of constants and parameters that should be removed */
You could also consider renaming `clocations` to `elocations`, this
may introduce some additional churn though.
--
Regards
Junwang Zhao
On Tue, May 06, 2025 at 11:50:07PM GMT, Junwang Zhao wrote:
Would it make sense to rename `RecordConstLocation` to something like
`RecordExpressionLocation` instead?
Yeah, naming is hard. RecordExpressionLocation is somehow more vague,
but I see what you mean, maybe something along these lines would be
indeed a better fit.
- /* Array of locations of constants that should be removed */
+ /* Array of locations of constants that should be removed and parameters */
LocationLen *clocations;
should be
+ /* Array of locations of constants and parameters that should be removed */
That was clumsy but intentional, because contrary to constants
parameters do not need to be removed. I guess I have to change the
wording a bit to make it clear.
I also agree with Alvaro that this discussion doesn't justify a
revert. If the pre-v18 behavior wasn't chiseled on stone tablets,
the new behavior isn't either. We can improve it some more later.
As I was looking further into what we currently have in v18 and HEAD,
I found that the normalization could break if we pass a function.
For example,
"""
select where 1 in (1, 2, int4(1));
"""
the normalized string is,
"""
select where $1 in ($2 /*, ... */))
"""
Notice the extra close parenthesis that is added after the comment. This is
because although int4(1) is a function call it is rewritten as a Const
and that breaks the assumptions being made by the location of the
last expression.
Also, something like:
"""
select where 1 in (1, 2, cast(4 as int));
"""
is normalized as:
"""
select where $1 in ($2 /*, ... */ as int))
"""
I don't think the current state is acceptable, if it results in
pg_stat_statements storing an invalid normalized version of the sql.
Now, with the attached v2 supporting external params, we see other normalization
anomalies such as
"""
postgres=# select where $1 in ($3, $2) and 1 in ($4, cast($5 as int)) \bind 0 1 2 3 4
postgres-# ;
--
(0 rows)
postgres=# select toplevel, query, calls from pg_stat_statements;
 toplevel |                                  query                                  | calls
----------+-------------------------------------------------------------------------+-------
 t        | select where $1 in ($2 /*, ... */) and $3 in ($4 /*, ... */($5 as int)) |     1
(1 row)
"""
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.
thoughts?
--
Sami Imseih
Amazon Web Services (AWS)
On Tue, May 06, 2025 at 01:32:48PM GMT, Sami Imseih wrote:
I also agree with Alvaro that this discussion doesn't justify a
revert. If the pre-v18 behavior wasn't chiseled on stone tablets,
the new behavior isn't either. We can improve it some more later.
As I was looking further into what we currently have in v18 and HEAD
the normalization could break if we pass a function.
[...]
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.
I don't think having the end location in this case would help -- when it
comes to ParseFuncOrColumn, looks like for coerce functions it just
replaces the original FuncCall with the argument expression. Meaning
that when jumbling we have only the coerce argument expression (Const),
which ends before the closing brace, not the parent expression.
Maybe it would be possible to address this in a not too complicated way
in fill_in_constant_lengths, since it already operates with parsed
tokens.
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.
I don't think having the end location in this case would help -- when it
comes to ParseFuncOrColumn, looks like for coerce functions it just
replaces the original FuncCall with the argument expression. Meaning
that when jumbling we have only the coerce argument expression (Const),
which ends before the closing brace, not the parent expression.
If we are picking up the start and end points from gram.c and we add these
positions to A_Expr or A_ArrayExpr and then make them available to ArrayExpr,
then we know the exact boundary of the IN list. Even if a function call
is simplified down to a constant, it will not really matter because we
are going to normalize
between the original opening and closing parentheses of the IN list.
(Actually, we can even track the actual textual starting and end point of a List
as well)
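A minimal sketch of that idea, assuming the byte offsets of the list's opening and closing delimiters are already known (in the proposal they would come from the parser; here they are simply passed in). `mock_squash_between` is a hypothetical helper, not part of the attached file, and it only handles the list itself, not other constants in the query.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: keep the query text up to and including the opening
 * delimiter at list_start, emit one placeholder plus the commented-out list
 * indicator, then resume at the closing delimiter at list_end.  Whatever was
 * written between the two delimiters (casts, function calls, ...) is dropped.
 * Returns the snprintf() result.
 */
static int
mock_squash_between(const char *query, size_t list_start, size_t list_end,
                    int param_no, char *out, size_t outlen)
{
    size_t qlen = strlen(query);

    assert(list_start < list_end && list_end < qlen);

    return snprintf(out, outlen, "%.*s$%d /*, ... */%s",
                    (int) (list_start + 1), query,  /* prefix incl. '(' */
                    param_no,
                    query + list_end);              /* from ')' onwards */
}
```

Because the replacement is driven by the outer delimiters rather than by the location of the last expression, an element such as `cast(4 as int)` cannot leave stray text like `as int)` behind in this sketch.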
Attached ( not in patch form ) is the idea for this.
```
postgres=# select where 1 in (1, int4(1));
--
(1 row)
postgres=# select where 1 in (1, int4($1::int)) \bind 1
postgres-# ;
--
(1 row)
postgres=# select toplevel, query, calls from pg_stat_statements;
toplevel | query | calls
----------+------------------------------------+-------
t | select where $1 in ($2 /*, ... */) | 2
(1 row)
```
What do you think?
--
Sami Imseih
Attachments:
Sami-WIP-Allow-query-jumble-to-squash-a-list-external-para.txt (text/plain)
From 15f1313ef66e964e588b0bf19ede676437ea5a42 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Tue, 6 May 2025 13:52:11 -0500
Subject: [PATCH v1 1/1] Allow query jumble to squash a list external
parameters
---
.../pg_stat_statements/expected/squashing.out | 14 +-
.../pg_stat_statements/pg_stat_statements.c | 84 +++---------
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 121 +++++++++++-------
src/backend/parser/gram.y | 37 ++++--
src/backend/parser/parse_expr.c | 4 +
src/include/nodes/parsenodes.h | 4 +
src/include/nodes/primnodes.h | 2 +
8 files changed, 139 insertions(+), 129 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..d92cfbd35fb 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -246,7 +246,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -353,7 +353,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -376,7 +376,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -393,10 +393,10 @@ SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..efcad87d684 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2825,10 +2825,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2842,9 +2838,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
@@ -2857,7 +2850,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
tok_len; /* Length (in bytes) of that tok */
off = jstate->clocations[i].location;
-
/* Adjust recorded location if we're dealing with partial string */
off -= query_loc;
@@ -2866,67 +2858,24 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
- /*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
- */
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
-
- Assert(len_to_wrt >= 0);
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
-
- skipped_constants++;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
+ /*
+ * And insert a param symbol in place of the constant token.
+ *
+ * However, If we have a squashable list, insert a comment in place of
+ * the second and remaining values of the list.
+ */
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ i + 1 + jstate->highest_extern_param_id,
+ (jstate->clocations[i].squashed) ? " /*, ... */" : "");
- /* Otherwise the constant is squashed away -- move forward */
quer_loc = off + tok_len;
last_off = off;
last_tok_len = tok_len;
@@ -3017,6 +2966,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 77659b0f760..17ba3696226 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1324,7 +1324,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..27d76a493be 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -60,10 +60,10 @@ static uint64 DoJumble(JumbleState *jstate, Node *node);
static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
-static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+static void RecordExpressionLocation(JumbleState *jstate,
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -381,7 +381,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordExpressionLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +396,15 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * initialize lengths to -1 to simplify third-party module usage
+ *
+ * If we have a length that is greater than -1, this indicates a
+ * squashable list.
+ */
+ jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
jstate->clocations_count++;
}
}
@@ -413,7 +419,7 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableExpression(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -437,22 +443,45 @@ IsSquashableConst(Node *element)
{
Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
- return false;
+ switch (nodeTag(arg))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) arg;
+
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
}
- return true;
+ return false;
}
- if (!IsA(element, Const))
- return false;
+ switch (nodeTag(element))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) element;
- return true;
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
+
+ return false;
}
/*
* Subroutine for _jumbleElements: Verify whether the provided list
- * can be squashed, meaning it contains only constant expressions.
+ * can be squashed, meaning it contains only constant and external
+ * parameter expressions.
*
* Return value indicates if squashing is possible.
*
@@ -461,7 +490,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableExpressionList(List *elements)
{
ListCell *temp;
@@ -474,22 +503,19 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableExpression(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordExpressionLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +543,37 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow Consts and external Params (PARAM_EXTERN) to be squashed. To
+ * normalize such queries by stripping away the squashed values, we must
+ * track the start and end of the expression list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableExpressionList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->expr_start > 0 && aexpr->expr_end > 0)
+ {
+ RecordExpressionLocation(jstate,
+ aexpr->expr_start + 1,
+ (aexpr->expr_end - aexpr->expr_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3c4268b271a..dff65cc0f6c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -184,7 +184,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int expr_end);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -206,6 +206,10 @@ static void preprocess_pubobj_list(List *pubobjspec_list,
core_yyscan_t yyscanner);
static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+/* global variables */
+static ParseLoc expr_list_start = 0;
+static ParseLoc expr_list_end = 0;
+
%}
%pure-parser
@@ -15298,7 +15302,12 @@ a_expr: c_expr { $$ = $1; }
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *aexpr = makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ aexpr->expr_start = expr_list_start;
+ aexpr->expr_end = expr_list_end;
+ $$ = (Node *) aexpr;
+ expr_list_start = 0;
+ expr_list_end = 0;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
@@ -15321,7 +15330,12 @@ a_expr: c_expr { $$ = $1; }
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *aexpr = makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ aexpr->expr_start = expr_list_start;
+ aexpr->expr_end = expr_list_end;
+ $$ = (Node *) aexpr;
+ expr_list_start = 0;
+ expr_list_end = 0;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16757,15 +16771,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16895,7 +16909,12 @@ in_expr: select_with_parens
/* other fields will be filled later */
$$ = (Node *) n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
+ | '(' expr_list ')'
+ {
+ $$ = (Node *) $2;
+ expr_list_start = @1;
+ expr_list_end = @3;
+ }
;
/*
@@ -19293,12 +19312,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int expr_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->expr_start = location;
+ n->expr_end = expr_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..f54bf86b520 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1224,6 +1224,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ newa->expr_start = a->expr_start;
+ newa->expr_end = a->expr_end;
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2166,6 +2168,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ newa->expr_start = a->expr_start;
+ newa->expr_end = a->expr_end;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..ee9cd1f25b9 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -347,6 +347,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc expr_start;
+ ParseLoc expr_end;
} A_Expr;
/*
@@ -502,6 +504,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc expr_start;
+ ParseLoc expr_end;
} A_ArrayExpr;
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..0d9cb292464 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,8 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+ ParseLoc expr_start;
+ ParseLoc expr_end;
} ArrayExpr;
/*
--
2.39.5 (Apple Git-154)
On Tue, May 06, 2025 at 01:32:48PM -0500, Sami Imseih wrote:
> Without properly accounting for the boundaries of the list of expressions, i.e.,
> the start and end positions of '(' and ')' or '[' and ']' and normalizing the
> expressions in between, it will be very difficult for the normalization to
> behave sanely.
FWIW, this is exactly the kind of issue we have spent time on when
improving the location detection of sub-queries for some DDL patterns,
and the parser is the only path I am aware of where this can be done
sanely because the extra characters that may, or may not, be included
in some of the expressions would be naturally discarded based on the @
locations we retrieve.
For reference, this story is part of 499edb09741b, more precisely
around this part of the thread:
/messages/by-id/CACJufxF9hqyfmKEdpiG=PbrGdKVNP2BQjHFJh4q6639sV7NmvQ@mail.gmail.com
(FWIW, I've also seen the detection of specific locations attempted
outside the parser in pg_hint_plan, and that did not end well: the
code relies on assumptions about things that natural parsers simply
handle better, because they are designed to detect such cases.)
--
Michael
On Tue, May 06, 2025 at 03:01:32PM GMT, Sami Imseih wrote:
> > > Without properly accounting for the boundaries of the list of expressions, i.e.,
> > > the start and end positions of '(' and ')' or '[' and ']' and normalizing the
> > > expressions in between, it will be very difficult for the normalization to
> > > behave sanely.
> >
> > I don't think having the end location in this case would help -- when it
> > comes to ParseFuncOrColumn, looks like for coerce functions it just
> > replaces the original FuncCall with the argument expression. Meaning
> > that when jumbling we have only the coerce argument expression (Const),
> > which ends before the closing brace, not the parent expression.
>
> If we are picking up the start and end points from gram.c and we add these
> positions to A_Expr or A_ArrayExpr and then make them available to ArrayExpr,
> then we know the exact boundary of the IN list. Even if a function
> call is simplified down to a constant, it will not really matter because
> we are going to normalize between the original opening and closing
> parentheses of the IN list.
> What do you think?
Ah, I see what you mean. I think the idea is fine, it will simplify
certain things as well as address the issue. But I'm afraid adding
start/end location to A_Expr is a bit too invasive, as it's being used
for many other purposes. How about introducing a new expression for this
purpose, using it only in in_expr/array_expr, and wrapping the
corresponding expressions in it? This way the change could be applied
in a more targeted fashion.
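
To make the wrapper idea concrete, here is a minimal standalone sketch. All type and function names (SketchNode, SketchLocation, sketch_wrap, sketch_unwrap) are hypothetical stand-ins, not PostgreSQL's node definitions, and the choice to store the positions just inside the delimiters is an assumption made for illustration. The point is only that the range travels with the wrapped list and does not touch any general-purpose expression node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real nodes. */
typedef enum SketchTag { SKETCH_LIST, SKETCH_LOCATION } SketchTag;

typedef struct SketchNode { SketchTag tag; } SketchNode;

typedef struct SketchList
{
	SketchNode	node;
	int			nelems;
} SketchList;

/* Wrapper that carries the range of the expression list it wraps. */
typedef struct SketchLocation
{
	SketchNode	node;
	SketchNode *expr;			/* the wrapped node */
	int			start_location;	/* first byte after the opening delimiter */
	int			end_location;	/* last byte before the closing delimiter */
} SketchLocation;

/* Wrap a list, given the byte offsets of its opening and closing delimiters. */
static SketchLocation
sketch_wrap(SketchNode *expr, int open_loc, int close_loc)
{
	SketchLocation w;

	assert(close_loc > open_loc);
	w.node.tag = SKETCH_LOCATION;
	w.expr = expr;
	w.start_location = open_loc + 1;
	w.end_location = close_loc - 1;
	return w;
}

/*
 * Strip the wrapper if there is one, handing the range back to the caller.
 * Nodes that are not wrapped report -1 for both positions (unknown).
 */
static SketchNode *
sketch_unwrap(SketchNode *n, int *start, int *end)
{
	if (n != NULL && n->tag == SKETCH_LOCATION)
	{
		SketchLocation *w = (SketchLocation *) n;

		*start = w->start_location;
		*end = w->end_location;
		return w->expr;
	}
	*start = -1;
	*end = -1;
	return n;
}
```

Only the code paths that build or consume the wrapper need to know about it; every other consumer keeps seeing the plain list after unwrapping.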
On Wed, May 07, 2025 at 10:41:22AM +0200, Dmitry Dolgov wrote:
> Ah, I see what you mean. I think the idea is fine, it will simplify
> certain things as well as address the issue. But I'm afraid adding
> start/end location to A_Expr is a bit too invasive, as it's being used
> for many other purposes. How about introducing a new expression for this
> purpose, using it only in in_expr/array_expr, and wrapping the
> corresponding expressions in it? This way the change could be applied
> in a more targeted fashion.
Yes, that feels invasive. The use of two static variables to track
the start and the end positions in an expression list can also be a
bit unstable to rely on, I think. It seems to me that this part
could be handled in a new Node that takes care of tracking the two
positions, instead, be it a start/end couple or a location/length
couple? That seems necessary to have when going through
jumbleElements().
--
Michael
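
As a small illustration of the two representations mentioned above: a start/end pair and a location/length pair carry the same information and convert losslessly, so the choice is mostly a matter of what the consumer finds convenient. The sketch below assumes an inclusive end offset and uses hypothetical names (StartEnd, LocLen, inner_range_of); none of them come from the PostgreSQL tree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical types for illustration only; not PostgreSQL structs. */
typedef struct StartEnd
{
	int			start;			/* first byte of the range */
	int			end;			/* last byte of the range (inclusive) */
} StartEnd;

typedef struct LocLen
{
	int			location;		/* first byte of the range */
	int			length;			/* number of bytes in the range */
} LocLen;

/* start/end -> location/length */
static LocLen
to_loclen(StartEnd r)
{
	LocLen		l;

	assert(r.end >= r.start);
	l.location = r.start;
	l.length = r.end - r.start + 1;
	return l;
}

/* location/length -> start/end */
static StartEnd
to_startend(LocLen l)
{
	StartEnd	r;

	assert(l.length > 0);
	r.start = l.location;
	r.end = l.location + l.length - 1;
	return r;
}

/*
 * Range strictly inside the first '(' and the last ')' of a query string.
 * Assumes both delimiters are present and enclose at least one byte.
 */
static StartEnd
inner_range_of(const char *query)
{
	const char *open = strchr(query, '(');
	const char *close = strrchr(query, ')');
	StartEnd	r;

	assert(open != NULL && close != NULL && close > open + 1);
	r.start = (int) (open - query) + 1;
	r.end = (int) (close - query) - 1;
	return r;
}
```

For `select from t where id in (1, 2, 3)` both forms describe the seven bytes `1, 2, 3`, whichever one the node ends up storing.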
On Thu, May 08, 2025 at 02:22:00PM GMT, Michael Paquier wrote:
> On Wed, May 07, 2025 at 10:41:22AM +0200, Dmitry Dolgov wrote:
> > Ah, I see what you mean. I think the idea is fine, it will simplify
> > certain things as well as address the issue. But I'm afraid adding
> > start/end location to A_Expr is a bit too invasive, as it's being used
> > for many other purposes. How about introducing a new expression for this
> > purpose, using it only in in_expr/array_expr, and wrapping the
> > corresponding expressions in it? This way the change could be applied
> > in a more targeted fashion.
>
> Yes, that feels invasive. The use of two static variables to track
> the start and the end positions in an expression list can also be a
> bit unstable to rely on, I think. It seems to me that this part
> could be handled in a new Node that takes care of tracking the two
> positions, instead, be it a start/end couple or a location/length
> couple? That seems necessary to have when going through
> jumbleElements().
To clarify, I had in mind something like the attached patch. The
idea is to make start/end location capturing relatively independent of
constant squashing. The new parsing node conveys the location
information, which is then transformed into part of an ArrayExpr.
It's done for in_expr only here; something similar would be
needed for array_expr as well. Feedback is appreciated.
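
For readers following along, here is a minimal standalone sketch of what a recorded range makes possible at normalization time. It is not the pg_stat_statements code; squash_range and its parameters are hypothetical. It assumes the range covers everything strictly between the list delimiters, with an inclusive end offset. Given that, the whole list body can be replaced in one step, regardless of casts or function calls wrapped around individual elements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy "query" into "out", replacing the inclusive byte range [start, end]
 * with a single parameter symbol followed by the squashed-list marker.
 * Returns the length of the result, or -1 if the range is invalid or the
 * output buffer is too small.  Hypothetical helper, for illustration only.
 */
static int
squash_range(const char *query, int start, int end, int param_id,
			 char *out, size_t outsz)
{
	size_t		qlen = strlen(query);
	int			n;

	assert(out != NULL);

	if (start < 0 || end < start || (size_t) end >= qlen)
		return -1;

	/* text before the range, the placeholder, then text after the range */
	n = snprintf(out, outsz, "%.*s$%d /*, ... */%s",
				 start, query, param_id, query + end + 1);
	if (n < 0 || (size_t) n >= outsz)
		return -1;

	return n;
}
```

With the query `select from t where id in (1, 2, int4(3))` and the range taken from just after the opening parenthesis to just before the final closing one, this yields `select from t where id in ($1 /*, ... */)`. The trailing `)` of `int4(3)` is swallowed with the rest of the list, which is exactly the case that tracking only the first and last element locations struggles with.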
Attachments:
v1-0001-Introduce-LocationExpr.patch (text/plain; charset=us-ascii)
From bcab1a8364979dce146b36e31865e9b670bd892b Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Thu, 8 May 2025 16:41:01 +0200
Subject: [PATCH v1 1/2] Introduce LocationExpr
Add LocationExpr wrapper node to capture start and end location of an
expression in a query. Use it to wrap expr_list in in_expr and
convey location information to ArrayExpr.
---
src/backend/nodes/nodeFuncs.c | 36 ++++++++++++++++++++++++++++++++
src/backend/parser/gram.y | 11 +++++++++-
src/backend/parser/parse_expr.c | 28 ++++++++++++++++++++++++-
src/include/nodes/parsenodes.h | 15 +++++++++++++
src/include/nodes/primnodes.h | 2 ++
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 91 insertions(+), 2 deletions(-)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 7bc823507f1..6f6a7079d55 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -284,6 +284,12 @@ exprType(const Node *expr)
case T_PlaceHolderVar:
type = exprType((Node *) ((const PlaceHolderVar *) expr)->phexpr);
break;
+ case T_LocationExpr:
+ {
+ const LocationExpr *n = (const LocationExpr *) expr;
+ type = exprType((Node *) n->expr);
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
type = InvalidOid; /* keep compiler quiet */
@@ -536,6 +542,11 @@ exprTypmod(const Node *expr)
return exprTypmod((Node *) ((const ReturningExpr *) expr)->retexpr);
case T_PlaceHolderVar:
return exprTypmod((Node *) ((const PlaceHolderVar *) expr)->phexpr);
+ case T_LocationExpr:
+ {
+ const LocationExpr *n = (const LocationExpr *) expr;
+ return exprTypmod((Node *) n->expr);
+ }
default:
break;
}
@@ -1058,6 +1069,9 @@ exprCollation(const Node *expr)
case T_PlaceHolderVar:
coll = exprCollation((Node *) ((const PlaceHolderVar *) expr)->phexpr);
break;
+ case T_LocationExpr:
+ coll = exprCollation((Node *) ((const LocationExpr *) expr)->expr);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
coll = InvalidOid; /* keep compiler quiet */
@@ -1306,6 +1320,10 @@ exprSetCollation(Node *expr, Oid collation)
/* NextValueExpr's result is an integer type ... */
Assert(!OidIsValid(collation)); /* ... so never set a collation */
break;
+ case T_LocationExpr:
+ exprSetCollation((Node *) ((LocationExpr *) expr)->expr,
+ collation);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
break;
@@ -1803,6 +1821,9 @@ exprLocation(const Node *expr)
case T_PartitionRangeDatum:
loc = ((const PartitionRangeDatum *) expr)->location;
break;
+ case T_LocationExpr:
+ loc = ((const LocationExpr *) expr)->start_location;
+ break;
default:
/* for any other node type it's just unknown... */
loc = -1;
@@ -2668,6 +2689,8 @@ expression_tree_walker_impl(Node *node,
return true;
}
break;
+ case T_LocationExpr:
+ return WALK(((LocationExpr *) node)->expr);
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
@@ -3744,6 +3767,17 @@ expression_tree_mutator_impl(Node *node,
return (Node *) newnode;
}
break;
+ case T_LocationExpr:
+ {
+ LocationExpr *expr = (LocationExpr *) node;
+ LocationExpr *newnode;
+
+ FLATCOPY(newnode, expr, LocationExpr);
+ MUTATE(newnode->expr, expr->expr, Node *);
+
+ return (Node *) newnode;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
@@ -4705,6 +4739,8 @@ raw_expression_tree_walker_impl(Node *node,
return true;
}
break;
+ case T_LocationExpr:
+ return WALK(((LocationExpr *) node)->expr);
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3c4268b271a..3ffa5335f4e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -16895,7 +16895,16 @@ in_expr: select_with_parens
/* other fields will be filled later */
$$ = (Node *) n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
+ | '(' expr_list ')'
+ {
+ LocationExpr *n = makeNode(LocationExpr);
+
+ n->expr = (Node *) $2;
+ n->start_location = @1 + 1;
+ n->end_location = @3 - 1;
+
+ $$ = (Node *) n;
+ }
;
/*
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..30d52c801eb 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -370,6 +370,22 @@ transformExprRecurse(ParseState *pstate, Node *expr)
result = transformJsonFuncExpr(pstate, (JsonFuncExpr *) expr);
break;
+ case T_LocationExpr:
+ {
+ LocationExpr *loc = (LocationExpr *) expr;
+ if (IsA(loc->expr, ArrayExpr))
+ {
+ ArrayExpr *arr = (ArrayExpr *) loc->expr;
+ arr->loc_range = list_make2_int(loc->start_location,
+ loc->end_location);
+
+ result = (Node *) arr;
+ }
+ else
+ result = (Node *) loc->expr;
+ }
+ break;
+
default:
/* should not reach here */
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
@@ -1125,6 +1141,7 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
{
Node *result = NULL;
Node *lexpr;
+ LocationExpr *location = NULL;
List *rexprs;
List *rvars;
List *rnonvars;
@@ -1139,6 +1156,9 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
else
useOr = true;
+ if (IsA(a->rexpr, LocationExpr))
+ location = (LocationExpr *) a->rexpr;
+
/*
* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if there is a suitable array type available. If not, we fall
@@ -1152,7 +1172,7 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
*/
lexpr = transformExprRecurse(pstate, a->lexpr);
rexprs = rvars = rnonvars = NIL;
- foreach(l, (List *) a->rexpr)
+ foreach(l, (List *) transformExprRecurse(pstate, a->rexpr))
{
Node *rexpr = transformExprRecurse(pstate, lfirst(l));
@@ -1224,6 +1244,12 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ if (location)
+ newa->loc_range = list_make2_int(location->start_location,
+ location->end_location);
+ else
+ newa->loc_range = NIL;
+
result = (Node *) make_scalar_array_op(pstate,
a->name,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..5107bfad9a6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -504,6 +504,21 @@ typedef struct A_ArrayExpr
ParseLoc location; /* token location, or -1 if unknown */
} A_ArrayExpr;
+/*
+ * A wrapper expression to record start and end location
+ */
+typedef struct LocationExpr
+{
+ NodeTag type;
+
+ /* the node to be wrapped */
+ Node *expr;
+ /* token location, or -1 if unknown */
+ ParseLoc start_location;
+ /* token location, or -1 if unknown */
+ ParseLoc end_location;
+} LocationExpr;
+
/*
* ResTarget -
* result target (used in target list of pre-transformed parse trees)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..ffee0f7768f 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,8 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+
+ List *loc_range pg_node_attr(query_jumble_ignore);
} ArrayExpr;
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e5879e00dff..e6fcba24396 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,6 +1501,7 @@ LOCALLOCK
LOCALLOCKOWNER
LOCALLOCKTAG
LOCALPREDICATELOCK
+LocationExpr
LOCK
LOCKMASK
LOCKMETHODID
base-commit: b0635bfda0535a7fc36cd11d10eecec4e2a96330
--
2.45.1
v1-0002-Use-LocationExpr-in-squashing.patch (text/plain; charset=us-ascii)
From b4d325d11d1ec6912a4bb07c4f3ff2ce079212b8 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Thu, 8 May 2025 16:41:20 +0200
Subject: [PATCH v1 2/2] Use LocationExpr in squashing
For the purpose of constant squashing we have only the start location of
an expression, which is not enough if the constant is wrapped, e.g., in a
cast function. Apply the information conveyed via LocationExpr to improve
squashing of constants.
Based on an idea from Sami Imseih.
---
.../pg_stat_statements/expected/squashing.out | 42 +++++++---
.../pg_stat_statements/pg_stat_statements.c | 35 +++------
contrib/pg_stat_statements/sql/squashing.sql | 8 +-
src/backend/nodes/queryjumblefuncs.c | 76 ++++++++++++-------
4 files changed, 97 insertions(+), 64 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..df9ff9d5637 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -147,6 +147,24 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+-- Parsing
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, 2, int4(1));
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
-- FuncExpr
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
@@ -246,7 +264,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -353,7 +371,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -376,7 +394,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -393,10 +411,10 @@ SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -409,7 +427,7 @@ FROM cte;
--------
(0 rows)
--- Simple array would be squashed as well
+-- Simple array is not squashed yet
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -423,9 +441,9 @@ SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */] | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT ARRAY[$1, $2, $3, $4, $5, $6, $7, $8, $9, $10] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..314e065b364 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2825,10 +2825,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2886,12 +2882,9 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then a param symbol replacing the constant itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
+ i + 1 + jstate->highest_extern_param_id);
}
- else if (!in_squashed)
+ else
{
/*
* This location is the start position of a run of constants to be
@@ -2903,27 +2896,12 @@ generate_normalized_query(JumbleState *jstate, const char *query,
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
n_quer_loc += len_to_wrt;
/* ... and then start a run of squashed constants */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
-
- skipped_constants++;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
+ i + 1 + jstate->highest_extern_param_id);
}
/* Otherwise the constant is squashed away -- move forward */
@@ -3012,6 +2990,13 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
int loc = locs[i].location;
int tok;
+ /* Squashed constants are recorded with a length set already */
+ if (locs[i].squashed)
+ {
+ Assert(locs[i].length != -1);
+ continue;
+ }
+
/* Adjust recorded location if we're dealing with partial string */
loc -= query_loc;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 908be81ff2b..d30541a275c 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -49,6 +49,12 @@ SELECT * FROM test_squash WHERE id IN
(@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9');
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- Parsing
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, 2, int4(1));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- FuncExpr
-- Verify multiple type representation end up with the same query_id
@@ -163,7 +169,7 @@ WITH cte AS (
SELECT ARRAY['a', 'b', 'c', const::varchar] AS result
FROM cte;
--- Simple array would be squashed as well
+-- Simple array is not squashed yet
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..ae26406bbbc 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -62,8 +62,10 @@ static void AppendJumble(JumbleState *jstate,
static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
int location, bool squashed);
+static void RecordConstLocationRange(JumbleState *jstate,
+ int start, int end, bool squashed);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *expr);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -403,6 +405,32 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
}
}
+/*
+ * Similar to RecordConstLocation, RecordConstLocationRange stores a constant
+ * with location start and end boundaries.
+ */
+static void
+RecordConstLocationRange(JumbleState *jstate, int start, int end, bool squashed)
+{
+ /* -1 indicates unknown or undefined location */
+ if (start >= 0 && end >= 0)
+ {
+ /* enlarge array if needed */
+ if (jstate->clocations_count >= jstate->clocations_buf_size)
+ {
+ jstate->clocations_buf_size *= 2;
+ jstate->clocations = (LocationLen *)
+ repalloc(jstate->clocations,
+ jstate->clocations_buf_size *
+ sizeof(LocationLen));
+ }
+ jstate->clocations[jstate->clocations_count].location = start;
+ jstate->clocations[jstate->clocations_count].squashed = squashed;
+ jstate->clocations[jstate->clocations_count].length = end - start + 1;
+ jstate->clocations_count++;
+ }
+}
+
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
* deduce that the expression is a constant:
@@ -461,7 +489,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableConstList(List *elements)
{
ListCell *temp;
@@ -473,13 +501,8 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
return false;
foreach(temp, elements)
- {
if (!IsSquashableConst(lfirst(temp)))
return false;
- }
-
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
return true;
}
@@ -487,7 +510,7 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+ _jumbleElements(jstate, (List *) expr->list, (Node *) expr)
#define JUMBLE_LOCATION(location) \
RecordConstLocation(jstate, expr->location, false)
#define JUMBLE_FIELD(item) \
@@ -523,28 +546,29 @@ do { \
* elements in the list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *expr)
{
- Node *first,
- *last;
-
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableConstList(elements))
{
+ ArrayExpr *array;
+
/*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
+ * Currently only ArrayExpr provides location information, needed for
+ * squashing.
*/
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ Assert(IsA(expr, ArrayExpr));
+ array = (ArrayExpr *) expr;
+
+ /*
+ * If the parent ArrayExpr has location information, i.e. start and the
+ * end of the expression, use it as boundaries for squashing.
+ */
+ if (array->loc_range != NIL)
+ RecordConstLocationRange(jstate,
+ linitial_int(array->loc_range),
+ lsecond_int(array->loc_range), true);
+ else
+ _jumbleNode(jstate, (Node *) elements);
}
else
{
--
2.45.1
On Thu, May 8, 2025 at 2:36 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Thu, May 08, 2025 at 02:22:00PM GMT, Michael Paquier wrote:
On Wed, May 07, 2025 at 10:41:22AM +0200, Dmitry Dolgov wrote:
Ah, I see what you mean. I think the idea is fine, it will simplify
certain things as well as address the issue. But I'm afraid adding
start/end location to A_Expr is a bit too invasive, as it's being used
for many other purposes. How about introducing a new expression for this
purposes, and use it only in in_expr/array_expr, and wrap the
corresponding expressions into it? This way the change could be applied
in a more targeted fashion.
Yes, that feels invasive. The use of two static variables to track
the start and the end positions in an expression list can also be a
bit unstable to rely on, I think. It seems to me that this part
could be handled in a new Node that takes care of tracking the two
positions, instead, be it a start/end couple or a location/length
couple? That seems necessary to have when going through
jumbleElements().
To clarify, I had in mind something like in the attached patch. The
idea is to make start/end location capturing relatively independent from
the constants squashing. The new parsing node conveys the location
information, which is then getting transformed to be a part of an
ArrayExpr. It's done for in_expr only here, something similar would be
needed for array_expr as well. Feedback is appreciated.
Thanks! I took a quick look at v1-0001 and it feels like a much better approach
than the quick hack I put together earlier. I will look thoroughly.
--
Sami
On Thu, May 08, 2025 at 03:50:32PM -0500, Sami Imseih wrote:
On Thu, May 8, 2025 at 2:36 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
To clarify, I had in mind something like in the attached patch. The
idea is to make start/end location capturing relatively independent from
the constants squashing. The new parsing node conveys the location
information, which is then getting transformed to be a part of an
ArrayExpr. It's done for in_expr only here, something similar would be
needed for array_expr as well. Feedback is appreciated.
Thanks! I took a quick look at v1-0001 and it feels like a much better approach
than the quick hack I put together earlier. I will look thoroughly.
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */] | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT ARRAY[$1, $2, $3, $4, $5, $6, $7, $8, $9, $10] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
Yes, we are going to need more than that for such cases if we want to
cover all the ground we're aiming for.
Putting that aside, the test coverage for ARRAY[] elements is also
very limited on HEAD with one single test only with a set of
constants. We really should improve that, tracking more patterns and
more mixed combinations to see what gets squashed and what is not. So
this should be extended with more cases, including expressions,
parameters and sublinks, with checks on pg_stat_statements.calls to
see how the counters are aggregated. That's going to be important
when people play with this code to track how things change when
manipulating the element jumbling. I'd suggest doing that separately
from the rest.
--
Michael
Hi Dmitry,
On Fri, May 9, 2025 at 3:36 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Thu, May 08, 2025 at 02:22:00PM GMT, Michael Paquier wrote:
On Wed, May 07, 2025 at 10:41:22AM +0200, Dmitry Dolgov wrote:
Ah, I see what you mean. I think the idea is fine, it will simplify
certain things as well as address the issue. But I'm afraid adding
start/end location to A_Expr is a bit too invasive, as it's being used
for many other purposes. How about introducing a new expression for this
purposes, and use it only in in_expr/array_expr, and wrap the
corresponding expressions into it? This way the change could be applied
in a more targeted fashion.
Yes, that feels invasive. The use of two static variables to track
the start and the end positions in an expression list can also be a
bit unstable to rely on, I think. It seems to me that this part
could be handled in a new Node that takes care of tracking the two
positions, instead, be it a start/end couple or a location/length
couple? That seems necessary to have when going through
jumbleElements().
To clarify, I had in mind something like in the attached patch. The
idea is to make start/end location capturing relatively independent from
the constants squashing. The new parsing node conveys the location
information, which is then getting transformed to be a part of an
ArrayExpr. It's done for in_expr only here, something similar would be
needed for array_expr as well. Feedback is appreciated.
+/*
+ * A wrapper expression to record start and end location
+ */
+typedef struct LocationExpr
+{
+ NodeTag type;
+
+ /* the node to be wrapped */
+ Node *expr;
+ /* token location, or -1 if unknown */
+ ParseLoc start_location;
+ /* token location, or -1 if unknown */
+ ParseLoc end_location;
+} LocationExpr;
Why not a location and a length, it should be more natural, it
seems we use this convention in some existing nodes, like
RawStmt, InsertStmt etc.
--
Regards
Junwang Zhao
On Fri, May 09, 2025 at 11:05:43AM +0800, Junwang Zhao wrote:
Why not a location and a length, it should be more natural, it
seems we use this convention in some existing nodes, like
RawStmt, InsertStmt etc.
These are new concepts as of Postgres 18 (aka only on HEAD), chosen
mainly to match with the internals of pg_stat_statements as far as I
recall. Doing the same here would not hurt, but it may be better
depending on the cases to rely on a start/end. I suspect that
switching from one to the other should not change much the internal
squashing logic.
--
Michael
On Fri, May 9, 2025 at 1:35 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, May 09, 2025 at 11:05:43AM +0800, Junwang Zhao wrote:
Why not a location and a length, it should be more natural, it
seems we use this convention in some existing nodes, like
RawStmt, InsertStmt etc.
These are new concepts as of Postgres 18 (aka only on HEAD), chosen
mainly to match with the internals of pg_stat_statements as far as I
recall. Doing the same here would not hurt, but it may be better
depending on the cases to rely on a start/end.
ISTM that for string manipulation, start_pos/length are more appropriate;
start/end are often better suited for iterator use, where start refers to the
first element and end marks the position one past the last element.
Just my opinion, I can live with either way though.
I suspect that
switching from one to the other should not change much the internal
squashing logic.
Yeah, not much difference, one can easily be computed from the other.
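As a small illustration of that point, here is a minimal sketch of the conversion in both directions. The StartEnd and LocLen types and the function names are hypothetical and exist only for this example; they are not PostgreSQL structures. The sketch assumes the iterator-style convention described above, where end is one past the last byte. With an inclusive end, as in the earlier quick patch, the length would be end - start + 1 instead.

```c
#include <assert.h>

/* Hypothetical span types for illustration only; not PostgreSQL structs. */
typedef struct
{
	int			start;		/* offset of the first byte */
	int			end;		/* offset one past the last byte (half-open) */
} StartEnd;

typedef struct
{
	int			location;	/* offset of the first byte */
	int			length;		/* number of bytes covered */
} LocLen;

/* Convert a half-open [start, end) pair to location/length. */
LocLen
start_end_to_loc_len(StartEnd se)
{
	LocLen		ll;

	assert(se.start >= 0 && se.end >= se.start);
	ll.location = se.start;
	ll.length = se.end - se.start;
	return ll;
}

/* Convert location/length back to a half-open [start, end) pair. */
StartEnd
loc_len_to_start_end(LocLen ll)
{
	StartEnd	se;

	assert(ll.location >= 0 && ll.length >= 0);
	se.start = ll.location;
	se.end = ll.location + ll.length;
	return se;
}
```

Either representation round-trips through the other without loss, so the choice mostly comes down to which one reads better at the call sites.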
--
Michael
--
Regards
Junwang Zhao
On Fri, May 09, 2025 at 02:35:33PM GMT, Michael Paquier wrote:
On Fri, May 09, 2025 at 11:05:43AM +0800, Junwang Zhao wrote:
Why not a location and a length, it should be more natural, it
seems we use this convention in some existing nodes, like
RawStmt, InsertStmt etc.
These are new concepts as of Postgres 18 (aka only on HEAD), chosen
mainly to match with the internals of pg_stat_statements as far as I
recall. Doing the same here would not hurt, but it may be better
depending on the cases to rely on a start/end. I suspect that
switching from one to the other should not change much the internal
squashing logic.
Right, switching from start/length to start/end wouldn't change much for
squashing. I didn't have any strong reason to go with start/end from my
side, so if start/length is more aligned with other nodes, let's change
that.
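To illustrate why the representation barely matters to the consumer, here is a rough sketch of a rewrite step that only needs a (location, length) pair. It is not the actual generate_normalized_query() from pg_stat_statements; squash_span and its parameters are hypothetical names used for this example, and the sketch handles a single span only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'query' into 'out', replacing the bytes in
 * [location, location + length) with a squashed-list placeholder that
 * carries the given parameter number.
 *
 * Returns the number of bytes written (excluding the terminator), or -1
 * if the span is invalid or 'out' is too small.
 */
int
squash_span(const char *query, int location, int length, int param_no,
			char *out, size_t out_size)
{
	size_t		qlen = strlen(query);
	int			n;

	if (location < 0 || length < 0 ||
		(size_t) location + (size_t) length > qlen)
		return -1;

	/* prefix, then the placeholder, then whatever follows the span */
	n = snprintf(out, out_size, "%.*s$%d /*, ... */%s",
				 location, query, param_no,
				 query + location + length);
	if (n < 0 || (size_t) n >= out_size)
		return -1;

	return n;
}
```

With a start/end pair the only change would be computing the length before the call, which is why switching between the two should not affect the squashing logic much.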
On Fri, May 09, 2025 at 08:47:58AM GMT, Michael Paquier wrote:
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */] | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT ARRAY[$1, $2, $3, $4, $5, $6, $7, $8, $9, $10] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
Yes, we are going to need more than that for such cases if we want to
cover all the ground we're aiming for.
Putting that aside, the test coverage for ARRAY[] elements is also
very limited on HEAD with one single test only with a set of
constants. We really should improve that, tracking more patterns and
more mixed combinations to see what gets squashed and what is not. So
this should be extended with more cases, including expressions,
parameters and sublinks, with checks on pg_stat_statements.calls to
see how the counters are aggregated. That's going to be important
when people play with this code to track how things change when
manipulating the element jumbling. I'd suggest doing that separately
from the rest.
Agreed, I'll try to extend the number of test cases here as a separate patch.
To clarify, I had in mind something like in the attached patch. The
idea is to make start/end location capturing relatively independent from
the constants squashing. The new parsing node conveys the location
information, which is then getting transformed to be a part of an
ArrayExpr. It's done for in_expr only here, something similar would be
needed for array_expr as well. Feedback is appreciated.
Thanks! I took a quick look at v1-0001 and it feels like a much better approach
than the quick hack I put together earlier. I will look thoroughly.
I took a look at v1-0001 and I am wondering if this can be further simplified.
We really just need a new Node to wrap the start/end locations of the List
coming from in_expr, and this node should not be needed past parsing.
array_expr is even simpler because we have the boundaries available
when makeAArrayExpr is called.
So, I think we can create a new parse node (in parsenodes.h) that will only be
used during parsing (and only in gram.c) to track the start/end locations and
the List. Based on this node we can create A_ArrayExpr and A_Expr with the List
of boundaries, and then all we have to do is update ArrayExpr with the
boundaries during the respective transformXExpr call. This seems like a much
simpler approach that also addresses Michael's concern about defining static
variables in gram.y to track the boundaries.
What do you think?
--
Sami
On Fri, May 09, 2025 at 10:12:24AM GMT, Dmitry Dolgov wrote:
Agreed, I'll try to extend the number of test cases here as a separate patch.
Here is the extended version, where start/end is replaced by
location/length, array_expr is handled as well, and more ARRAY cases are
added.
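For readers skimming the patch, the location/length arithmetic used when wrapping a bracketed list can be sketched as follows. This is a standalone illustration rather than code from the patch: ListSpan, span_between and copy_span are hypothetical names, and the offsets stand in for the token positions of the opening and closing delimiters. The span starts one byte after the opening delimiter and stops just before the closing one.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type for illustration only; not a PostgreSQL struct. */
typedef struct
{
	int			location;	/* first byte after the opening delimiter */
	int			length;		/* bytes strictly between the delimiters */
} ListSpan;

/*
 * Given the byte offsets of the opening and closing delimiters, return
 * the span covering only the text between them.
 */
ListSpan
span_between(int open_pos, int close_pos)
{
	ListSpan	s;

	assert(open_pos >= 0 && close_pos > open_pos);
	s.location = open_pos + 1;
	s.length = close_pos - open_pos - 1;
	return s;
}

/* Copy the covered text into buf, which must hold s.length + 1 bytes. */
void
copy_span(const char *query, ListSpan s, char *buf)
{
	memcpy(buf, query + s.location, (size_t) s.length);
	buf[s.length] = '\0';
}
```

For the query text `SELECT 1 IN (1, 2, 3)` the delimiters sit at offsets 12 and 20, so the span is location 13 with length 7, which covers `1, 2, 3`. An empty list such as `[]` yields length 0.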
Attachments:
v2-0001-Introduce-LocationExpr.patch (text/plain; charset=us-ascii)
From 81fe0b08473eafc88cdc56b275e6f0e08ab8858c Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Thu, 8 May 2025 16:41:01 +0200
Subject: [PATCH v2 1/3] Introduce LocationExpr
Add LocationExpr wrapper node to capture location and length of an
expression in a query. Use it to wrap expr_list in in_expr and
array_expr conveying location information to ArrayExpr.
---
src/backend/nodes/nodeFuncs.c | 23 +++++++++++++++
src/backend/parser/gram.y | 31 +++++++++++++++++----
src/backend/parser/parse_expr.c | 48 ++++++++++++++++++++++++++++++--
src/include/nodes/parsenodes.h | 17 ++++++++++-
src/include/nodes/primnodes.h | 7 +++++
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 118 insertions(+), 9 deletions(-)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 7bc823507f1..f0b05630fd1 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -284,6 +284,12 @@ exprType(const Node *expr)
case T_PlaceHolderVar:
type = exprType((Node *) ((const PlaceHolderVar *) expr)->phexpr);
break;
+ case T_LocationExpr:
+ {
+ const LocationExpr *n = (const LocationExpr *) expr;
+ type = exprType((Node *) n->expr);
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
type = InvalidOid; /* keep compiler quiet */
@@ -536,6 +542,11 @@ exprTypmod(const Node *expr)
return exprTypmod((Node *) ((const ReturningExpr *) expr)->retexpr);
case T_PlaceHolderVar:
return exprTypmod((Node *) ((const PlaceHolderVar *) expr)->phexpr);
+ case T_LocationExpr:
+ {
+ const LocationExpr *n = (const LocationExpr *) expr;
+ return exprTypmod((Node *) n->expr);
+ }
default:
break;
}
@@ -1058,6 +1069,9 @@ exprCollation(const Node *expr)
case T_PlaceHolderVar:
coll = exprCollation((Node *) ((const PlaceHolderVar *) expr)->phexpr);
break;
+ case T_LocationExpr:
+ coll = exprCollation((Node *) ((const LocationExpr *) expr)->expr);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
coll = InvalidOid; /* keep compiler quiet */
@@ -1306,6 +1320,10 @@ exprSetCollation(Node *expr, Oid collation)
/* NextValueExpr's result is an integer type ... */
Assert(!OidIsValid(collation)); /* ... so never set a collation */
break;
+ case T_LocationExpr:
+ exprSetCollation((Node *) ((LocationExpr *) expr)->expr,
+ collation);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
break;
@@ -1803,6 +1821,9 @@ exprLocation(const Node *expr)
case T_PartitionRangeDatum:
loc = ((const PartitionRangeDatum *) expr)->location;
break;
+ case T_LocationExpr:
+ loc = ((const LocationExpr *) expr)->location;
+ break;
default:
/* for any other node type it's just unknown... */
loc = -1;
@@ -4705,6 +4726,8 @@ raw_expression_tree_walker_impl(Node *node,
return true;
}
break;
+ case T_LocationExpr:
+ return WALK(((LocationExpr *) node)->expr);
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3c4268b271a..8c8271f620d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -184,7 +184,8 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(Node *elements, int location);
+static Node *makeLocationExpr(Node *expr, int location, int length);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -16757,15 +16758,18 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ Node *locExpr = makeLocationExpr((Node *) $2, @1, @3);
+ $$ = makeAArrayExpr(locExpr, @1);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ Node *locExpr = makeLocationExpr((Node *) $2, @1, @3);
+ $$ = makeAArrayExpr(locExpr, @1);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ Node *locExpr = makeLocationExpr((Node *) NIL, @1, @2);
+ $$ = makeAArrayExpr(locExpr, @1);
}
;
@@ -16895,7 +16899,10 @@ in_expr: select_with_parens
/* other fields will be filled later */
$$ = (Node *) n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
+ | '(' expr_list ')'
+ {
+ $$ = (Node *) makeLocationExpr((Node *) $2, @1, @3);
+ }
;
/*
@@ -19293,7 +19300,7 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(Node *elements, int location)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
@@ -19302,6 +19309,18 @@ makeAArrayExpr(List *elements, int location)
return (Node *) n;
}
+static Node *
+makeLocationExpr(Node *expr, int start_location, int end_location)
+{
+ LocationExpr *n = makeNode(LocationExpr);
+
+ n->expr = expr;
+ n->location = start_location + 1;
+ n->length = end_location - start_location - 1;
+
+ return (Node *) n;
+}
+
static Node *
makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod, int location)
{
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..b48beff157e 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -370,6 +370,25 @@ transformExprRecurse(ParseState *pstate, Node *expr)
result = transformJsonFuncExpr(pstate, (JsonFuncExpr *) expr);
break;
+ case T_LocationExpr:
+ {
+ LocationExpr *loc = (LocationExpr *) expr;
+ if (!IsA(loc->expr, List))
+ result = transformExprRecurse(pstate, loc->expr);
+ else
+ result = loc->expr;
+
+ if (IsA(result, ArrayExpr))
+ {
+ ArrayExpr *arr = (ArrayExpr *) result;
+ arr->loc_range = list_make2_int(loc->location,
+ loc->length);
+
+ result = (Node *) arr;
+ }
+ }
+ break;
+
default:
/* should not reach here */
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
@@ -1125,6 +1144,7 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
{
Node *result = NULL;
Node *lexpr;
+ LocationExpr *location = NULL;
List *rexprs;
List *rvars;
List *rnonvars;
@@ -1139,6 +1159,9 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
else
useOr = true;
+ if (IsA(a->rexpr, LocationExpr))
+ location = (LocationExpr *) a->rexpr;
+
/*
* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if there is a suitable array type available. If not, we fall
@@ -1152,7 +1175,7 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
*/
lexpr = transformExprRecurse(pstate, a->lexpr);
rexprs = rvars = rnonvars = NIL;
- foreach(l, (List *) a->rexpr)
+ foreach(l, (List *) transformExprRecurse(pstate, a->rexpr))
{
Node *rexpr = transformExprRecurse(pstate, lfirst(l));
@@ -1224,6 +1247,12 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ if (location)
+ newa->loc_range = list_make2_int(location->location,
+ location->length);
+ else
+ newa->loc_range = NIL;
+
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2014,12 +2043,22 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
Oid array_type, Oid element_type, int32 typmod)
{
ArrayExpr *newa = makeNode(ArrayExpr);
+ List *elements = NIL;
List *newelems = NIL;
List *newcoercedelems = NIL;
ListCell *element;
+ LocationExpr *locExpr = NULL;
Oid coerce_type;
bool coerce_hard;
+ if (IsA(a->elements, LocationExpr))
+ {
+ locExpr = (LocationExpr *) a->elements;
+ elements = (List *) locExpr->expr;
+ }
+ else
+ elements = (List *) a->elements;
+
/*
* Transform the element expressions
*
@@ -2027,7 +2066,7 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
* element expression.
*/
newa->multidims = false;
- foreach(element, a->elements)
+ foreach(element, elements)
{
Node *e = (Node *) lfirst(element);
Node *newe;
@@ -2166,6 +2205,11 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ if (locExpr)
+ newa->loc_range = list_make2_int(locExpr->location,
+ locExpr->length);
+ else
+ newa->loc_range = NIL;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..f3e4ba47af1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -500,10 +500,25 @@ typedef struct A_Indirection
typedef struct A_ArrayExpr
{
NodeTag type;
- List *elements; /* array element expressions */
+ Node *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
} A_ArrayExpr;
+/*
+ * A wrapper expression to record start and end location
+ */
+typedef struct LocationExpr
+{
+ NodeTag type;
+
+ /* the node to be wrapped */
+ Node *expr;
+ /* token location, or -1 if unknown */
+ ParseLoc location;
+ /* token length, or -1 if unknown */
+ ParseLoc length;
+} LocationExpr;
+
/*
* ResTarget -
* result target (used in target list of pre-transformed parse trees)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..60dec576908 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,13 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+
+ /*
+ * Pair (location, length) for list of elements. Note, that location field
+ * cannot always be used here instead, since it could be unknown, e.g. if
+ * the node was created in transformAExprIn.
+ */
+ List *loc_range pg_node_attr(query_jumble_ignore);
} ArrayExpr;
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e5879e00dff..e6fcba24396 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,6 +1501,7 @@ LOCALLOCK
LOCALLOCKOWNER
LOCALLOCKTAG
LOCALPREDICATELOCK
+LocationExpr
LOCK
LOCKMASK
LOCKMETHODID
base-commit: b0635bfda0535a7fc36cd11d10eecec4e2a96330
--
2.45.1
v2-0002-Use-LocationExpr-in-squashing.patch (text/plain; charset=us-ascii)
From 3daccfa2ff77e78d0e04e092d9801fe6f7a22bc7 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Thu, 8 May 2025 16:41:20 +0200
Subject: [PATCH v2 2/3] Use LocationExpr in squashing
For the purpose of constants squashing we have only start location of an
expression, which is not enough if the constant is wrapped e.g. in a
cast function. Apply information conveyed via LocationExpr to improve
squashing of constants.
Based on an idea from Sami Imseih.
---
.../pg_stat_statements/expected/squashing.out | 35 +++++++--
.../pg_stat_statements/pg_stat_statements.c | 35 +++------
contrib/pg_stat_statements/sql/squashing.sql | 9 ++-
src/backend/nodes/queryjumblefuncs.c | 76 ++++++++++++-------
4 files changed, 95 insertions(+), 60 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..a924a8c6e4c 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -147,6 +147,24 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+-- Parsing
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, 2, int4(1));
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
-- FuncExpr
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
@@ -246,7 +264,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -353,7 +371,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -376,7 +394,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -393,10 +411,10 @@ SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -409,7 +427,8 @@ FROM cte;
--------
(0 rows)
--- Simple array would be squashed as well
+-- Arrays
+-- Simple array is squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..314e065b364 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2825,10 +2825,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2886,12 +2882,9 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then a param symbol replacing the constant itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
+ i + 1 + jstate->highest_extern_param_id);
}
- else if (!in_squashed)
+ else
{
/*
* This location is the start position of a run of constants to be
@@ -2903,27 +2896,12 @@ generate_normalized_query(JumbleState *jstate, const char *query,
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
n_quer_loc += len_to_wrt;
/* ... and then start a run of squashed constants */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
-
- skipped_constants++;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
+ i + 1 + jstate->highest_extern_param_id);
}
/* Otherwise the constant is squashed away -- move forward */
@@ -3012,6 +2990,13 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
int loc = locs[i].location;
int tok;
+ /* Squashed constants are recorded with a length set already */
+ if (locs[i].squashed)
+ {
+ Assert(locs[i].length != -1);
+ continue;
+ }
+
/* Adjust recorded location if we're dealing with partial string */
loc -= query_loc;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 908be81ff2b..f1a381e96cb 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -49,6 +49,12 @@ SELECT * FROM test_squash WHERE id IN
(@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9');
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- Parsing
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, 2, int4(1));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- FuncExpr
-- Verify multiple type representation end up with the same query_id
@@ -163,7 +169,8 @@ WITH cte AS (
SELECT ARRAY['a', 'b', 'c', const::varchar] AS result
FROM cte;
--- Simple array would be squashed as well
+-- Arrays
+-- Simple array is squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..6202e4065e8 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -62,8 +62,10 @@ static void AppendJumble(JumbleState *jstate,
static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
int location, bool squashed);
+static void RecordConstLocationRange(JumbleState *jstate,
+ int location, int length, bool squashed);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *expr);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -403,6 +405,32 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
}
}
+/*
+ * Similar to RecordConstLocation, RecordConstLocationRange stores a constant
+ * with location start and end boundaries.
+ */
+static void
+RecordConstLocationRange(JumbleState *jstate, int location, int length, bool squashed)
+{
+ /* -1 indicates unknown or undefined location */
+ if (location >= 0 && length >= 0)
+ {
+ /* enlarge array if needed */
+ if (jstate->clocations_count >= jstate->clocations_buf_size)
+ {
+ jstate->clocations_buf_size *= 2;
+ jstate->clocations = (LocationLen *)
+ repalloc(jstate->clocations,
+ jstate->clocations_buf_size *
+ sizeof(LocationLen));
+ }
+ jstate->clocations[jstate->clocations_count].location = location;
+ jstate->clocations[jstate->clocations_count].squashed = squashed;
+ jstate->clocations[jstate->clocations_count].length = length;
+ jstate->clocations_count++;
+ }
+}
+
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
* deduce that the expression is a constant:
@@ -461,7 +489,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableConstList(List *elements)
{
ListCell *temp;
@@ -473,13 +501,8 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
return false;
foreach(temp, elements)
- {
if (!IsSquashableConst(lfirst(temp)))
return false;
- }
-
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
return true;
}
@@ -487,7 +510,7 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+ _jumbleElements(jstate, (List *) expr->list, (Node *) expr)
#define JUMBLE_LOCATION(location) \
RecordConstLocation(jstate, expr->location, false)
#define JUMBLE_FIELD(item) \
@@ -523,28 +546,29 @@ do { \
* elements in the list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *expr)
{
- Node *first,
- *last;
-
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableConstList(elements))
{
+ ArrayExpr *array;
+
/*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
+ * Currently only ArrayExpr provides location information, needed for
+ * squashing.
*/
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ Assert(IsA(expr, ArrayExpr));
+ array = (ArrayExpr *) expr;
+
+ /*
+ * If the parent ArrayExpr has location information, i.e. start and the
+ * end of the expression, use it as boundaries for squashing.
+ */
+ if (array->loc_range != NIL)
+ RecordConstLocationRange(jstate,
+ linitial_int(array->loc_range),
+ lsecond_int(array->loc_range), true);
+ else
+ _jumbleNode(jstate, (Node *) elements);
}
else
{
--
2.45.1
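The RecordConstLocationRange helper in the patch above appends a (location, length, squashed) entry to a buffer that doubles when full, and skips entries whose location or length is unknown. The following is only a minimal standalone sketch of that pattern, not the patch's code: LocEntry, LocArray, loc_array_init and loc_array_record are hypothetical names, and plain malloc/realloc stand in for PostgreSQL's palloc/repalloc.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the LocationLen entries kept in JumbleState. */
typedef struct LocEntry
{
	int			location;	/* byte offset into the query text */
	int			length;		/* length in bytes of the recorded range */
	bool		squashed;	/* does the range cover a squashed list? */
} LocEntry;

/* Hypothetical stand-in for the clocations fields of JumbleState. */
typedef struct LocArray
{
	LocEntry   *items;
	int			count;
	int			capacity;
} LocArray;

/* Allocate the initial buffer; returns false if allocation fails. */
static bool
loc_array_init(LocArray *arr, int capacity)
{
	if (capacity <= 0)
		capacity = 8;
	arr->items = malloc(sizeof(LocEntry) * (size_t) capacity);
	arr->count = 0;
	arr->capacity = arr->items ? capacity : 0;
	return arr->items != NULL;
}

/*
 * Record one range.  A negative location or length means "unknown", and such
 * entries are skipped.  Returns false only when the buffer cannot be grown.
 */
static bool
loc_array_record(LocArray *arr, int location, int length, bool squashed)
{
	if (location < 0 || length < 0)
		return true;			/* nothing to record */

	/* enlarge the buffer by doubling when it is full */
	if (arr->count >= arr->capacity)
	{
		int			new_capacity = arr->capacity > 0 ? arr->capacity * 2 : 8;
		LocEntry   *grown = realloc(arr->items,
									sizeof(LocEntry) * (size_t) new_capacity);

		if (grown == NULL)
			return false;
		arr->items = grown;
		arr->capacity = new_capacity;
	}

	arr->items[arr->count].location = location;
	arr->items[arr->count].length = length;
	arr->items[arr->count].squashed = squashed;
	arr->count++;
	return true;
}
```

A caller would initialize the array once per query, call loc_array_record for each constant or squashed list it meets, and free items when done.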
v2-0003-Extend-ARRAY-squashing-tests.patch (text/plain; charset=us-ascii)
From ac43cc478cfc70ecc968d30e58da380e466c1cfb Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Mon, 12 May 2025 10:04:39 +0200
Subject: [PATCH v2 3/3] Extend ARRAY squashing tests
Test coverage for ARRAY expressions is insufficient. Add more test
cases, similar to already existing ones.
---
.../pg_stat_statements/expected/squashing.out | 178 ++++++++++++++++++
contrib/pg_stat_statements/sql/squashing.sql | 59 ++++++
2 files changed, 237 insertions(+)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index a924a8c6e4c..1aeed911aad 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -448,3 +448,181 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
+ ( '"9"')::jsonb, ( '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ (SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+ array
+-------------------------------------------------
+ {100,200,300,400,500,600,700,800,900,1000,1100}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
+ abs($8), abs($9), abs($10), ((abs($11))) +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index f1a381e96cb..6884df1a90d 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -175,3 +175,62 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
+ ( '"9"')::jsonb, ( '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--
2.45.1
On Fri, May 09, 2025 at 12:47:19PM GMT, Sami Imseih wrote:
So, I think we can create a new parse node ( parsenode.h ) that will only be
used in parsing (and gram.c only ) to track the start/end locations
and List and
based on this node we can create A_ArrayExpr and A_Expr with the List
of boundaries,
and then all we have to do is update ArrayExpr with the boundaries during
the respective transformXExpr call. This seems like a much simpler approach
that also addresses Michael's concern of defining static variables in gram.y to
track the boundaries.
The static variables were only part of the concern; another part was
using A_Expr to carry this information, which will have impact on lots
of unrelated code.
On Fri, May 09, 2025 at 12:47:19PM GMT, Sami Imseih wrote:
So, I think we can create a new parse node ( parsenode.h ) that will only be
used in parsing (and gram.c only ) to track the start/end locations
and List and
based on this node we can create A_ArrayExpr and A_Expr with the List
of boundaries,
and then all we have to do is update ArrayExpr with the boundaries during
the respective transformXExpr call. This seems like a much simpler approach
that also addresses Michael's concern of defining static variables in gram.y to
track the boundaries.
The static variables were only part of the concern; another part was
using A_Expr to carry this information, which will have impact on lots
of unrelated code.
What would be the problem if A_Expr carries an extra pointer to a List?
It already had other fields, rexpr, lexpr and location that could be no-op.
Also, LocationExpr is not really an expression node, but a wrapper to
an expression node, so I think it's wrong to define it as a Node and be
required to add the necessary handling for it in nodeFuncs.c. I think we
can just define it as a struct in gram.y so it can carry the locations of the
expression and then set the List of the location boundaries in
A_Expr and A_ArrayExpr. right?
typedef struct LocationExpr
{
Node *expr;
ParseLoc start_location;
ParseLoc end_location;
} LocationExpr;
--
Sami
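To illustrate the idea of carrying start and end locations in a plain struct rather than a Node, here is a small standalone sketch. It assumes the same arithmetic that makeLocationExpr uses in the v3-0002 patch later in this thread (start one byte after the opening bracket, length excluding both brackets). ParseLoc is redefined locally as an int, and ListSpan and span_between are hypothetical names used only for this example.

```c
/* Local stand-in for PostgreSQL's ParseLoc: a byte offset into the query. */
typedef int ParseLoc;

/* Hypothetical plain struct (not a Node) describing the inside of a list. */
typedef struct ListSpan
{
	ParseLoc	location;	/* offset of the first byte after the opener */
	int			length;		/* bytes between the opener and the closer */
} ListSpan;

/*
 * Given the offsets of the opening and closing bracket (or parenthesis),
 * return the span of the text strictly between them.
 */
static ListSpan
span_between(ParseLoc open_loc, ParseLoc close_loc)
{
	ListSpan	span;

	span.location = open_loc + 1;
	span.length = close_loc - open_loc - 1;
	return span;
}
```

For the query text `select from t where id in (1, 2, 3)`, passing the offsets of `(` and `)` yields a span covering exactly `1, 2, 3`.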
On Mon, May 12, 2025 at 06:40:43PM GMT, Sami Imseih wrote:
The static variables were only part of the concern; another part was
using A_Expr to carry this information, which will have impact on lots
of unrelated code.
What would be the problem if A_Expr carries an extra pointer to a List?
It already had other fields, rexpr, lexpr and location that could be no-op.
They can be empty sometimes, but the new fields will be empty 99% of the
time. This is a clear sign to me that this information does not belong to
a node for "infix, prefix, and postfix expressions", don't you think?
On Mon, May 12, 2025 at 06:40:43PM -0400, Sami Imseih wrote:
Also, LocationExpr is not really an expression node, but a wrapper to
an expression node, so I think it's wrong to define it as a Node and be
required to add the necessary handling for it in nodeFuncs.c. I think we
can just define it as a struct in gram.y so it can carry the locations of the
expression and then set the List of the location boundaries in
A_Expr and A_ArrayExpr. right?
Right. LocationExpr is not a full Node, so if we can do these
improvements without it we have less maintenance to worry about across
the board with less code paths. At the end, I think that we should
try to keep the amount of work done by PGSS as minimal as possible.
I was a bit worried about not using a Node but Sami has reminded me
last week that we already have in gram.y the concept of using some
private structures to track intermediate results done by the parsing
that we sometimes do not want to push down to the code calling the
parser. If we can do the same, the result could be nicer.
By the way, the new test cases for ARRAY lists are sent in the last
patch of the series posted on this thread:
/messages/by-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y
These should be first in the list, IMO, so as it is possible to track
what the behavior was before the new logic as of HEAD, and what the
behavior would become after the new logic.
--
Michael
On Tue, May 20, 2025 at 06:30:25AM GMT, Michael Paquier wrote:
On Mon, May 12, 2025 at 06:40:43PM -0400, Sami Imseih wrote:
Also, LocationExpr is not really an expression node, but a wrapper to
an expression node, so I think it's wrong to define it as a Node and be
required to add the necessary handling for it in nodeFuncs.c. I think we
can just define it as a struct in gram.y so it can carry the locations of the
expression and then set the List of the location boundaries in
A_Expr and A_ArrayExpr. right?
Right. LocationExpr is not a full Node, so if we can do these
improvements without it we have less maintenance to worry about across
the board with less code paths. At the end, I think that we should
try to keep the amount of work done by PGSS as minimal as possible.
I was a bit worried about not using a Node but Sami has reminded me
last week that we already have in gram.y the concept of using some
private structures to track intermediate results done by the parsing
that we sometimes do not want to push down to the code calling the
parser. If we can do the same, the result could be nicer.
I believe it's worth minimizing not only the amount of work needed to
support LocationExpr, but also the impact on the existing code. What I
see as a problem is keeping such specific information as the location
boundaries in such a generic expression as A_Expr, where it will almost
never be used. Do I get it right that you folks are ok with that?
At the same time, AFAICT there aren't many more code paths to worry about
in the case of LocationExpr as a node: in the end, all options would have
to embed the location information into ArrayExpr during transformation,
independently of how this information was conveyed. Aside from that, the
only extra code we've got is the node functions (exprType, etc). Is there
anything I'm missing here?
By the way, the new test cases for ARRAY lists are sent in the last
patch of the series posted on this thread:
/messages/by-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y
These should be first in the list, IMO, so as it is possible to track
what the behavior was before the new logic as of HEAD, and what the
behavior would become after the new logic.
Sure, I can reshuffle that.
BTW, I'm going to be away for a couple of weeks soon. So if you want to
decide one way or another soonish, let's do it now.
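Whichever way the boundaries reach ArrayExpr, what a consumer such as pg_stat_statements ultimately needs is a (location, length) pair for the list so that the list text can be replaced by a single placeholder. The sketch below shows only that last step, under simplifying assumptions: squash_range is a hypothetical helper, the placeholder is always numbered $1, and a fixed caller-supplied buffer is used instead of PostgreSQL's memory contexts.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy "query" into "out", replacing the bytes in
 * [location, location + length) with the squashed-list placeholder.
 * Returns the length of the result, or -1 if the range is invalid or the
 * output buffer is too small.
 */
static int
squash_range(const char *query, int location, int length,
			 char *out, size_t outsz)
{
	static const char placeholder[] = "$1 /*, ... */";
	size_t		qlen = strlen(query);
	int			n;

	if (location < 0 || length < 0 ||
		(size_t) location + (size_t) length > qlen)
		return -1;

	/* text before the range, then the placeholder, then the remaining text */
	n = snprintf(out, outsz, "%.*s%s%s",
				 location, query, placeholder, query + location + length);
	if (n < 0 || (size_t) n >= outsz)
		return -1;
	return n;
}
```

With the range that covers `1, 2, 3` in `select from t where id in (1, 2, 3)`, the result has the same shape as the squashed entries shown in the test output in this thread. Real normalization also has to renumber placeholders across the whole query, which this sketch leaves out.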
On Tue, May 20, 2025 at 11:03:52AM GMT, Dmitry Dolgov wrote:
By the way, the new test cases for ARRAY lists are sent in the last
patch of the series posted on this thread:
/messages/by-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y
These should be first in the list, IMO, so as it is possible to track
what the behavior was before the new logic as of HEAD, and what the
behavior would become after the new logic.
Sure, I can reshuffle that.
Here it is, but the results are pretty much as expected.
Attachments:
v3-0001-Extend-ARRAY-squashing-tests.patch (text/plain; charset=us-ascii)
From 8e22941aab88976631729ed43e3d4afaf616b691 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v3 1/3] Extend ARRAY squashing tests
Test coverage for ARRAY expressions is insufficient. Add more test
cases, similar to already existing ones.
---
.../pg_stat_statements/expected/squashing.out | 184 ++++++++++++++++++
contrib/pg_stat_statements/sql/squashing.sql | 60 ++++++
2 files changed, 244 insertions(+)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..730a52d6917 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -429,3 +429,187 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
+ ( '"9"')::jsonb, ( '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ($1 /*, ... */)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ $1 /*, ... */::int4::casttesttype +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ (SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+ array
+-------------------------------------------------
+ {100,200,300,400,500,600,700,800,900,1000,1100}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
+ abs($8), abs($9), abs($10), ((abs($11))) +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ $1 /*, ... */::bigint +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8..5ac624ae1f7 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -167,3 +167,63 @@ FROM cte;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
+ ( '"9"')::jsonb, ( '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
base-commit: cbf53e2b8a8ed3fc6f554095a4e99591bd5193f6
--
2.45.1
v3-0002-Introduce-LocationExpr.patch (text/plain; charset=us-ascii)
From 0a7fed14e036a214daa668004daa97a023e7cc90 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Thu, 8 May 2025 16:41:01 +0200
Subject: [PATCH v3 2/3] Introduce LocationExpr
Add LocationExpr wrapper node to capture location and length of an
expression in a query. Use it to wrap expr_list in in_expr and
array_expr, conveying location information to ArrayExpr.
---
src/backend/nodes/nodeFuncs.c | 23 +++++++++++++++
src/backend/parser/gram.y | 31 +++++++++++++++++----
src/backend/parser/parse_expr.c | 48 ++++++++++++++++++++++++++++++--
src/include/nodes/parsenodes.h | 17 ++++++++++-
src/include/nodes/primnodes.h | 7 +++++
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 118 insertions(+), 9 deletions(-)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 7bc823507f1..f0b05630fd1 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -284,6 +284,12 @@ exprType(const Node *expr)
case T_PlaceHolderVar:
type = exprType((Node *) ((const PlaceHolderVar *) expr)->phexpr);
break;
+ case T_LocationExpr:
+ {
+ const LocationExpr *n = (const LocationExpr *) expr;
+ type = exprType((Node *) n->expr);
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
type = InvalidOid; /* keep compiler quiet */
@@ -536,6 +542,11 @@ exprTypmod(const Node *expr)
return exprTypmod((Node *) ((const ReturningExpr *) expr)->retexpr);
case T_PlaceHolderVar:
return exprTypmod((Node *) ((const PlaceHolderVar *) expr)->phexpr);
+ case T_LocationExpr:
+ {
+ const LocationExpr *n = (const LocationExpr *) expr;
+ return exprTypmod((Node *) n->expr);
+ }
default:
break;
}
@@ -1058,6 +1069,9 @@ exprCollation(const Node *expr)
case T_PlaceHolderVar:
coll = exprCollation((Node *) ((const PlaceHolderVar *) expr)->phexpr);
break;
+ case T_LocationExpr:
+ coll = exprCollation((Node *) ((const LocationExpr *) expr)->expr);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
coll = InvalidOid; /* keep compiler quiet */
@@ -1306,6 +1320,10 @@ exprSetCollation(Node *expr, Oid collation)
/* NextValueExpr's result is an integer type ... */
Assert(!OidIsValid(collation)); /* ... so never set a collation */
break;
+ case T_LocationExpr:
+ exprSetCollation((Node *) ((LocationExpr *) expr)->expr,
+ collation);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
break;
@@ -1803,6 +1821,9 @@ exprLocation(const Node *expr)
case T_PartitionRangeDatum:
loc = ((const PartitionRangeDatum *) expr)->location;
break;
+ case T_LocationExpr:
+ loc = ((const LocationExpr *) expr)->location;
+ break;
default:
/* for any other node type it's just unknown... */
loc = -1;
@@ -4705,6 +4726,8 @@ raw_expression_tree_walker_impl(Node *node,
return true;
}
break;
+ case T_LocationExpr:
+ return WALK(((LocationExpr *) node)->expr);
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..293a81b29b7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -184,7 +184,8 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(Node *elements, int location);
+static Node *makeLocationExpr(Node *expr, int location, int length);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -16764,15 +16765,18 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ Node *locExpr = makeLocationExpr((Node *) $2, @1, @3);
+ $$ = makeAArrayExpr(locExpr, @1);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ Node *locExpr = makeLocationExpr((Node *) $2, @1, @3);
+ $$ = makeAArrayExpr(locExpr, @1);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ Node *locExpr = makeLocationExpr((Node *) NIL, @1, @2);
+ $$ = makeAArrayExpr(locExpr, @1);
}
;
@@ -16902,7 +16906,10 @@ in_expr: select_with_parens
/* other fields will be filled later */
$$ = (Node *) n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
+ | '(' expr_list ')'
+ {
+ $$ = (Node *) makeLocationExpr((Node *) $2, @1, @3);
+ }
;
/*
@@ -19300,7 +19307,7 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(Node *elements, int location)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
@@ -19309,6 +19316,18 @@ makeAArrayExpr(List *elements, int location)
return (Node *) n;
}
+static Node *
+makeLocationExpr(Node *expr, int start_location, int end_location)
+{
+ LocationExpr *n = makeNode(LocationExpr);
+
+ n->expr = expr;
+ n->location = start_location + 1;
+ n->length = end_location - start_location - 1;
+
+ return (Node *) n;
+}
+
static Node *
makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod, int location)
{
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..b48beff157e 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -370,6 +370,25 @@ transformExprRecurse(ParseState *pstate, Node *expr)
result = transformJsonFuncExpr(pstate, (JsonFuncExpr *) expr);
break;
+ case T_LocationExpr:
+ {
+ LocationExpr *loc = (LocationExpr *) expr;
+ if (!IsA(loc->expr, List))
+ result = transformExprRecurse(pstate, loc->expr);
+ else
+ result = loc->expr;
+
+ if (IsA(result, ArrayExpr))
+ {
+ ArrayExpr *arr = (ArrayExpr *) result;
+ arr->loc_range = list_make2_int(loc->location,
+ loc->length);
+
+ result = (Node *) arr;
+ }
+ }
+ break;
+
default:
/* should not reach here */
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
@@ -1125,6 +1144,7 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
{
Node *result = NULL;
Node *lexpr;
+ LocationExpr *location = NULL;
List *rexprs;
List *rvars;
List *rnonvars;
@@ -1139,6 +1159,9 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
else
useOr = true;
+ if (IsA(a->rexpr, LocationExpr))
+ location = (LocationExpr *) a->rexpr;
+
/*
* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if there is a suitable array type available. If not, we fall
@@ -1152,7 +1175,7 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
*/
lexpr = transformExprRecurse(pstate, a->lexpr);
rexprs = rvars = rnonvars = NIL;
- foreach(l, (List *) a->rexpr)
+ foreach(l, (List *) transformExprRecurse(pstate, a->rexpr))
{
Node *rexpr = transformExprRecurse(pstate, lfirst(l));
@@ -1224,6 +1247,12 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ if (location)
+ newa->loc_range = list_make2_int(location->location,
+ location->length);
+ else
+ newa->loc_range = NIL;
+
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2014,12 +2043,22 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
Oid array_type, Oid element_type, int32 typmod)
{
ArrayExpr *newa = makeNode(ArrayExpr);
+ List *elements = NIL;
List *newelems = NIL;
List *newcoercedelems = NIL;
ListCell *element;
+ LocationExpr *locExpr = NULL;
Oid coerce_type;
bool coerce_hard;
+ if (IsA(a->elements, LocationExpr))
+ {
+ locExpr = (LocationExpr *) a->elements;
+ elements = (List *) locExpr->expr;
+ }
+ else
+ elements = (List *) a->elements;
+
/*
* Transform the element expressions
*
@@ -2027,7 +2066,7 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
* element expression.
*/
newa->multidims = false;
- foreach(element, a->elements)
+ foreach(element, elements)
{
Node *e = (Node *) lfirst(element);
Node *newe;
@@ -2166,6 +2205,11 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ if (locExpr)
+ newa->loc_range = list_make2_int(locExpr->location,
+ locExpr->length);
+ else
+ newa->loc_range = NIL;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..f3e4ba47af1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -500,10 +500,25 @@ typedef struct A_Indirection
typedef struct A_ArrayExpr
{
NodeTag type;
- List *elements; /* array element expressions */
+ Node *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
} A_ArrayExpr;
+/*
+ * A wrapper expression to record start and end location
+ */
+typedef struct LocationExpr
+{
+ NodeTag type;
+
+ /* the node to be wrapped */
+ Node *expr;
+ /* token location, or -1 if unknown */
+ ParseLoc location;
+ /* token length, or -1 if unknown */
+ ParseLoc length;
+} LocationExpr;
+
/*
* ResTarget -
* result target (used in target list of pre-transformed parse trees)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..60dec576908 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,13 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+
+ /*
+ * Pair (location, length) for list of elements. Note, that location field
+ * cannot always be used here instead, since it could be unknown, e.g. if
+ * the node was created in transformAExprIn.
+ */
+ List *loc_range pg_node_attr(query_jumble_ignore);
} ArrayExpr;
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9ea573fae21..1f911b24edd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,6 +1502,7 @@ LOCALLOCK
LOCALLOCKOWNER
LOCALLOCKTAG
LOCALPREDICATELOCK
+LocationExpr
LOCK
LOCKMASK
LOCKMETHODID
--
2.45.1
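To make the arithmetic in makeLocationExpr() above concrete, here is a minimal standalone sketch of the same computation. ListRange, range_between() and range_text() are hypothetical names used only for illustration; they are not part of the patch. The assumption is that the two inputs are the byte offsets of the opening and closing delimiter tokens, which is what the grammar passes in as @1 and @3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the range fields of the patch's LocationExpr. */
typedef struct ListRange
{
	int			location;	/* offset of the first byte after the opener */
	int			length;		/* number of bytes between opener and closer */
} ListRange;

/*
 * Same arithmetic as makeLocationExpr(): given the offsets of the opening
 * and closing delimiters, describe the text strictly between them.
 */
static ListRange
range_between(int start_location, int end_location)
{
	ListRange	r;

	assert(start_location >= 0 && end_location > start_location);
	r.location = start_location + 1;
	r.length = end_location - start_location - 1;
	return r;
}

/* Copy the covered text into buf as a NUL-terminated string; returns buf. */
static char *
range_text(const char *query, ListRange r, char *buf, size_t bufsize)
{
	assert(r.length >= 0 && (size_t) r.length < bufsize);
	memcpy(buf, query + r.location, (size_t) r.length);
	buf[r.length] = '\0';
	return buf;
}
```

For a query such as `select from t where id in (1, 2, 3)`, passing the offsets of `(` and `)` yields a range covering exactly `1, 2, 3`, which is the span a squashing pass would later replace. An empty `[]` gives a zero-length range.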
v3-0003-Use-LocationExpr-in-squashing.patch (text/plain; charset=us-ascii)
From b06c08a96d056338f26f53db7a489e47dcc7a4fb Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Thu, 8 May 2025 16:41:20 +0200
Subject: [PATCH v3 3/3] Use LocationExpr in squashing
For the purpose of constant squashing we have only the start location of an
expression, which is not enough if the constant is wrapped e.g. in a
cast function. Apply information conveyed via LocationExpr to improve
squashing of constants.
Based on an idea from Sami Imseih.
---
.../pg_stat_statements/expected/squashing.out | 49 +++++++-----
.../pg_stat_statements/pg_stat_statements.c | 35 +++------
contrib/pg_stat_statements/sql/squashing.sql | 9 ++-
src/backend/nodes/queryjumblefuncs.c | 76 ++++++++++++-------
4 files changed, 99 insertions(+), 70 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 730a52d6917..1aeed911aad 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -147,6 +147,24 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+-- Parsing
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, 2, int4(1));
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
-- FuncExpr
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
@@ -246,7 +264,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -353,7 +371,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -376,7 +394,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -393,10 +411,10 @@ SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -409,7 +427,8 @@ FROM cte;
--------
(0 rows)
--- Simple array would be squashed as well
+-- Arrays
+-- Simple array is squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -475,7 +494,7 @@ SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -499,9 +518,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- ($1 /*, ... */)::jsonb +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -526,9 +543,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- $1 /*, ... */::int4::casttesttype +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -607,9 +622,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- $1 /*, ... */::bigint +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..314e065b364 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2825,10 +2825,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2886,12 +2882,9 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then a param symbol replacing the constant itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
+ i + 1 + jstate->highest_extern_param_id);
}
- else if (!in_squashed)
+ else
{
/*
* This location is the start position of a run of constants to be
@@ -2903,27 +2896,12 @@ generate_normalized_query(JumbleState *jstate, const char *query,
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
n_quer_loc += len_to_wrt;
/* ... and then start a run of squashed constants */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
-
- skipped_constants++;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
+ i + 1 + jstate->highest_extern_param_id);
}
/* Otherwise the constant is squashed away -- move forward */
@@ -3012,6 +2990,13 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
int loc = locs[i].location;
int tok;
+ /* Squashed constants are recorded with a length set already */
+ if (locs[i].squashed)
+ {
+ Assert(locs[i].length != -1);
+ continue;
+ }
+
/* Adjust recorded location if we're dealing with partial string */
loc -= query_loc;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 5ac624ae1f7..6884df1a90d 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -49,6 +49,12 @@ SELECT * FROM test_squash WHERE id IN
(@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9');
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- Parsing
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, 2, int4(1));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- FuncExpr
-- Verify multiple type representation end up with the same query_id
@@ -163,7 +169,8 @@ WITH cte AS (
SELECT ARRAY['a', 'b', 'c', const::varchar] AS result
FROM cte;
--- Simple array would be squashed as well
+-- Arrays
+-- Simple array is squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..6202e4065e8 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -62,8 +62,10 @@ static void AppendJumble(JumbleState *jstate,
static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
int location, bool squashed);
+static void RecordConstLocationRange(JumbleState *jstate,
+ int location, int length, bool squashed);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *expr);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -403,6 +405,32 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
}
}
+/*
+ * Similar to RecordConstLocation, RecordConstLocationRange stores a constant
+ * with location start and end boundaries.
+ */
+static void
+RecordConstLocationRange(JumbleState *jstate, int location, int length, bool squashed)
+{
+ /* -1 indicates unknown or undefined location */
+ if (location >= 0 && length >= 0)
+ {
+ /* enlarge array if needed */
+ if (jstate->clocations_count >= jstate->clocations_buf_size)
+ {
+ jstate->clocations_buf_size *= 2;
+ jstate->clocations = (LocationLen *)
+ repalloc(jstate->clocations,
+ jstate->clocations_buf_size *
+ sizeof(LocationLen));
+ }
+ jstate->clocations[jstate->clocations_count].location = location;
+ jstate->clocations[jstate->clocations_count].squashed = squashed;
+ jstate->clocations[jstate->clocations_count].length = length;
+ jstate->clocations_count++;
+ }
+}
+
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
* deduce that the expression is a constant:
@@ -461,7 +489,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableConstList(List *elements)
{
ListCell *temp;
@@ -473,13 +501,8 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
return false;
foreach(temp, elements)
- {
if (!IsSquashableConst(lfirst(temp)))
return false;
- }
-
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
return true;
}
@@ -487,7 +510,7 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+ _jumbleElements(jstate, (List *) expr->list, (Node *) expr)
#define JUMBLE_LOCATION(location) \
RecordConstLocation(jstate, expr->location, false)
#define JUMBLE_FIELD(item) \
@@ -523,28 +546,29 @@ do { \
* elements in the list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *expr)
{
- Node *first,
- *last;
-
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableConstList(elements))
{
+ ArrayExpr *array;
+
/*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
+ * Currently only ArrayExpr provides location information, needed for
+ * squashing.
*/
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ Assert(IsA(expr, ArrayExpr));
+ array = (ArrayExpr *) expr;
+
+ /*
+ * If the parent ArrayExpr has location information, i.e. start and the
+ * end of the expression, use it as boundaries for squashing.
+ */
+ if (array->loc_range != NIL)
+ RecordConstLocationRange(jstate,
+ linitial_int(array->loc_range),
+ lsecond_int(array->loc_range), true);
+ else
+ _jumbleNode(jstate, (Node *) elements);
}
else
{
--
2.45.1
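As a rough illustration of why a (location, length) pair simplifies generate_normalized_query(), the sketch below rewrites a single recorded range into the squashed form. squash_range() is a hypothetical helper, not code from the patch, and it handles only one range; the real function walks all recorded locations and also replaces the non-squashed constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Replace the bytes [location, location + length) of query with a parameter
 * symbol followed by the squash marker, writing the result into out.
 * Returns the number of bytes written, or -1 if the range is invalid or the
 * output buffer is too small.
 */
static int
squash_range(const char *query, int location, int length, int param_id,
			 char *out, size_t outsize)
{
	size_t		qlen = strlen(query);
	int			n;

	/* -1 means unknown location or length, as in the patch */
	if (location < 0 || length < 0 ||
		(size_t) location + (size_t) length > qlen)
		return -1;

	/* text before the range, the squashed placeholder, text after it */
	n = snprintf(out, outsize, "%.*s$%d /*, ... */%s",
				 location, query, param_id, query + location + length);
	if (n < 0 || (size_t) n >= outsize)
		return -1;
	return n;
}
```

Because the range ends at the closing parenthesis rather than at the start of the last element, trailing casts or function calls inside the list disappear with it, which matches the changes to the expected output in the patch above (for example `::oid` no longer appears after the marker).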
At the same time AFAICT there aren't many more code paths
to worry about in case of a LocationExpr as a node
I can imagine there are others like value expressions,
row expressions, json array expressions, etc. that we may
want to also normalize.
I believe it's worth keeping not only the amount of work to support
LocationExpr as small as possible, but also the impact on the existing
code. What I see as a problem is keeping such specific information as
the location boundaries in such a generic expression as A_Expr, where it
will almost never be used. Do I get it right that you folks are ok with
that?
There are other examples of fields that are minimally used in other structs.
Some I randomly spotted in SelectStmt are SortClause, Limit options,
etc. I think the IN list is quite a common case, otherwise we would not
care as much as we do. Adding another 8 bytes to the structs does not seem
like that big of a problem to me, especially since the structs will remain
below 64 bytes.
```
(gdb) ptype /o A_Expr
type = struct A_Expr {
/* 0 | 4 */ NodeTag type;
/* 4 | 4 */ A_Expr_Kind kind;
/* 8 | 8 */ List *name;
/* 16 | 8 */ Node *lexpr;
/* 24 | 8 */ Node *rexpr;
/* 32 | 4 */ ParseLoc location;
/* XXX 4-byte padding */
/* total size (bytes): 40 */
}
(gdb) ptype \o A_ArrayExpr
Invalid character '\' in expression.
(gdb) ptype /o A_ArrayExpr
type = struct A_ArrayExpr {
/* 0 | 4 */ NodeTag type;
/* XXX 4-byte hole */
/* 8 | 8 */ List *elements;
/* 16 | 4 */ ParseLoc location;
/* XXX 4-byte padding */
/* total size (bytes): 24 */
}
```
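The size argument can be checked at compile time with a small sketch. The structs below are hypothetical mock-ups that mirror the field layout shown by gdb plus the two proposed boundary fields; the assumptions are that ParseLoc is a plain int and that NodeTag and A_Expr_Kind are 4-byte enums, as the gdb output suggests.

```c
#include <assert.h>
#include <stddef.h>

typedef int MockParseLoc;	/* assumption: ParseLoc is a plain int */
typedef int MockTag;		/* assumption: 4-byte enum, per the gdb output */

/* Mock of A_Expr with the two proposed boundary fields appended. */
typedef struct MockAExpr
{
	MockTag		type;
	MockTag		kind;
	void	   *name;
	void	   *lexpr;
	void	   *rexpr;
	MockParseLoc location;
	MockParseLoc location_start;
	MockParseLoc location_end;
} MockAExpr;

/* Mock of A_ArrayExpr with the two proposed boundary fields appended. */
typedef struct MockAArrayExpr
{
	MockTag		type;
	void	   *elements;
	MockParseLoc location;
	MockParseLoc location_start;
	MockParseLoc location_end;
} MockAArrayExpr;

/* Both mock structs stay within 64 bytes. */
_Static_assert(sizeof(MockAExpr) <= 64, "MockAExpr exceeds 64 bytes");
_Static_assert(sizeof(MockAArrayExpr) <= 64, "MockAArrayExpr exceeds 64 bytes");
```

On a typical LP64 build the first new field fills the existing 4 bytes of tail padding, so each struct grows by one 8-byte step rather than by two.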
In general, making something like T_LocationExpr a query node
seems totally wrong to me. It's not a node, but rather a temporary
wrapper for some location information, and it does not seem to have any
business being used by the time we get to the expression
transformations. It seems very odd considering location information
is stored as simple fields in the parse node itself.
I was a bit worried about not using a Node but Sami has reminded me
last week that we already have in gram.y the concept of using some
private structures to track intermediate results done by the parsing
Attached is a sketch of what I mean. There is a private struct that tracks
the list boundaries and this can wrap in_expr or whatever else makes
sense in the future.
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -269,6 +276,7 @@ static Node *makeRecursiveViewSelect(char
*relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
+%type <listwithboundary> in_expr
The values are then added to the location_start/location_end ParseLoc fields in
A_ArrayExpr and A_Expr. Doing it this way will keep changes to the parse_expr.c
code to a minimum; only the IN transformation will need to copy the values
from the A_Expr into the final A_ArrayExpr.
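A minimal sketch of that hand-off, under the assumption that the boundaries are carried as two plain ParseLoc-style integers with -1 meaning unknown. All type and function names here are hypothetical mock-ups for illustration, not the actual parse_expr.c code.

```c
#include <assert.h>
#include <stdbool.h>

typedef int MockParseLoc;	/* assumption: ParseLoc is a plain int */

/* Mock of the IN-list A_Expr: only the proposed boundary fields. */
typedef struct MockInExpr
{
	MockParseLoc location_start;
	MockParseLoc location_end;
} MockInExpr;

/* Mock of the array node built by the IN transformation. */
typedef struct MockArrayExpr
{
	MockParseLoc location;
	MockParseLoc location_start;
	MockParseLoc location_end;
} MockArrayExpr;

/* The only step the IN transformation would need: copy the boundaries. */
static void
copy_list_boundaries(const MockInExpr *in, MockArrayExpr *arr)
{
	arr->location = -1;		/* the IN transformation leaves this unknown */
	arr->location_start = in->location_start;
	arr->location_end = in->location_end;
}

/* A later consumer can squash only when both boundaries are known. */
static bool
has_list_boundaries(const MockArrayExpr *arr)
{
	return arr->location_start >= 0 &&
		arr->location_end > arr->location_start;
}
```

The subquery form of in_expr sets both boundaries to -1 in the sketch, so a check like has_list_boundaries() would simply report that no squashable range is available.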
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
track_list_boundaries.txt (text/plain; charset=US-ASCII)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..bec24aab720 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,13 @@ typedef struct KeyActions
KeyAction *deleteAction;
} KeyActions;
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -269,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
%type <node> stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -523,8 +531,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
+%type <listwithboundary> in_expr
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
%type <boolean> opt_ordinality opt_without_overlaps
@@ -15289,46 +15298,58 @@ a_expr: c_expr { $$ = $1; }
}
| a_expr IN_P in_expr
{
+ ListWithBoundary *n = $3;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
-
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
- $$ = (Node *) n;
+ SubLink *n2 = (SubLink *) n->expr;
+
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
+ $$ = (Node *) n2;
}
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "=", $1, n->expr, @2);
+
+ n2->location_start = $3->start;
+ n2->location_end = $3->end;
+ $$ = (Node *) n2;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
{
+ ListWithBoundary *n = $4;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate NOT (foo = ANY (subquery)) */
/* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ SubLink *n2 = (SubLink *) n->expr;
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
/* Stick a NOT on top; must have same parse location */
- $$ = makeNotExpr((Node *) n, @2);
+ $$ = makeNotExpr((Node *) n2, @2);
}
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "<>", $1, n->expr, @2);
+
+ n2->location_start = $4->start;
+ n2->location_end = $4->end;
+ $$ = (Node *) n2;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16897,12 +16918,25 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
in_expr: select_with_parens
{
SubLink *n = makeNode(SubLink);
+ ListWithBoundary *n2 = palloc(sizeof(ListWithBoundary));
n->subselect = $1;
/* other fields will be filled later */
- $$ = (Node *) n;
+
+ n2->expr = (Node *) n;
+ n2->start = -1;
+ n2->end = -1;
+ $$ = n2;
+ }
+ | '(' expr_list ')'
+ {
+ ListWithBoundary *n = palloc(sizeof(ListWithBoundary));
+
+ n->expr = (Node *) $2;
+ n->start = @1;
+ n->end = @3;
+ $$ = n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
;
/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..c32cb0673d6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -347,6 +347,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc location_start;
+ ParseLoc location_end;
} A_Expr;
/*
@@ -502,6 +504,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc location_start;
+ ParseLoc location_end;
} A_ArrayExpr;
/*
On Tue, May 20, 2025 at 04:50:12PM GMT, Sami Imseih wrote:
At the same time AFAICT there aren't many more code paths
to worry about in case of a LocationExpr as a node
I can imagine there are others like value expressions,
row expressions, json array expressions, etc. that we may
want to also normalize.
Exactly. When using a node, one can explicitly wrap whatever is needed
into it, while otherwise one would need to find a new way to piggy back
on A_Expr in a new context.
There are other examples of fields that are minimally used in other structs.
Here is one I randomly spotted in SelectStmt such as SortClause, Limit options,
etc.
The way I see it, there is a difference -- I assume those structures
were designed for such cases, whereas here the location range would just
be slapped on top of A_Expr.
Attached is a sketch of what I mean. There is a private struct that tracks
the list boundaries and this can wrap in_expr or whatever else makes
sense in the future.
Just fyi, I don't think this thread is attached to any CF item, meaning
it will not be pulled by the CF bot. In that case feel free to post
diffs in the patch format. I'll take a look at the proposed change, but
a bit later.
At the same time AFAICT there aren't many more code paths
to worry about in case of a LocationExpr as a node
I can imagine there are others like value expressions,
row expressions, json array expressions, etc. that we may
want to also normalize.
Exactly. When using a node, one can explicitly wrap whatever is needed
into it, while otherwise one would need to find a new way to piggy back
on A_Expr in a new context.
Looking at the VALUES expression case, we will need to carry the info
with SelectStmt and ultimately to RangeTblEntry, which is where the
values_list is, so with either approach we take, RangeTblEntry will need
the LocationExpr pointer or the additional ParseLoc info I am suggesting.
A_Expr is not used in the values list case.
I'll take a look at the proposed change, but a bit later.
Here is a v4 to compare with v3.
0001 - the infrastructure to track the boundaries
0002 - the changes to jumbling
0003 - the additional tests introduced in v3
--
Sami
Attachments:
v4-0003-Extend-ARRAY-squashing-tests.patch (application/x-patch)
From 8c14c0ebb20e79925fdd8b6bbd4fcce91ba92dcf Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v4 3/3] Extend ARRAY squashing tests
Testing coverage for ARRAY expressions is not enough. Add more test
cases, similar to already existing ones.
---
.../pg_stat_statements/expected/squashing.out | 178 ++++++++++++++++++
contrib/pg_stat_statements/sql/squashing.sql | 60 ++++++
2 files changed, 238 insertions(+)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index d92cfbd35fb..d628a451a1e 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -429,3 +429,181 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
+ ( '"9"')::jsonb, ( '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ (SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+ array
+-------------------------------------------------
+ {100,200,300,400,500,600,700,800,900,1000,1100}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
+ abs($8), abs($9), abs($10), ((abs($11))) +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8..5ac624ae1f7 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -167,3 +167,63 @@ FROM cte;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
+ ( '"9"')::jsonb, ( '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--
2.39.5 (Apple Git-154)
v4-0001-Add-tracking-for-expression-boundaries.patch (application/x-patch)
From 0af27d235ca6fd1db12d81c657d8b349f3b29316 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-38-230.ec2.internal>
Date: Wed, 21 May 2025 17:25:02 +0000
Subject: [PATCH v4 1/3] Add tracking for expression boundaries
This adds the ability to track the locations of the start and
end of a list of elements such as those in an 'IN' list or an
Array expression to support squashing of values for query
normalization purposes. This corrects various normalization
issues that are a result of 62d712ec.
Discussion: https://www.postgresql.org/message-id/flat/202505021256.4yaa24s3sytm%40alvherre.pgsql#1195a340edca50cc3b7389a2ba8b0467
---
src/backend/parser/gram.y | 94 +++++++++++++++++++++++----------
src/backend/parser/parse_expr.c | 4 ++
src/include/nodes/parsenodes.h | 4 ++
src/include/nodes/primnodes.h | 4 ++
4 files changed, 79 insertions(+), 27 deletions(-)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..0cd5f794db3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,17 @@ typedef struct KeyActions
KeyAction *deleteAction;
} KeyActions;
+/*
+ * Track the start and end of a list in an expression, such as an 'IN' list
+ * or Array Expression
+ */
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -184,7 +195,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int end_location);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -269,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
%type <node> stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -523,8 +535,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
+%type <listwithboundary> in_expr
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
%type <boolean> opt_ordinality opt_without_overlaps
@@ -15289,46 +15302,58 @@ a_expr: c_expr { $$ = $1; }
}
| a_expr IN_P in_expr
{
+ ListWithBoundary *n = $3;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
-
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
- $$ = (Node *) n;
+ SubLink *n2 = (SubLink *) n->expr;
+
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
+ $$ = (Node *) n2;
}
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "=", $1, n->expr, @2);
+
+ n2->rexpr_list_start = $3->start;
+ n2->rexpr_list_end = $3->end;
+ $$ = (Node *) n2;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
{
+ ListWithBoundary *n = $4;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate NOT (foo = ANY (subquery)) */
/* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ SubLink *n2 = (SubLink *) n->expr;
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
/* Stick a NOT on top; must have same parse location */
- $$ = makeNotExpr((Node *) n, @2);
+ $$ = makeNotExpr((Node *) n2, @2);
}
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "<>", $1, n->expr, @2);
+
+ n2->rexpr_list_start = $4->start;
+ n2->rexpr_list_end = $4->end;
+ $$ = (Node *) n2;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16764,15 +16789,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16897,12 +16922,25 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
in_expr: select_with_parens
{
SubLink *n = makeNode(SubLink);
+ ListWithBoundary *n2 = palloc(sizeof(ListWithBoundary));
n->subselect = $1;
/* other fields will be filled later */
- $$ = (Node *) n;
+
+ n2->expr = (Node *) n;
+ n2->start = -1;
+ n2->end = -1;
+ $$ = n2;
+ }
+ | '(' expr_list ')'
+ {
+ ListWithBoundary *n = palloc(sizeof(ListWithBoundary));
+
+ n->expr = (Node *) $2;
+ n->start = @1;
+ n->end = @3;
+ $$ = n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
;
/*
@@ -19300,12 +19338,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int location_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->list_start = location;
+ n->list_end = location_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..7347c989e11 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1224,6 +1224,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ newa->list_start = a->rexpr_list_start;
+ newa->list_end = a->rexpr_list_end;
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2166,6 +2168,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ newa->list_start = a->list_start;
+ newa->list_end = a->list_end;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..2f078887d06 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -347,6 +347,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc rexpr_list_start; /* location of the start of a rexpr list */
+ ParseLoc rexpr_list_end; /* location of the end of a rexpr list */
} A_Expr;
/*
@@ -502,6 +504,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc list_start; /* location of the start of the elements list */
+ ParseLoc list_end; /* location of the end of the elements list */
} A_ArrayExpr;
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..773cdd880aa 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,10 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+ /* location of the start of the elements list */
+ ParseLoc list_start;
+ /* location of the end of the elements list */
+ ParseLoc list_end;
} ArrayExpr;
/*
--
2.39.5 (Apple Git-154)
v4-0002-Support-external-parameters-for-query-squashing.patch (application/x-patch)
From 6f7f7c2abb9e3cd2b5654869ee3626e6fe6549c5 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-38-230.ec2.internal>
Date: Wed, 21 May 2025 18:55:52 +0000
Subject: [PATCH v4 2/3] Support external parameters for query squashing
62d712ec introduced the concept of element squashing for
query normalization purposes. However, it did not account for
external parameters passed to a list of elements. This adds
support for these types of values and simplifies the squashing
logic further.
Discussion: https://www.postgresql.org/message-id/flat/202505021256.4yaa24s3sytm%40alvherre.pgsql#1195a340edca50cc3b7389a2ba8b0467
---
.../pg_stat_statements/expected/squashing.out | 14 +-
.../pg_stat_statements/pg_stat_statements.c | 84 +++---------
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 121 +++++++++++-------
4 files changed, 100 insertions(+), 121 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..d92cfbd35fb 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -246,7 +246,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -353,7 +353,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -376,7 +376,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -393,10 +393,10 @@ SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..efcad87d684 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2825,10 +2825,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2842,9 +2838,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
@@ -2857,7 +2850,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
tok_len; /* Length (in bytes) of that tok */
off = jstate->clocations[i].location;
-
/* Adjust recorded location if we're dealing with partial string */
off -= query_loc;
@@ -2866,67 +2858,24 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
- /*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
- */
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
-
- Assert(len_to_wrt >= 0);
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
-
- skipped_constants++;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
+ /*
+ * And insert a param symbol in place of the constant token.
+ *
+ * However, if we have a squashable list, insert a comment in place of
+ * the second and remaining values of the list.
+ */
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ i + 1 + jstate->highest_extern_param_id,
+ (jstate->clocations[i].squashed) ? " /*, ... */" : "");
- /* Otherwise the constant is squashed away -- move forward */
quer_loc = off + tok_len;
last_off = off;
last_tok_len = tok_len;
@@ -3017,6 +2966,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 77659b0f760..17ba3696226 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1324,7 +1324,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..32bc42bffca 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -60,10 +60,10 @@ static uint64 DoJumble(JumbleState *jstate, Node *node);
static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
-static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+static void RecordExpressionLocation(JumbleState *jstate,
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -381,7 +381,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordExpressionLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +396,15 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * initialize lengths to -1 to simplify third-party module usage
+ *
+ * If we have a length that is greater than -1, this indicates a
+ * squashable list.
+ */
+ jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
jstate->clocations_count++;
}
}
@@ -413,7 +419,7 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableExpression(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -437,22 +443,45 @@ IsSquashableConst(Node *element)
{
Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
- return false;
+ switch (nodeTag(arg))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) arg;
+
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
}
- return true;
+ return false;
}
- if (!IsA(element, Const))
- return false;
+ switch (nodeTag(element))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) element;
- return true;
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
+
+ return false;
}
/*
* Subroutine for _jumbleElements: Verify whether the provided list
- * can be squashed, meaning it contains only constant expressions.
+ * can be squashed, meaning it contains only constant and external
+ * parameter expressions.
*
* Return value indicates if squashing is possible.
*
@@ -461,7 +490,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableExpressionList(List *elements)
{
ListCell *temp;
@@ -474,22 +503,19 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableExpression(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordExpressionLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +543,37 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow for Constants and Params of type external to be squashed. To
+ * be able to normalize such queries by stripping away the squashed away
+ * values, we must track the start and end of the expression list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableExpressionList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->list_start > 0 && aexpr->list_end > 0)
+ {
+ RecordExpressionLocation(jstate,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
--
2.39.5 (Apple Git-154)
On Wed, May 21, 2025 at 08:22:19PM GMT, Sami Imseih wrote:
At the same time AFAICT there aren't many more code paths
to worry about in case of a LocationExpr as a node
I can imagine there are others like value expressions,
row expressions, json array expressions, etc. that we may
want to also normalize.
Exactly. When using a node, one can explicitly wrap whatever is needed
into it, while otherwise one would need to find a new way to piggy back
on A_Expr in a new context.
Looking at the VALUES expression case, we will need to carry the info
with SelectStmt and ultimately to RangeTblEntry which is where the
values_list is, so with either approach RangeTblEntry will need the
LocationExpr pointer or the additional ParseLoc info I am suggesting.
A_Expr is not used in the values list case.
Right, that's precisely my point -- introducing a new node will allow us
to use the same generalized mechanism in such scenarios as well, instead
of inventing something new every time.
I'll take a look at the proposed change, but a bit later.
Here is a v4 to compare with v3.
0001- is the infrastructure to track the boundaries
0002- the changes to jumbling
Just to call this out, I don't think there is an agreement on squashing
Params, which you have added into 0002. Let's discuss this change
separately from the 18 open item.
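For anyone comparing the two halves of the series, here is a rough standalone sketch (not code from either patch) of what the recorded boundaries are ultimately used for: replacing everything between the opening and closing delimiter of a list with one parameter symbol plus the squash marker. The helper name `squash_list` and its arguments are made up for illustration; only the offset arithmetic follows the `list_start + 1` and `(list_end - list_start) - 1` computation in `_jumbleElements` from 0002.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch.
 *
 * list_start and list_end are byte offsets of the opening and closing
 * delimiters of a list, as 0001 would record them.  Everything strictly
 * between the two delimiters is replaced by a single parameter symbol
 * followed by the squashed-list marker.
 *
 * Returns the length of the result, or -1 if it does not fit in out.
 */
static int
squash_list(const char *query, int list_start, int list_end,
            int param_id, char *out, size_t outlen)
{
    int         n;

    assert(list_start >= 0 && list_end > list_start);
    assert((size_t) list_end < strlen(query));

    /*
     * Copy the prefix up to and including the opening delimiter, emit the
     * marker, then copy the rest starting at the closing delimiter.
     */
    n = snprintf(out, outlen, "%.*s$%d /*, ... */%s",
                 list_start + 1, query, param_id, query + list_end);

    return (n < 0 || (size_t) n >= outlen) ? -1 : n;
}
```

Because only the two delimiter offsets are needed, the same call would work whether the list holds constants, casts, or external parameters, which is the property the boundary tracking is meant to provide.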
---
Here is a short summary of the open item:
* An issue has been discovered with the squashing feature in 18, which
can lead to invalid normalized queries in pg_stat_statements.
* The proposed fix extends gram.y to capture the end location of list
expressions to address that.
* There is a disagreement on how exactly to capture the location: the
options are introducing a new node, LocationExpr, or piggybacking on the
existing A_Expr. I find the former more flexible and less invasive,
but it looks like there are also other opinions.
Now, both flavours of the proposed solution could still be considered too
invasive to be applied as a bug fix. I personally don't see it like
this, but I'm obviously biased. This leads us to the following decisions
to be made:
* Is modifying the parser (either adding a new node or modifying an
existing one) acceptable at this stage? I guess it would be enough to
collect a couple of yes/no votes in this thread.
* If it's not acceptable, the feature could be reverted in 18, and the
fix could be applied to the master branch only.
I'm fine with both outcomes (apply the fix to both 18 and master, or
revert in 18 and apply the fix on master), and leave the decision to
Álvaro (sorry for causing all the trouble). It's fair to say that
reverting the feature will be the least risky move.
On 2025-May-22, Dmitry Dolgov wrote:
Just to call this out, I don't think there is an agreement on squashing
Params, which you have added into 0002.
Actually I think we do have agreement on squashing PARAM_EXTERN Params.
/messages/by-id/3086744.1746500983@sss.pgh.pa.us
Now, both flavours of the proposed solution could still be considered too
invasive to be applied as a bug fix. I personally don't see it like
this, but I'm obviously biased. This leads us to the following decisions to
be made:
* Is modifying the parser (either adding a new node or modifying an existing
one) acceptable at this stage? I guess it would be enough to collect
a couple of yes/no votes in this thread.
IMO adding a struct as suggested is okay, especially if it reduces the
overall code complexity. But we don't want a node, just a bare struct.
Adding a node would be more troublesome.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
IMO adding a struct as suggested is okay, especially if it reduces the
overall code complexity. But we don't want a node, just a bare struct.
Adding a node would be more troublesome.
In v4, a new private struct is added in gram.y, but we are also adding
additional fields to track the expression boundaries to the required
nodes.
--
Sami
On Thu, May 22, 2025 at 03:10:34PM -0500, Sami Imseih wrote:
IMO adding a struct as suggested is okay, especially if it reduces the
overall code complexity. But we don't want a node, just a bare struct.
Adding a node would be more troublesome.
In v4, a new private struct is added in gram.y, but we are also adding
additional fields to track the expression boundaries to the required
nodes.
Handling external parameters as something that gets squashed is also
the consensus I am understanding we've reached. I'm OK with it.
Upthread, scenarios with multiple IN lists were mentioned as broken:
/messages/by-id/CAA5RZ0ts6zb-efiJ+K31Z_YDU=M7tHE43vv6ZBCqQxiABr3Yaw@mail.gmail.com
For example with bind queries like that:
select where $1 in ($3, $2) and 1 in ($4, cast($5 as int))
\bind 0 1 2 3 4
Should we have a bit more coverage, where we use multiple IN and/or
ARRAY lists with constants and/or external parameters?
v4-0003 with the extra tests for ARRAY can be applied first, with the
test output slightly adjusted and the casts showing up. Now, looking
independently at v4-0001, it is a bit hard to say what the direct
benefit of this patch is, because nothing in the tests of pgss changes
after applying it. Could the benefit of this patch be demonstrated so
that it is possible to compare the current behavior with what would be
the new behavior?
The patterns generated when using casts are still a bit funny, but
perhaps nobody will bother much about the result generated as these
are uncommon. For example, this gets squashed, with the end of the
cast included:
Q: select where 2 in (1, 4) and 1 in (5, (cast(7 as int)), 6, cast(8 as int));
R: select where $1 in ($2 /*, ... */) and $3 in ($4 /*, ... */ as int))
This does not get squashed:
Q: select where 2 in (1, 4) and
1 in (5, cast(7 as int), 6, (cast(8 as int)), 9, 10, (cast(8 as text))::int);
R: select where $1 in ($2 /*, ... */) and
$3 in ($4, cast($5 as int), $6, (cast($7 as int)), $8, $9, (cast($10 as text))::int)
This is the kind of stuff that should also have coverage, IMO, or
we will never keep track of what the existing behavior is, and whether
things break in some way in the future.
FWIW, with v4-0002 applied, I am seeing one diff in the dml tests,
where an IN list is not squashed for pgss_dml_tab.
The squashing tests point to more issues in the v4 series:
- Some lists are not getting squashed anymore.
- Some spacing issues, like "( $5)::jsonb".
Am I missing something?
--
Michael
For example with bind queries like that:
select where $1 in ($3, $2) and 1 in ($4, cast($5 as int))
\bind 0 1 2 3 4
Should we have a bit more coverage, where we use multiple IN and/or
ARRAY lists with constants and/or external parameters?
I will add more test coverage. All the tests we have for constants
should also have an external parameter counterpart.
v4-0003 with the extra tests for ARRAY can be applied first, with the
test output slightly adjusted and the casts showing up.
That was my mistake in rearranging the v3-0001 as v4-0003. I will
fix in the next revision.
Now, looking
independently at v4-0001, it is a bit hard to say what's the direct
benefit of this patch, because nothing in the tests of pgss change
after applying it. Could the benefit of this patch be demonstrated so
as it is possible to compare what's the current vs the what-would-be
new behavior?
You're right, this should not be an independent patch. I had intended to
eventually merge these v4-0001 and v4-0002 but felt it was cleaner to
review separately. I'll just combine them in the next rev.
The patterns generated when using casts is still a bit funny, but
perhaps nobody will bother much about the result generated as these
are uncommon. For example, this gets squashed, with the end of the
cast included:
Q: select where 2 in (1, 4) and 1 in (5, (cast(7 as int)), 6, cast(8 as int));
R: select where $1 in ($2 /*, ... */) and $3 in ($4 /*, ... */ as int))
This does not get squashed:
Q: select where 2 in (1, 4) and
1 in (5, cast(7 as int), 6, (cast(8 as int)), 9, 10, (cast(8 as text))::int);
R: select where $1 in ($2 /*, ... */) and
$3 in ($4, cast($5 as int), $6, (cast($7 as int)), $8, $9, (cast($10 as text))::int)
This is the kind of stuff that should also have coverage, IMO, or
we will never keep track of what the existing behavior is, and whether
things break in some way in the future.
This is interesting actually. This is the behavior on HEAD, and I don't get why
the first list with the casts does not get squashed, while the second one does.
I will check IsSquashableConst tomorrow unless Dmitry gets to it first.
```
test=# select where 2 in (1, 4) and 1 in (5, cast(7 as int), 6, (cast(8 as int)), 9, 10, (cast(8 as text))::int);
--
(0 rows)
test=# select where 1 in (5, cast(7 as int), 6);
--
(0 rows)
test=# select queryid, substr(query, 1, 100) as query from pg_stat_statements;
       queryid        |                                                query
----------------------+------------------------------------------------------------------------------------------------------
  2125518472894925252 | select where $1 in ($2 /*, ... */) and $3 in ($4, cast($5 as int), $6, (cast($7 as int)), $8, $9, (c
 -4436613157077978160 | select where $1 in ($2 /*, ... */)
```
FWIW, with v4-0002 applied, I am seeing one diff in the dml tests,
where a IN list is not squashed for pgss_dml_tab.
hmm, I did not observe the same diff.
--
Sami
On Thu, May 22, 2025 at 10:23:31PM GMT, Sami Imseih wrote:
This does not get squashed:
Q: select where 2 in (1, 4) and
1 in (5, cast(7 as int), 6, (cast(8 as int)), 9, 10, (cast(8 as text))::int);
R: select where $1 in ($2 /*, ... */) and
$3 in ($4, cast($5 as int), $6, (cast($7 as int)), $8, $9, (cast($10 as text))::int)
This is interesting actually. This is the behavior on HEAD, and I don't get why
the first list with the casts does not get squashed, while the second one does.
I will check IsSquashableConst tomorrow unless Dmitry gets to it first.
IsSquashableConst intentionally has a limited set of tests for
"constantness"; in particular, it does not recurse. The case above
(cast(8 as text))::int
features two CoerceViaIO expressions one inside another, hence
IsSquashableConst returns false.
On Thu, May 22, 2025 at 10:23:31PM GMT, Sami Imseih wrote:
This does not get squashed:
Q: select where 2 in (1, 4) and
1 in (5, cast(7 as int), 6, (cast(8 as int)), 9, 10, (cast(8 as text))::int);
R: select where $1 in ($2 /*, ... */) and
$3 in ($4, cast($5 as int), $6, (cast($7 as int)), $8, $9, (cast($10 as text))::int)
This is interesting actually. This is the behavior on HEAD, and I don't get why
the first list with the casts does not get squashed, while the second one does.
I will check IsSquashableConst tomorrow unless Dmitry gets to it first.
IsSquashableConst intentionally has a limited set of tests for
"constantness"; in particular, it does not recurse. The case above
(cast(8 as text))::int
features two CoerceViaIO expressions one inside another, hence
IsSquashableConst returns false.
Should we be doing something like this, to unwrap RelabelType or
CoerceViaIO until we reach a different type of node to check
later on? We can also guard the loop and break out after a fixed number
of iterations. At minimum, we should try to unwrap at least
twice to cover some of the common real-world scenarios.
What do you think?
```
while (IsA(element, RelabelType) || IsA(element, CoerceViaIO))
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
else if (IsA(element, CoerceViaIO))
element = (Node *) ((CoerceViaIO *) element)->arg;
}
```
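To make the guarded variant concrete, here is a minimal standalone sketch of the loop with an iteration cap. It does not use the real PostgreSQL node definitions: `MockNode`, `MockTag`, `unwrap_coercions` and `is_squashable_bounded` are hypothetical names invented for illustration, and the cap of 2 is only the number floated above, not a settled value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL node tags; not the real definitions. */
typedef enum { T_MockConst, T_MockRelabelType, T_MockCoerceViaIO, T_MockOther } MockTag;

typedef struct MockNode
{
    MockTag          tag;
    struct MockNode *arg;       /* wrapped expression, NULL for leaf nodes */
} MockNode;

/*
 * Strip at most max_levels RelabelType/CoerceViaIO wrappers and return
 * whatever node is left.  The cap is what keeps the loop bounded.
 */
const MockNode *
unwrap_coercions(const MockNode *element, int max_levels)
{
    int levels = 0;

    assert(element != NULL);

    while ((element->tag == T_MockRelabelType ||
            element->tag == T_MockCoerceViaIO) &&
           levels < max_levels &&
           element->arg != NULL)
    {
        element = element->arg;
        levels++;
    }
    return element;
}

/* An element counts as squashable only if a constant is left after unwrapping. */
bool
is_squashable_bounded(const MockNode *element, int max_levels)
{
    return unwrap_coercions(element, max_levels)->tag == T_MockConst;
}
```

With a cap of 2, a constant wrapped in two coercions passes the check, while a third wrapper makes the element fall back to "not squashable", which is the trade-off of a fixed limit.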
--
Sami
On Fri, May 23, 2025 at 09:05:54AM GMT, Sami Imseih wrote:
On Thu, May 22, 2025 at 10:23:31PM GMT, Sami Imseih wrote:
This does not get squashed:
Q: select where 2 in (1, 4) and
1 in (5, cast(7 as int), 6, (cast(8 as int)), 9, 10, (cast(8 as text))::int);
R: select where $1 in ($2 /*, ... */) and
$3 in ($4, cast($5 as int), $6, (cast($7 as int)), $8, $9, (cast($10 as text))::int)
This is interesting actually. This is the behavior on HEAD, and I don't get why
the first list with the casts does not get squashed, while the second one does.
I will check IsSquashableConst tomorrow unless Dmitry gets to it first.
IsSquashableConst intentionally has a limited set of tests for
"constantness"; in particular, it does not recurse. The case above
(cast(8 as text))::int
features two CoerceViaIO expressions one inside another, hence
IsSquashableConst returns false.
Should we be doing something like this, to unwrap RelabelType or
CoerceViaIO until we reach a different type of node to check
later on? We can also guard the loop and break out after a fixed number
of iterations. At minimum, we should try to unwrap at least
twice to cover some of the common real-world scenarios.
What do you think?
```
while (IsA(element, RelabelType) || IsA(element, CoerceViaIO))
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
else if (IsA(element, CoerceViaIO))
element = (Node *) ((CoerceViaIO *) element)->arg;
}
```
I think it's better to recursively call IsSquashableConst on the nested
expression (arg or args for FuncExpr). Something like that was done in
the original patch version and was considered too much at that time, but
since it looks like all the past concerns are lifted, why not. Do not
forget check_stack_depth.
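As a rough illustration of the recursive shape, here is a standalone sketch. It is not the patch: `SketchNode`, `SketchTag`, `sketch_is_squashable` and `SKETCH_MAX_DEPTH` are hypothetical names, and an explicit depth counter stands in for check_stack_depth(), which in the server would raise an error rather than return false. Treating a function call as squashable when all of its arguments are is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds, loosely modeled on the ones discussed here. */
typedef enum
{
    SK_CONST,
    SK_RELABEL,
    SK_COERCE_VIA_IO,
    SK_FUNC,
    SK_SUBLINK
} SketchTag;

typedef struct SketchNode
{
    SketchTag           tag;
    struct SketchNode **args;   /* one entry for casts, any number for SK_FUNC */
    size_t              nargs;
} SketchNode;

#define SKETCH_MAX_DEPTH 64     /* assumed limit, stand-in for check_stack_depth() */

/*
 * Recursive "constantness" test: a constant is squashable, a cast or a
 * function call is squashable if every argument is, anything else is not.
 */
bool
sketch_is_squashable(const SketchNode *node, int depth)
{
    assert(node != NULL);

    if (depth > SKETCH_MAX_DEPTH)
        return false;           /* give up instead of recursing further */

    switch (node->tag)
    {
        case SK_CONST:
            return true;

        case SK_RELABEL:
        case SK_COERCE_VIA_IO:
        case SK_FUNC:
            if (node->nargs == 0)
                return false;
            for (size_t i = 0; i < node->nargs; i++)
            {
                if (!sketch_is_squashable(node->args[i], depth + 1))
                    return false;
            }
            return true;

        default:
            return false;       /* e.g. a SubLink is never squashable */
    }
}
```

Under these assumptions, the nested-cast case from upthread, two CoerceViaIO nodes around a constant, is accepted, while a cast around a SubLink is still rejected.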
On Fri, May 23, 2025 at 04:29:45PM +0200, Dmitry Dolgov wrote:
I think it's better to recursively call IsSquashableConst on the nested
expression (arg or args for FuncExpr). Something like that was done in
the original patch version and was considered too much at that time, but
since it looks like all the past concerns are lifted, why not. Do not
forget check_stack_depth.
AFAIK, we already have a couple of check_stack_depth() calls during
some node transformations after parsing. At this level of the code
that would be a new thing.
--
Michael
On Fri, May 23, 2025 at 04:29:45PM +0200, Dmitry Dolgov wrote:
I think it's better to recursively call IsSquashableConst on the nested
expression (arg or args for FuncExpr). Something like that was done in
the original patch version and was considered too much at that time, but
since it looks like all the past concerns are lifted, why not. Do not
forget check_stack_depth.
AFAIK, we already have a couple of check_stack_depth() calls during
some node transformations after parsing. At this level of the code
that would be a new thing.
I think the recursion will simplify the logic inside
IsSquashableConstants. I will probably add that as a separate patch
that maybe will get applied to HEAD only.
Something I want agreement on is the following.
Since we assign new parameter symbols based on the highest external param
from the original query, as stated in the docs [0] "The parameter symbols
used to replace constants in representative query texts start from the
next number after the highest $n parameter in the original query text",
we could have gaps in assigning symbol values, such as the case below.
```
test=# select where 1 in ($1, $2, $3) and 1 = $4
test-# \bind 1 2 3 4
test-# ;
--
(0 rows)
test=# select query from pg_stat_statements;
query
------------------------------------------------
select where $5 in ($6 /*, ... */) and $7 = $4
```
I don't think there is much we can do here, without introducing some serious
complexity. I think the docs make this scenario clear.
Thoughts?
[0]: https://www.postgresql.org/docs/current/pgstatstatements.html
--
Sami
On Fri, May 23, 2025 at 08:05:47PM -0500, Sami Imseih wrote:
Since we assign new parameter symbols based on the highest external param
from the original query, as stated in the docs [0] "The parameter symbols
used to replace constants in representative query texts start from the
next number after the highest $n parameter in the original query text",
we could have gaps in assigning symbol values, such as the case below.
```
test=# select where 1 in ($1, $2, $3) and 1 = $4
test-# \bind 1 2 3 4
test-# ;
--
(0 rows)
test=# select query from pg_stat_statements;
query
------------------------------------------------
select where $5 in ($6 /*, ... */) and $7 = $4
```
I don't think there is much we can do here, without introducing some serious
complexity. I think the docs make this scenario clear.
In v17, we are a bit smarter with the numbering, with a normalization
giving the following, starting at $1:
select where $5 in ($1, $2, $3) and $6 = $4
So your argument about the $n parameters is kind of true, but I think
the numbering logic in v17 to start at $1 is a less-confusing result.
I would imagine that the squashed logic should give the following
result on HEAD in this case if we want a maximum of consistency with
the squashing of the IN elements taken into account:
select where $3 in ($1 /*, ... */) and $4 = $2
Starting the count of the parameters at $4 would be strange.
--
Michael
In v17, we are a bit smarter with the numbering, with a normalization
giving the following, starting at $1:
select where $5 in ($1, $2, $3) and $6 = $4
So your argument about the $n parameters is kind of true, but I think
the numbering logic in v17 to start at $1 is a less-confusing result.
I would imagine that the squashed logic should give the following
result on HEAD in this case if we want a maximum of consistency with
the squashing of the IN elements taken into account:
select where $3 in ($1 /*, ... */) and $4 = $2
Starting the count of the parameters at $4 would be strange.
Yeah, I think the correct answer is that we need to handle 2 cases.
1. If we don't have a squashed list, then we just do what we do now.
2. If we have 1 or more squashed lists, then we can't guarantee
the $n parameter as was supplied by the user and we simply renumber
the $n starting from 1.
Therefore, a user-supplied query like this:
```
select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
```
will be normalized to:
```
select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
```
To accomplish this, we will need to track the locations of
external parameters to support the 2nd case, because we need
to re-write the original location of the parameter with
the new value. I played around with this this morning and
it works as I described above. Any concerns with the
behavior described above?
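For the record, the renumbering in case 2 can be sketched outside the server as follows. This is only an illustration under stated assumptions: `SketchLoc` and `sketch_renumber` are hypothetical names, the caller is assumed to already know the sorted, non-overlapping byte spans of every constant, external parameter and squashed list, and none of this mirrors the actual pg_stat_statements code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one span of the query text to be replaced. */
typedef struct SketchLoc
{
    int  start;     /* byte offset of the span in the query text */
    int  length;    /* byte length of the span */
    bool squashed;  /* span covers the whole contents of a squashed list */
} SketchLoc;

/*
 * Rewrite query into out, replacing each span with $1, $2, ... in textual
 * order.  locs must be sorted by start and must not overlap.  Returns
 * false if out is too small.
 */
bool
sketch_renumber(const char *query, const SketchLoc *locs, int nlocs,
                char *out, size_t outsz)
{
    size_t used = 0;
    int    src = 0;
    int    next_param = 1;
    int    qlen = (int) strlen(query);
    int    tail;

    for (int i = 0; i < nlocs; i++)
    {
        int gap = locs[i].start - src;
        int n;

        assert(gap >= 0 && locs[i].start + locs[i].length <= qlen);

        /* Copy the untouched text between the previous span and this one. */
        if (used + (size_t) gap >= outsz)
            return false;
        memcpy(out + used, query + src, (size_t) gap);
        used += (size_t) gap;

        /* Emit the next symbol; a squashed list keeps only one symbol. */
        n = snprintf(out + used, outsz - used, "$%d%s", next_param++,
                     locs[i].squashed ? " /*, ... */" : "");
        if (n < 0 || (size_t) n >= outsz - used)
            return false;
        used += (size_t) n;

        src = locs[i].start + locs[i].length;
    }

    /* Copy whatever follows the last span and terminate the string. */
    tail = qlen - src;
    if (used + (size_t) tail >= outsz)
        return false;
    memcpy(out + used, query + src, (size_t) tail);
    used += (size_t) tail;
    out[used] = '\0';
    return true;
}
```

Because numbers are handed out strictly in the order the spans appear in the text, the user's original $n values are ignored, which is why the external parameter locations have to be tracked alongside the constants.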
--
Sami
On Sat, May 24, 2025 at 09:35:24AM -0500, Sami Imseih wrote:
2. If we have 1 or more squashed lists, then we can't guarantee
the $n parameter as was supplied by the user and we simply rename
the $n starting from 1.
therefore, a user supplied query like this:
```
select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
```
will be normalized to:
```
select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
```
To accomplish this, we will need to track the locations of
external parameters to support the 2nd case, because we need
to re-write the original location of the parameter with
the new value. I played around with this this morning and
it works as I described above. Any concerns with the
behavior described above?
That would be OK by me. Not having gaps in the parameter numbers of
the normalized query just feels like the natural thing to have in
the data reported by PGSS.
--
Michael
On 2025-May-24, Sami Imseih wrote:
therefore, a user supplied query like this:
```
select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
```
will be normalized to:
```
select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
```
Hmm, interesting.
I think this renumbering should not be a problem in practice; users with
unordered parameters have little room to complain if the param numbers
change on query normalization. At least that's how it seems to me.
If renumbering everything in physical order makes the code simpler, then
I don't disagree.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
therefore, a user supplied query like this:
```
select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
```
will be normalized to:
```
select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
```
Hmm, interesting.
I think this renumbering should not be a problem in practice; users with
unordered parameters have little room to complain if the param numbers
change on query normalization. At least that's how it seems to me.
If renumbering everything in physical order makes the code simpler, then
I don't disagree.
It does make it simpler, otherwise we have to introduce O(n) behavior
to find eligible parameter numbers.
I've spent a bit of time looking at this, and I want to
propose the following patchset.
* 0001:
This is a normalization issue discovered when adding new
tests for squashing. This is also an issue that exists in
v17 and likely earlier versions and should probably be
backpatched.
The crux of the problem is that if a constant location is
recorded multiple times, the values for $n don't take
the duplicate constant locations into account and end up
incorrectly incrementing the next value for $n.
So, a query like
SELECT WHERE '1' IN ('2'::int, '3'::int::text)
ends up normalizing to
SELECT WHERE $1 IN ($3::int, $4::int::text)
I also added a few test cases as part of
this patch.
This does also feel like it should be backpatched.
* 0002:
Added some more tests to the ones initially proposed
by Dmitry in v3-0001 [0], including the "edge cases" which
led to the findings for 0001.
* 0003:
This fixes the normalization anomalies introduced by
62d712ec (the squashing feature) mentioned here [1].
This patch therefore implements the fixes to track
the boundaries of an IN-list or ARRAY expression.
* 0004: implements external parameter squashing.
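To illustrate the duplicate-location problem that 0001 addresses, here is a small standalone sketch of the numbering step. It is not the patch itself: `sketch_assign_param_numbers` is a hypothetical helper, and the idea that a separate counter is advanced only for distinct locations is my reading of the description above, not a statement about the actual code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assign $n numbers to a sorted array of constant locations that may
 * contain duplicates.  numbers[i] receives the symbol number for entry i,
 * or 0 when the entry repeats the previous location.  Returns how many
 * distinct symbols were handed out.
 */
int
sketch_assign_param_numbers(const int *locations, int nlocs, int *numbers)
{
    int next_param = 1;

    assert(locations != NULL && numbers != NULL);

    for (int i = 0; i < nlocs; i++)
    {
        if (i > 0 && locations[i] == locations[i - 1])
        {
            /* Duplicate: emits nothing, so it must not consume a number. */
            numbers[i] = 0;
            continue;
        }
        numbers[i] = next_param++;
    }
    return next_param - 1;
}
```

If the number were derived from the array index instead of a separate counter, every duplicate would silently burn one value, which is the kind of gap ($1, $3, $4) shown in the example for 0001.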
While I think we should get all patches in for v18, I definitely
think we need to get the first 3 because they fix existing
bugs.
What do you think?
[0]: /messages/by-id/i635eozw2yjpzqxi5vgm4ceccqq3gv7ul4xj2xni2v6pfgtqlr@vc5otquxmgjg
[1]: /messages/by-id/CAA5RZ0ts6zb-efiJ+K31Z_YDU=M7tHE43vv6ZBCqQxiABr3Yaw@mail.gmail.com
--
Sami
Attachments:
v5-0002-Enhanced-query-jumbling-squashing-tests.patch
From b57bedc05eaca18d0846d7e3313e857ff8c5fc9a Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v5 2/4] Enhanced query jumbling squashing tests
Test coverage for ARRAY expressions is not enough. Add more test
cases, similar to the already existing ones. Also, enhance tests for the
negative cases of RelabelType, CoerceViaIO and FuncExpr. While at it,
reorganize some parts of the tests and correct minor spacing issues.
---
.../pg_stat_statements/expected/squashing.out | 331 ++++++++++++++++--
contrib/pg_stat_statements/sql/squashing.sql | 113 +++++-
2 files changed, 408 insertions(+), 36 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..725238d3f5c 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -273,32 +273,22 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
- id | data
-----+------
-(0 rows)
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+--
+(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
- (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
- (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
- (SELECT $10)::jsonb) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- CoerceViaIO
@@ -357,6 +347,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
-- Some casting expression are simplified to Const
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -366,8 +357,8 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
id | data
----+------
(0 rows)
@@ -380,25 +371,81 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- RelabelType
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
id | data
----+------
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::text::int::text::int, $3::text::int::text::int) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- RelabelType
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- if there is only one level of relabeltype, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+ id | data
+----+------
+(0 rows)
+
+-- if there is at least one element with multiple levels of relabeltype,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1 /*, ... */::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
-- Test constants evaluation in a CTE, which was causing issues in the past
WITH cte AS (
SELECT 'const' as const FROM test_squash
@@ -429,3 +476,235 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ($1 /*, ... */)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ $1 /*, ... */::int4::casttesttype +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ (SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+ array
+-------------------------------------------------
+ {100,200,300,400,500,600,700,800,900,1000,1100}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
+ abs($8), abs($9), abs($10), ((abs($11))) +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ $1 /*, ... */::bigint +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- edge cases
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+--
+(1 row)
+
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2::int, $3::int::text) | 1
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+--
+(1 row)
+
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int::text) | 2
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8..0aaa893eb1a 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -87,14 +87,9 @@ SELECT * FROM test_squash_bigint WHERE id IN
abs(800), abs(900), abs(1000), ((abs(1100))));
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- CoerceViaIO
@@ -143,17 +138,39 @@ SELECT * FROM test_squash_cast WHERE data IN
10::int4::casttesttype, 11::int4::casttesttype);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
+
-- Some casting expression are simplified to Const
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- RelabelType
+
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is only one level of relabeltype, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is at least one element with multiple levels of relabeltype,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -167,3 +184,79 @@ FROM cte;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- edge cases
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--
2.39.5 (Apple Git-154)
v5-0003-Fix-Normalization-for-squashed-query-texts.patch (application/octet-stream)
From 3cf06adf2bc6cd02234313b34909060f2849da6f Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:11:46 -0500
Subject: [PATCH v5 3/4] Fix Normalization for squashed query texts
62d712ec added the ability to squash constants from an
IN list/ArrayExpr for queryId computation purposes. However,
in certain cases, this broke normalization. For example,
"IN (1, 2, int4(1))" is normalized to "IN ($2 /*, ... */))",
which leaves an extra parenthesis at the end of the normalized string.
To correct this, the start and end boundaries of an expr_list are
now tracked by the various nodes used during parsing and are made
available to the ArrayExpr node for query jumbling. Having these
boundaries allows normalization to precisely identify the locations
in the query text that should be squashed.
---
.../pg_stat_statements/expected/squashing.out | 44 +++++----
.../pg_stat_statements/pg_stat_statements.c | 76 ++++-----------
contrib/pg_stat_statements/sql/squashing.sql | 5 +
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 84 +++++++++--------
src/backend/parser/gram.y | 94 +++++++++++++------
src/backend/parser/parse_expr.c | 4 +
src/include/nodes/parsenodes.h | 4 +
src/include/nodes/primnodes.h | 4 +
9 files changed, 174 insertions(+), 143 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 725238d3f5c..f3f212183a2 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -82,6 +82,24 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- built-in functions will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -246,7 +264,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -343,7 +361,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -367,7 +385,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -441,7 +459,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
--------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */::oid) |
+ ($1 /*, ... */) |
SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
@@ -522,7 +540,7 @@ SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -546,9 +564,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- ($1 /*, ... */)::jsonb +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -573,9 +589,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- $1 /*, ... */::int4::casttesttype +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -654,9 +668,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- $1 /*, ... */::bigint +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -681,7 +693,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2 /*, ... */) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -705,6 +717,6 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int::text) | 2
+ select where $1 IN ($2 /*, ... */) | 2
(2 rows)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index c58f34e9f30..8cadfa2ff21 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2817,7 +2817,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
int num_constants_replaced = 0;
/*
@@ -2832,9 +2831,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
@@ -2856,63 +2852,22 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
/*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
+ * And insert a param symbol in place of the constant token.
+ *
+ * However, if we have a squashable list, insert a comment from the
+ * second value of the list.
*/
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
-
- Assert(len_to_wrt >= 0);
-
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id,
+ (jstate->clocations[i].squashed) ? " /*, ... */" : "");
/* Otherwise the constant is squashed away -- move forward */
quer_loc = off + tok_len;
@@ -3005,6 +2960,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 0aaa893eb1a..aed4e42286c 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -26,6 +26,11 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data =
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- built-in functions will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 77659b0f760..17ba3696226 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1324,7 +1324,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..219023b1173 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -60,10 +60,10 @@ static uint64 DoJumble(JumbleState *jstate, Node *node);
static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
-static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+static void RecordExpressionLocation(JumbleState *jstate,
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -381,7 +381,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordExpressionLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +396,15 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * initialize lengths to -1 to simplify third-party module usage
+ *
+ * If we have a length that is greater than -1, this indicates a
+ * squashable list.
+ */
+ jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
jstate->clocations_count++;
}
}
@@ -413,7 +419,7 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableExpression(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -450,6 +456,7 @@ IsSquashableConst(Node *element)
return true;
}
+
/*
* Subroutine for _jumbleElements: Verify whether the provided list
* can be squashed, meaning it contains only constant expressions.
@@ -461,7 +468,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableExpressionList(List *elements)
{
ListCell *temp;
@@ -474,22 +481,19 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableExpression(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordExpressionLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +521,36 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow constants to be squashed. To normalize such queries, we use
+ * the start and end locations of the list of elements in a list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableExpressionList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->list_start > 0 && aexpr->list_end > 0)
+ {
+ RecordExpressionLocation(jstate,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..0cd5f794db3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,17 @@ typedef struct KeyActions
KeyAction *deleteAction;
} KeyActions;
+/*
+ * Track the start and end of a list in an expression, such as an 'IN' list
+ * or Array Expression
+ */
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -184,7 +195,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int end_location);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -269,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
%type <node> stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -523,8 +535,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
+%type <listwithboundary> in_expr
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
%type <boolean> opt_ordinality opt_without_overlaps
@@ -15289,46 +15302,58 @@ a_expr: c_expr { $$ = $1; }
}
| a_expr IN_P in_expr
{
+ ListWithBoundary *n = $3;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
-
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
- $$ = (Node *) n;
+ SubLink *n2 = (SubLink *) n->expr;
+
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
+ $$ = (Node *) n2;
}
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "=", $1, n->expr, @2);
+
+ n2->rexpr_list_start = $3->start;
+ n2->rexpr_list_end = $3->end;
+ $$ = (Node *) n2;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
{
+ ListWithBoundary *n = $4;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate NOT (foo = ANY (subquery)) */
/* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ SubLink *n2 = (SubLink *) n->expr;
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
/* Stick a NOT on top; must have same parse location */
- $$ = makeNotExpr((Node *) n, @2);
+ $$ = makeNotExpr((Node *) n2, @2);
}
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "<>", $1, n->expr, @2);
+
+ n2->rexpr_list_start = $4->start;
+ n2->rexpr_list_end = $4->end;
+ $$ = (Node *) n2;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16764,15 +16789,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16897,12 +16922,25 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
in_expr: select_with_parens
{
SubLink *n = makeNode(SubLink);
+ ListWithBoundary *n2 = palloc(sizeof(ListWithBoundary));
n->subselect = $1;
/* other fields will be filled later */
- $$ = (Node *) n;
+
+ n2->expr = (Node *) n;
+ n2->start = -1;
+ n2->end = -1;
+ $$ = n2;
+ }
+ | '(' expr_list ')'
+ {
+ ListWithBoundary *n = palloc(sizeof(ListWithBoundary));
+
+ n->expr = (Node *) $2;
+ n->start = @1;
+ n->end = @3;
+ $$ = n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
;
/*
@@ -19300,12 +19338,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int location_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->list_start = location;
+ n->list_end = location_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..7347c989e11 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1224,6 +1224,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ newa->list_start = a->rexpr_list_start;
+ newa->list_end = a->rexpr_list_end;
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2166,6 +2168,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ newa->list_start = a->list_start;
+ newa->list_end = a->list_end;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..2f078887d06 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -347,6 +347,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc rexpr_list_start; /* location of the start of a rexpr list */
+ ParseLoc rexpr_list_end; /* location of the end of a rexpr list */
} A_Expr;
/*
@@ -502,6 +504,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc list_start; /* location of the start of the elements list */
+ ParseLoc list_end; /* location of the end of the elements list */
} A_ArrayExpr;
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..773cdd880aa 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,10 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+ /* location of the start of the elements list */
+ ParseLoc list_start;
+ /* location of the end of the elements list */
+ ParseLoc list_end;
} ArrayExpr;
/*
--
2.39.5 (Apple Git-154)
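As a rough illustration of what the v5-0003 patch above does during normalization, the sketch below replaces the interior of a list, given its recorded start offset and length, with a single parameter symbol followed by the squash comment. It is a standalone toy, not the patch code: `squash_list_text` and its parameters are hypothetical names, and the offsets are supplied by hand rather than coming from the parser.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy "query" into a newly allocated string, replacing
 * the "len" bytes starting at byte offset "start" (the interior of a list,
 * i.e. everything between the opening and closing delimiter) with a
 * parameter symbol and the squash marker.  The caller frees the result.
 */
char *
squash_list_text(const char *query, int start, int len, int param_id)
{
	size_t		query_len = strlen(query);
	size_t		buflen;
	char	   *out;
	int			n;

	assert(start >= 0 && len >= 0 && param_id > 0);
	assert((size_t) start + (size_t) len <= query_len);

	/* "$" + up to 10 digits + the 11-byte squash marker + NUL fit in 32 bytes */
	buflen = query_len + 32;
	out = malloc(buflen);
	if (out == NULL)
		return NULL;

	/* text before the list interior, including the opening delimiter */
	memcpy(out, query, (size_t) start);

	/* one parameter symbol stands in for the whole list */
	n = snprintf(out + start, buflen - (size_t) start,
				 "$%d /*, ... */", param_id);

	/* text after the list interior, starting at the closing delimiter */
	strcpy(out + start + n, query + start + len);

	return out;
}
```

For the text `SELECT WHERE 1 IN (1, 2, int4(1))`, taking the offsets of the outer parentheses as the list boundaries and passing `start + 1` and `(end - start) - 1` (the same arithmetic the patch uses in `_jumbleElements`) with a parameter id of 2 gives `SELECT WHERE 1 IN ($2 /*, ... */)`, with no stray parenthesis left behind.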
v5-0001-Fix-off-by-one-error-in-query-normalization.patch (application/octet-stream)
From 48bb42245c0fe3c846512b1ced30ba210b8d4617 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 21:41:17 -0500
Subject: [PATCH v5 1/4] Fix off-by-one error in query normalization
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In situations where a location was recorded more than
once during query jumbling, query normalization would
still account for the duplicate location—which should
have been skipped—when generating the $n values. This
led to a situation where gaps in the $n values would
appear in the final normalized string. For example:
select where '1' IN ('1'::int, '2'::int::text)
would be normalized to:
select where $1 IN ($3, $4)
instead of the correct:
select where $1 IN ($2, $3)
---
contrib/pg_stat_statements/pg_stat_statements.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index d8fdf42df79..c58f34e9f30 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2818,9 +2818,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
+ int num_constants_replaced = 0;
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2878,7 +2876,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then a param symbol replacing the constant itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
/* In case previous constants were merged away, stop doing that */
in_squashed = false;
@@ -2902,12 +2900,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then start a run of squashed constants */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
/* The next location will match the block below, to end the run */
in_squashed = true;
-
- skipped_constants++;
}
else
{
--
2.39.5 (Apple Git-154)
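To illustrate the numbering fix in the v5-0001 patch above, here is a minimal standalone sketch. It is not the pg_stat_statements code: `DemoLoc`, `demo_number_by_index` and `demo_number_by_counter` are hypothetical names, the offsets are made up, and the `duplicate` flag is an assumed stand-in for however the real code marks a location that was recorded twice. It only shows why deriving `$n` from the loop index leaves gaps when entries are skipped, while a counter bumped on each replacement does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for one recorded constant location. */
typedef struct
{
    int  location;   /* byte offset of the constant in the query text */
    bool duplicate;  /* true if this offset was already recorded earlier */
} DemoLoc;

/*
 * Old scheme: derive $n from the loop index. Entries skipped as duplicates
 * still advance the index, so the emitted numbers can have gaps.
 */
static void
demo_number_by_index(const DemoLoc *locs, int nlocs, char *out, size_t outlen)
{
    size_t used = 0;

    out[0] = '\0';
    for (int i = 0; i < nlocs && used < outlen; i++)
    {
        if (locs[i].duplicate)
            continue;
        used += (size_t) snprintf(out + used, outlen - used, "$%d ", i + 1);
    }
}

/*
 * New scheme: count only the replacements actually emitted, so skipped
 * entries do not consume a parameter number.
 */
static void
demo_number_by_counter(const DemoLoc *locs, int nlocs, char *out, size_t outlen)
{
    size_t used = 0;
    int    replaced = 0;

    out[0] = '\0';
    for (int i = 0; i < nlocs && used < outlen; i++)
    {
        if (locs[i].duplicate)
            continue;
        used += (size_t) snprintf(out + used, outlen - used, "$%d ", ++replaced);
    }
}
```

With four recorded entries of which the second is a duplicate, the index-based variant yields `$1 $3 $4` and the counter-based variant yields `$1 $2 $3`, matching the before and after strings in the commit message.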
v5-0004-Support-Squashing-of-External-Parameters.patch (application/octet-stream)
From 6e3e08753a57402ba09ef393c873af7992cca0fe Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:17:24 -0500
Subject: [PATCH v5 4/4] Support Squashing of External Parameters
---
.../pg_stat_statements/expected/squashing.out | 34 +++++++-
.../pg_stat_statements/pg_stat_statements.c | 7 ++
contrib/pg_stat_statements/sql/squashing.sql | 10 +++
src/backend/nodes/queryjumblefuncs.c | 84 ++++++++++++-------
src/include/nodes/primnodes.h | 6 +-
src/include/nodes/queryjumble.h | 3 +
6 files changed, 108 insertions(+), 36 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f3f212183a2..94c5d365d68 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -21,10 +21,16 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3);
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3) \bind 1 2 3
+;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 2
SELECT * FROM test_squash WHERE id IN ($1) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
@@ -44,10 +50,17 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
+ \bind 1 2 3 4 5 6 7 8 9 10 11
+;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 4
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 6
SELECT * FROM test_squash WHERE id IN ($1) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 1
@@ -75,12 +88,20 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND da
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) AND data = $12
+ \bind 1 2 3 4 5 6 7 8 9 10 11 2
+;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
---------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+(3 rows)
-- built-in functions will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -93,10 +114,15 @@ SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
--
(1 row)
+SELECT WHERE 1 IN ($1, $2, int4($3::int), int4($4::int)) \bind 1 2 1 2
+;
+--
+(1 row)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8cadfa2ff21..69d69db2289 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2834,6 +2834,9 @@ generate_normalized_query(JumbleState *jstate, const char *query,
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
+
/* Allocate result buffer */
norm_query = palloc(norm_query_buflen + 1);
@@ -2842,6 +2845,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
+ if (jstate->clocations[i].extern_param &&
+ !jstate->has_squashed_lists)
+ continue;
+
off = jstate->clocations[i].location;
/* Adjust recorded location if we're dealing with partial string */
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index aed4e42286c..3b451a0c414 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -11,11 +11,16 @@ CREATE TABLE test_squash (id int, data int);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1);
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3) \bind 1 2 3
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
+ \bind 1 2 3 4 5 6 7 8 9 10 11
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- More conditions in the query
@@ -24,11 +29,16 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) AND data = $12
+ \bind 1 2 3 4 5 6 7 8 9 10 11 2
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- built-in functions will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
+SELECT WHERE 1 IN ($1, $2, int4($3::int), int4($4::int)) \bind 1 2 1 2
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Multiple squashed intervals
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 219023b1173..c13598e6757 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,7 +61,7 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordExpressionLocation(JumbleState *jstate,
- int location, int len);
+ int location, int len, bool extern_param);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
@@ -70,6 +70,7 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static void _jumbleParam(JumbleState *jstate, Node *node);
/*
* Given a possibly multi-statement source string, confine our attention to the
@@ -185,6 +186,7 @@ InitJumble(void)
jstate->clocations_count = 0;
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
+ jstate->has_squashed_lists = false;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -381,7 +383,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordExpressionLocation(JumbleState *jstate, int location, int len)
+RecordExpressionLocation(JumbleState *jstate, int location, int len, bool extern_param)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -405,6 +407,7 @@ RecordExpressionLocation(JumbleState *jstate, int location, int len)
*/
jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
+ jstate->clocations[jstate->clocations_count].extern_param = extern_param;
jstate->clocations_count++;
}
}
@@ -443,17 +446,39 @@ IsSquashableExpression(Node *element)
{
Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
- return false;
+ switch (nodeTag(arg))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) arg;
+
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
}
- return true;
+ return false;
}
- if (!IsA(element, Const))
- return false;
+ switch (nodeTag(element))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) element;
- return true;
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
+
+ return false;
}
@@ -493,7 +518,7 @@ IsSquashableExpressionList(List *elements)
#define JUMBLE_ELEMENTS(list, node) \
_jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordExpressionLocation(jstate, expr->location, -1)
+ RecordExpressionLocation(jstate, expr->location, -1, false)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -544,8 +569,9 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
RecordExpressionLocation(jstate,
aexpr->list_start + 1,
- (aexpr->list_end - aexpr->list_start) - 1);
+ (aexpr->list_end - aexpr->list_start) - 1, false);
normalize_list = true;
+ jstate->has_squashed_lists = true;
}
}
}
@@ -597,26 +623,6 @@ _jumbleNode(JumbleState *jstate, Node *node)
break;
}
- /* Special cases to handle outside the automated code */
- switch (nodeTag(expr))
- {
- case T_Param:
- {
- Param *p = (Param *) node;
-
- /*
- * Update the highest Param id seen, in order to start
- * normalization correctly.
- */
- if (p->paramkind == PARAM_EXTERN &&
- p->paramid > jstate->highest_extern_param_id)
- jstate->highest_extern_param_id = p->paramid;
- }
- break;
- default:
- break;
- }
-
/* Ensure we added something to the jumble buffer */
Assert(jstate->total_jumble_len > prev_jumble_len);
}
@@ -719,3 +725,21 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
*/
JUMBLE_STRING(aliasname);
}
+
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
+ Param *expr = (Param *) node;
+
+ JUMBLE_FIELD(paramkind);
+ JUMBLE_FIELD(paramid);
+ JUMBLE_FIELD(paramtype);
+
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ RecordExpressionLocation(jstate, expr->location, -1, true);
+
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 773cdd880aa..99d2c019c4b 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -389,14 +389,16 @@ typedef enum ParamKind
typedef struct Param
{
+ pg_node_attr(custom_query_jumble)
+
Expr xpr;
ParamKind paramkind; /* kind of parameter. See above */
int paramid; /* numeric ID for parameter */
Oid paramtype; /* pg_type OID of parameter's datatype */
/* typmod value, if known */
- int32 paramtypmod pg_node_attr(query_jumble_ignore);
+ int32 paramtypmod;
/* OID of collation, or InvalidOid if none */
- Oid paramcollid pg_node_attr(query_jumble_ignore);
+ Oid paramcollid;
/* token location, or -1 if unknown */
ParseLoc location;
} Param;
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e..a1a99023e42 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -29,6 +29,7 @@ typedef struct LocationLen
* of squashed constants.
*/
bool squashed;
+ bool extern_param;
} LocationLen;
/*
@@ -62,6 +63,8 @@ typedef struct JumbleState
*/
unsigned int pending_nulls;
+ bool has_squashed_lists;
+
#ifdef USE_ASSERT_CHECKING
/* The total number of bytes added to the jumble buffer */
Size total_jumble_len;
--
2.39.5 (Apple Git-154)
I've spent a bit of time looking at this, and I want to
propose the following patchset.
Sorry about this, but I forgot to add a comment in one of the
test cases for 0004 that describes the behavior of parameters
and constants that live outside of the squashed list.
The following two cases will result in different queryIds because
the fourth constant/parameter is jumbled as a Const node in the
first query and as a Param node in the second.
select from tab where a in (1, 2, 3) and b = 4
select from tab where a in ($1, $2, $3) and b = $4
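To make the reason concrete, here is a rough standalone sketch of the idea, not the actual jumbling code. All `Demo*` and `demo_*` names are hypothetical, and the set of fields appended is an assumption modeled loosely on what 0004's `_jumbleParam` does (kind and id for a Param) versus a Const, whose value is not jumbled. The point is only that the node tag goes into the jumble buffer, so `b = 4` and `b = $4` produce different bytes to hash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node tags; the real ones are T_Const and T_Param. */
typedef enum { DEMO_T_CONST = 1, DEMO_T_PARAM = 2 } DemoTag;

typedef struct
{
    DemoTag tag;
    int     type_oid;   /* data type, jumbled for both node kinds */
    int     param_id;   /* Param only */
    int     value;      /* Const only; deliberately never jumbled */
} DemoNode;

/* Simplified jumble buffer: the bytes that would later be hashed. */
typedef struct
{
    unsigned char buf[64];
    size_t        len;
} DemoJumble;

static void
demo_append(DemoJumble *j, const void *data, size_t size)
{
    if (j->len + size <= sizeof(j->buf))
    {
        memcpy(j->buf + j->len, data, size);
        j->len += size;
    }
}

/* Append the tag first, then the per-kind fields. */
static void
demo_jumble_node(DemoJumble *j, const DemoNode *node)
{
    int tag = (int) node->tag;

    demo_append(j, &tag, sizeof(tag));
    demo_append(j, &node->type_oid, sizeof(node->type_oid));
    if (node->tag == DEMO_T_PARAM)
        demo_append(j, &node->param_id, sizeof(node->param_id));
}

/* Two nodes get the same queryId contribution only if their bytes match. */
static bool
demo_same_jumble(const DemoNode *a, const DemoNode *b)
{
    DemoJumble ja = {{0}, 0};
    DemoJumble jb = {{0}, 0};

    demo_jumble_node(&ja, a);
    demo_jumble_node(&jb, b);
    return ja.len == jb.len && memcmp(ja.buf, jb.buf, ja.len) == 0;
}
```

In this model two constants with different values compare equal (the value is ignored), while a constant and a parameter of the same type do not, which is the behavior described for the fourth term above.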
--
Sami
Attachments:
v6-0004-Support-Squashing-of-External-Parameters.patch (application/octet-stream)
From 1f28a748f6e47b625a9e902215b4ccc9fe474dda Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:17:24 -0500
Subject: [PATCH v6 4/4] Support Squashing of External Parameters
62d712ec introduced the concept of element squashing for
query normalization purposes. However, it did not account for
external parameters passed to a list of elements. This adds
support for these types of values and simplifies the squashing
logic further.
Discussion: https://www.postgresql.org/message-id/flat/202505021256.4yaa24s3sytm%40alvherre.pgsql#1195a340edca50cc3b7389a2ba8b0467
---
.../pg_stat_statements/expected/squashing.out | 36 +++++++-
.../pg_stat_statements/pg_stat_statements.c | 7 ++
contrib/pg_stat_statements/sql/squashing.sql | 12 +++
src/backend/nodes/queryjumblefuncs.c | 84 ++++++++++++-------
src/include/nodes/primnodes.h | 6 +-
src/include/nodes/queryjumble.h | 3 +
6 files changed, 112 insertions(+), 36 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f3f212183a2..739c8888b3c 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -21,10 +21,16 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3);
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3) \bind 1 2 3
+;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 2
SELECT * FROM test_squash WHERE id IN ($1) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
@@ -44,10 +50,17 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
+ \bind 1 2 3 4 5 6 7 8 9 10 11
+;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 4
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 6
SELECT * FROM test_squash WHERE id IN ($1) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 1
@@ -70,17 +83,27 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data =
----+------
(0 rows)
+-- external parameters and constants outside of a squashed list will have
+-- different node types and result in a different queryId
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
id | data
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) AND data = $12
+ \bind 1 2 3 4 5 6 7 8 9 10 11 2
+;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
---------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+(3 rows)
-- built-in functions will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -93,10 +116,15 @@ SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
--
(1 row)
+SELECT WHERE 1 IN ($1, $2, int4($3::int), int4($4::int)) \bind 1 2 1 2
+;
+--
+(1 row)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8cadfa2ff21..69d69db2289 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2834,6 +2834,9 @@ generate_normalized_query(JumbleState *jstate, const char *query,
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
+
/* Allocate result buffer */
norm_query = palloc(norm_query_buflen + 1);
@@ -2842,6 +2845,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
+ if (jstate->clocations[i].extern_param &&
+ !jstate->has_squashed_lists)
+ continue;
+
off = jstate->clocations[i].location;
/* Adjust recorded location if we're dealing with partial string */
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index aed4e42286c..1df11e5a220 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -11,11 +11,16 @@ CREATE TABLE test_squash (id int, data int);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1);
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3) \bind 1 2 3
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
+ \bind 1 2 3 4 5 6 7 8 9 10 11
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- More conditions in the query
@@ -23,12 +28,19 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data = 2;
+-- external parameters and constants outside of a squashed list will have
+-- different node types and result in a different queryId
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) AND data = $12
+ \bind 1 2 3 4 5 6 7 8 9 10 11 2
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- built-in functions will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
+SELECT WHERE 1 IN ($1, $2, int4($3::int), int4($4::int)) \bind 1 2 1 2
+;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Multiple squashed intervals
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 219023b1173..c13598e6757 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,7 +61,7 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordExpressionLocation(JumbleState *jstate,
- int location, int len);
+ int location, int len, bool extern_param);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
@@ -70,6 +70,7 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static void _jumbleParam(JumbleState *jstate, Node *node);
/*
* Given a possibly multi-statement source string, confine our attention to the
@@ -185,6 +186,7 @@ InitJumble(void)
jstate->clocations_count = 0;
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
+ jstate->has_squashed_lists = false;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -381,7 +383,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordExpressionLocation(JumbleState *jstate, int location, int len)
+RecordExpressionLocation(JumbleState *jstate, int location, int len, bool extern_param)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -405,6 +407,7 @@ RecordExpressionLocation(JumbleState *jstate, int location, int len)
*/
jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
+ jstate->clocations[jstate->clocations_count].extern_param = extern_param;
jstate->clocations_count++;
}
}
@@ -443,17 +446,39 @@ IsSquashableExpression(Node *element)
{
Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
- return false;
+ switch (nodeTag(arg))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) arg;
+
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
}
- return true;
+ return false;
}
- if (!IsA(element, Const))
- return false;
+ switch (nodeTag(element))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ {
+ Param *param = (Param *) element;
- return true;
+ return param->paramkind == PARAM_EXTERN;
+ }
+ default:
+ break;
+ }
+
+ return false;
}
@@ -493,7 +518,7 @@ IsSquashableExpressionList(List *elements)
#define JUMBLE_ELEMENTS(list, node) \
_jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordExpressionLocation(jstate, expr->location, -1)
+ RecordExpressionLocation(jstate, expr->location, -1, false)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -544,8 +569,9 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
RecordExpressionLocation(jstate,
aexpr->list_start + 1,
- (aexpr->list_end - aexpr->list_start) - 1);
+ (aexpr->list_end - aexpr->list_start) - 1, false);
normalize_list = true;
+ jstate->has_squashed_lists = true;
}
}
}
@@ -597,26 +623,6 @@ _jumbleNode(JumbleState *jstate, Node *node)
break;
}
- /* Special cases to handle outside the automated code */
- switch (nodeTag(expr))
- {
- case T_Param:
- {
- Param *p = (Param *) node;
-
- /*
- * Update the highest Param id seen, in order to start
- * normalization correctly.
- */
- if (p->paramkind == PARAM_EXTERN &&
- p->paramid > jstate->highest_extern_param_id)
- jstate->highest_extern_param_id = p->paramid;
- }
- break;
- default:
- break;
- }
-
/* Ensure we added something to the jumble buffer */
Assert(jstate->total_jumble_len > prev_jumble_len);
}
@@ -719,3 +725,21 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
*/
JUMBLE_STRING(aliasname);
}
+
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
+ Param *expr = (Param *) node;
+
+ JUMBLE_FIELD(paramkind);
+ JUMBLE_FIELD(paramid);
+ JUMBLE_FIELD(paramtype);
+
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ RecordExpressionLocation(jstate, expr->location, -1, true);
+
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 773cdd880aa..99d2c019c4b 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -389,14 +389,16 @@ typedef enum ParamKind
typedef struct Param
{
+ pg_node_attr(custom_query_jumble)
+
Expr xpr;
ParamKind paramkind; /* kind of parameter. See above */
int paramid; /* numeric ID for parameter */
Oid paramtype; /* pg_type OID of parameter's datatype */
/* typmod value, if known */
- int32 paramtypmod pg_node_attr(query_jumble_ignore);
+ int32 paramtypmod;
/* OID of collation, or InvalidOid if none */
- Oid paramcollid pg_node_attr(query_jumble_ignore);
+ Oid paramcollid;
/* token location, or -1 if unknown */
ParseLoc location;
} Param;
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e..a1a99023e42 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -29,6 +29,7 @@ typedef struct LocationLen
* of squashed constants.
*/
bool squashed;
+ bool extern_param;
} LocationLen;
/*
@@ -62,6 +63,8 @@ typedef struct JumbleState
*/
unsigned int pending_nulls;
+ bool has_squashed_lists;
+
#ifdef USE_ASSERT_CHECKING
/* The total number of bytes added to the jumble buffer */
Size total_jumble_len;
--
2.39.5 (Apple Git-154)
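As a reading aid for the v6-0004 patch above, the intended rule for when a list is squashable can be sketched in isolation. This is a simplified model under stated assumptions, not the code in queryjumblefuncs.c: the `DemoKind` enum and both function names are hypothetical, the "at least two elements" condition is inferred from the expected test output (where `IN ($1)` stays separate from `IN ($1 /*, ... */)`), and the handling of casts and function calls wrapping an element is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a list element after parse analysis. */
typedef enum
{
    DEMO_CONST,          /* a literal constant */
    DEMO_PARAM_EXTERN,   /* an external parameter such as $1 */
    DEMO_PARAM_OTHER,    /* any other kind of parameter */
    DEMO_OTHER           /* sub-select, column reference, etc. */
} DemoKind;

/* An element qualifies if it is a constant or an external parameter. */
static bool
demo_is_squashable_element(DemoKind kind)
{
    return kind == DEMO_CONST || kind == DEMO_PARAM_EXTERN;
}

/*
 * A list qualifies only if it has at least two elements (assumption, see
 * the lead-in) and every element qualifies on its own.
 */
static bool
demo_is_squashable_list(const DemoKind *elems, size_t nelems)
{
    if (nelems < 2)
        return false;

    for (size_t i = 0; i < nelems; i++)
    {
        if (!demo_is_squashable_element(elems[i]))
            return false;
    }
    return true;
}
```

Under this model `IN (1, 2, 3)` and `IN ($1, $2, $3)` both qualify, while a single-element list or a list containing a sub-select does not.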
v6-0002-Enhanced-query-jumbling-squashing-tests.patch (application/octet-stream)
From 205584b640623c5ae42470aa4ff11ca0c8067288 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v6 2/4] Enhanced query jumbling squashing tests
Test coverage for ARRAY expressions is insufficient. Add more test
cases, similar to the existing ones. Also, enhance tests for the
negative cases of RelabelType, CoerceViaIO and FuncExpr. While at it,
reorganize some parts of the tests and correct minor spacing issues.
---
.../pg_stat_statements/expected/squashing.out | 331 ++++++++++++++++--
contrib/pg_stat_statements/sql/squashing.sql | 113 +++++-
2 files changed, 408 insertions(+), 36 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..725238d3f5c 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -273,32 +273,22 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
- id | data
-----+------
-(0 rows)
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+--
+(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
- (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
- (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
- (SELECT $10)::jsonb) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- CoerceViaIO
@@ -357,6 +347,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
-- Some casting expression are simplified to Const
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -366,8 +357,8 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
id | data
----+------
(0 rows)
@@ -380,25 +371,81 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- RelabelType
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
id | data
----+------
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::text::int::text::int, $3::text::int::text::int) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- RelabelType
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- if there is only one level of relabeltype, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+ id | data
+----+------
+(0 rows)
+
+-- if there is at least one element with multiple levels of relabeltype,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1 /*, ... */::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
-- Test constants evaluation in a CTE, which was causing issues in the past
WITH cte AS (
SELECT 'const' as const FROM test_squash
@@ -429,3 +476,235 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ($1 /*, ... */)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ $1 /*, ... */::int4::casttesttype +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+ array
+------------------------------------------------------------------------------------
+ {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ (SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+ array
+-------------------------------------------------
+ {100,200,300,400,500,600,700,800,900,1000,1100}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
+ abs($8), abs($9), abs($10), ((abs($11))) +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+ array
+---------------------------
+ {1,2,3,4,5,6,7,8,9,10,11}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ $1 /*, ... */::bigint +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- edge cases
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+--
+(1 row)
+
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2::int, $3::int::text) | 1
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+--
+(1 row)
+
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int::text) | 2
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8..0aaa893eb1a 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -87,14 +87,9 @@ SELECT * FROM test_squash_bigint WHERE id IN
abs(800), abs(900), abs(1000), ((abs(1100))));
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- CoerceViaIO
@@ -143,17 +138,39 @@ SELECT * FROM test_squash_cast WHERE data IN
10::int4::casttesttype, 11::int4::casttesttype);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
+
-- Some casting expression are simplified to Const
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- RelabelType
+
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is only one level of relabeltype, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is at least one element with multiple levels of relabeltype,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -167,3 +184,79 @@ FROM cte;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Nested arrays are squashed only at constants level
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Relabel type
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Some casting expression are simplified to Const
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ ('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CoerceViaIO, SubLink instead of a Const is not squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ (SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Bigint, long tokens with parenthesis
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint
+];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- edge cases
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--
2.39.5 (Apple Git-154)
v6-0003-Fix-Normalization-for-squashed-query-texts.patch (application/octet-stream)
From 074619b0658d1160e7c2110b67288f47118063bb Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:11:46 -0500
Subject: [PATCH v6 3/4] Fix Normalization for squashed query texts
62d712ec added the ability to squash constants from an
IN list/ArrayExpr for queryId computation purposes. However,
in certain cases, this broke normalization. For example,
"IN (1, 2, int4(1))" is normalized to "IN ($2 /*, ... */))",
which leaves an extra parenthesis at the end of the normalized string.
To correct this, the start and end boundaries of an expr_list are
now tracked by the various nodes used during parsing and are made
available to the ArrayExpr node for query jumbling. Having these
boundaries allows normalization to precisely identify the locations
in the query text that should be squashed.
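
As a rough illustration of the boundary-based approach described above, the
sketch below replaces everything between a list's opening and closing bracket
with a single squashed placeholder. The helper name squash_list and its
parameters are hypothetical and do not appear in the patch; the offset
arithmetic (start + 1, and end - start - 1 for the length) mirrors what
_jumbleElements passes to RecordExpressionLocation. It handles one list only
and does not renumber other constants in the query.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: given the byte offsets of a
 * list's opening and closing brackets, write the query to "out" with the
 * list interior replaced by a squashed placeholder for parameter param_id.
 * Returns the number of bytes snprintf would have written.
 */
static int
squash_list(const char *query, int list_start, int list_end,
            int param_id, char *out, size_t outlen)
{
    int loc = list_start + 1;              /* first byte inside the brackets */
    int len = (list_end - list_start) - 1; /* bytes between the brackets */

    assert(list_start >= 0 && list_end > list_start);
    assert((size_t) list_end < strlen(query));

    /*
     * Copy the prefix up to and including the opening bracket, emit the
     * placeholder, then copy everything from the closing bracket onward.
     */
    return snprintf(out, outlen, "%.*s$%d /*, ... */%s",
                    loc, query, param_id, query + loc + len);
}
```

With the query from the example above, "SELECT WHERE 1 IN (1, 2, int4(1))",
and the offsets of the outer parentheses, this yields a single closing
parenthesis after the placeholder, because the inner int4(1) call is skipped
as part of the list interior.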
---
.../pg_stat_statements/expected/squashing.out | 44 +++++----
.../pg_stat_statements/pg_stat_statements.c | 76 ++++-----------
contrib/pg_stat_statements/sql/squashing.sql | 5 +
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 84 +++++++++--------
src/backend/parser/gram.y | 94 +++++++++++++------
src/backend/parser/parse_expr.c | 4 +
src/include/nodes/parsenodes.h | 4 +
src/include/nodes/primnodes.h | 4 +
9 files changed, 174 insertions(+), 143 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 725238d3f5c..f3f212183a2 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -82,6 +82,24 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- built-in functions will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -246,7 +264,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 1
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -343,7 +361,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 1
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -367,7 +385,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -441,7 +459,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
--------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */::oid) |
+ ($1 /*, ... */) |
SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
@@ -522,7 +540,7 @@ SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -546,9 +564,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- ($1 /*, ... */)::jsonb +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -573,9 +589,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- $1 /*, ... */::int4::casttesttype +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -654,9 +668,7 @@ SELECT ARRAY[
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[ +| 1
- $1 /*, ... */::bigint +|
- ] |
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -681,7 +693,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2 /*, ... */) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -705,6 +717,6 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int::text) | 2
+ select where $1 IN ($2 /*, ... */) | 2
(2 rows)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index c58f34e9f30..8cadfa2ff21 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2817,7 +2817,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
int num_constants_replaced = 0;
/*
@@ -2832,9 +2831,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
@@ -2856,63 +2852,22 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
/*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
+ * And insert a param symbol in place of the constant token.
+ *
+ * However, if we have a squashable list, insert a comment in place of
+ * the second and subsequent values of the list.
*/
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
-
- Assert(len_to_wrt >= 0);
-
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id,
+ (jstate->clocations[i].squashed) ? " /*, ... */" : "");
/* Otherwise the constant is squashed away -- move forward */
quer_loc = off + tok_len;
@@ -3005,6 +2960,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 0aaa893eb1a..aed4e42286c 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -26,6 +26,11 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data =
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- built-in functions will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, 2, int4(1), int4(2));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 77659b0f760..17ba3696226 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1324,7 +1324,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..219023b1173 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -60,10 +60,10 @@ static uint64 DoJumble(JumbleState *jstate, Node *node);
static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
-static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+static void RecordExpressionLocation(JumbleState *jstate,
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -381,7 +381,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordExpressionLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +396,15 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * initialize lengths to -1 to simplify third-party module usage
+ *
+ * If we have a length that is greater than -1, this indicates a
+ * squashable list.
+ */
+ jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
jstate->clocations_count++;
}
}
@@ -413,7 +419,7 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableExpression(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -450,6 +456,7 @@ IsSquashableConst(Node *element)
return true;
}
+
/*
* Subroutine for _jumbleElements: Verify whether the provided list
* can be squashed, meaning it contains only constant expressions.
@@ -461,7 +468,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableExpressionList(List *elements)
{
ListCell *temp;
@@ -474,22 +481,19 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableExpression(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordExpressionLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +521,36 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow constants to be squashed. To normalize such queries, we use
+ * the start and end locations of the list of elements.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableExpressionList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->list_start > 0 && aexpr->list_end > 0)
+ {
+ RecordExpressionLocation(jstate,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..0cd5f794db3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,17 @@ typedef struct KeyActions
KeyAction *deleteAction;
} KeyActions;
+/*
+ * Track the start and end of a list in an expression, such as an 'IN' list
+ * or Array Expression
+ */
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -184,7 +195,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int end_location);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -269,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
%type <node> stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -523,8 +535,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
+%type <listwithboundary> in_expr
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
%type <boolean> opt_ordinality opt_without_overlaps
@@ -15289,46 +15302,58 @@ a_expr: c_expr { $$ = $1; }
}
| a_expr IN_P in_expr
{
+ ListWithBoundary *n = $3;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
-
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
- $$ = (Node *) n;
+ SubLink *n2 = (SubLink *) n->expr;
+
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
+ $$ = (Node *) n2;
}
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "=", $1, n->expr, @2);
+
+ n2->rexpr_list_start = $3->start;
+ n2->rexpr_list_end = $3->end;
+ $$ = (Node *) n2;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
{
+ ListWithBoundary *n = $4;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
+ if (IsA(n->expr, SubLink))
{
/* generate NOT (foo = ANY (subquery)) */
/* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ SubLink *n2 = (SubLink *) n->expr;
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
+ n2->subLinkType = ANY_SUBLINK;
+ n2->subLinkId = 0;
+ n2->testexpr = $1;
+ n2->operName = NIL; /* show it's IN not = ANY */
+ n2->location = @2;
/* Stick a NOT on top; must have same parse location */
- $$ = makeNotExpr((Node *) n, @2);
+ $$ = makeNotExpr((Node *) n2, @2);
}
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *n2 = makeSimpleA_Expr(AEXPR_IN, "<>", $1, n->expr, @2);
+
+ n2->rexpr_list_start = $4->start;
+ n2->rexpr_list_end = $4->end;
+ $$ = (Node *) n2;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16764,15 +16789,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16897,12 +16922,25 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
in_expr: select_with_parens
{
SubLink *n = makeNode(SubLink);
+ ListWithBoundary *n2 = palloc(sizeof(ListWithBoundary));
n->subselect = $1;
/* other fields will be filled later */
- $$ = (Node *) n;
+
+ n2->expr = (Node *) n;
+ n2->start = -1;
+ n2->end = -1;
+ $$ = n2;
+ }
+ | '(' expr_list ')'
+ {
+ ListWithBoundary *n = palloc(sizeof(ListWithBoundary));
+
+ n->expr = (Node *) $2;
+ n->start = @1;
+ n->end = @3;
+ $$ = n;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
;
/*
@@ -19300,12 +19338,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int location_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->list_start = location;
+ n->list_end = location_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..7347c989e11 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1224,6 +1224,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ newa->list_start = a->rexpr_list_start;
+ newa->list_end = a->rexpr_list_end;
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2166,6 +2168,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ newa->list_start = a->list_start;
+ newa->list_end = a->list_end;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..2f078887d06 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -347,6 +347,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc rexpr_list_start; /* location of the start of a rexpr list */
+ ParseLoc rexpr_list_end; /* location of the end of a rexpr list */
} A_Expr;
/*
@@ -502,6 +504,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc list_start; /* location of the start of the elements list */
+ ParseLoc list_end; /* location of the end of the elements list */
} A_ArrayExpr;
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..773cdd880aa 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,10 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+ /* location of the start of the elements list */
+ ParseLoc list_start;
+ /* location of the end of the elements list */
+ ParseLoc list_end;
} ArrayExpr;
/*
--
2.39.5 (Apple Git-154)
v6-0001-Fix-broken-normalization-due-to-duplicate-constan.patch (application/octet-stream)
From 9f2541b4ddc41f0efbdcd1ef79454da778179f0b Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 21:41:17 -0500
Subject: [PATCH v6 1/4] Fix broken normalization due to duplicate constant
locations
pg_stat_statements anticipates that certain constant
locations may be recorded multiple times and attempts
to avoid calculating a length for these locations in
fill_in_constant_lengths.
However, during generate_normalized_query, these
locations are not excluded from consideration and
will increment the $n counter for every recorded
occurrence of such a location. In practice, this can
lead to incorrect normalization in certain cases.
select where '1' IN ('2'::int, '3'::int::text)
would be normalized to:
select where $1 IN ($3, $4)
instead of the correct:
select where $1 IN ($2, $3)
This is because the left-expression, '1' is used
as an argument in the OpExpr generated for every
element in the IN clause.
To correct, track the number of constants replaced
with an $n by a separate counter instead of the
iterator used to loop through the list of locations.
---
contrib/pg_stat_statements/pg_stat_statements.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index d8fdf42df79..c58f34e9f30 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2818,9 +2818,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
+ int num_constants_replaced = 0;
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2878,7 +2876,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then a param symbol replacing the constant itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
/* In case previous constants were merged away, stop doing that */
in_squashed = false;
@@ -2902,12 +2900,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then start a run of squashed constants */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
/* The next location will match the block below, to end the run */
in_squashed = true;
-
- skipped_constants++;
}
else
{
--
2.39.5 (Apple Git-154)
On Tue, May 27, 2025 at 05:05:39PM -0500, Sami Imseih wrote:
* 0001:
This is a normalization issue discovered when adding new
tests for squashing. This is also an issue that exists in
v17 and likely earlier versions and should probably be
backpatched.
The crux of the problem is if a constant location is
recorded multiple times, the values for $n don't take
into account the duplicate constant locations and end up
incorrectly incrementing the next value from $n.
This does also feel like it should be backpatched.
Yes, this needs to be backpatched and it is actually a safe backpatch
because only the text representation is changed when adding a new
entry in the dshash; it only improves the reports without touching the
existing data. I'm OK to take care of this one by myself, even in the
context of this thread. It is an issue independent of what we're
discussing here for the list squashing. As there is only one sprintf() in
generate_normalized_query() in ~17, the fix of the back-branches is
slightly simpler.
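As a rough illustration of the numbering problem described for 0001, here is a
minimal sketch in C. It is not the pg_stat_statements code: ConstLoc and
assign_param_numbers are hypothetical names. The only things it assumes from
the patch are that duplicate locations carry a negative length and that $n is
taken from a counter of replaced constants rather than from the loop index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the location entries kept in JumbleState. */
typedef struct
{
	int			location;	/* byte offset of the constant in the query */
	int			length;		/* token length, or -1 for a duplicate location */
} ConstLoc;

/*
 * Assign $n numbers to the non-duplicate constants.
 *
 * param_no[i] receives the number chosen for entry i, or 0 if the entry is
 * a duplicate and is skipped.  Returns how many constants were replaced.
 */
int
assign_param_numbers(const ConstLoc *locs, int nlocs,
					 int highest_extern_param_id, int *param_no)
{
	int			num_constants_replaced = 0;

	assert(locs != NULL && param_no != NULL);

	for (int i = 0; i < nlocs; i++)
	{
		if (locs[i].length < 0)
		{
			param_no[i] = 0;	/* duplicate location, no $n emitted */
			continue;
		}

		/* Numbering follows the counter, not the loop index i. */
		param_no[i] = num_constants_replaced++ + 1 + highest_extern_param_id;
	}

	return num_constants_replaced;
}
```

For the example in the commit message, the left-hand '1' is recorded twice, so
the lengths would look like {3, -1, 3, 3}. Numbering by the loop index gives
$1, $3, $4, while the counter gives $1, $2, $3.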
You have mentioned the addition of tests, but v6-0001 includes nothing
of the kind. Am I missing something? How much coverage did you
intend to add here? These seem to be included in squashing.sql in
patch v6-0002, but IMO this should be moved somewhere else to work
with the back-branches and make the whole backpatch story more
consistent.
* 0002:
Added some more tests to the ones initially proposed
by Dmitri in v3-0001 [0] including the "edge cases" which
led to the findings for 0001.
Tests for CoerceViaIO with jsonb have been moved around. Not a big
deal, but that makes the diffs of the patch confusing to read.
+-- if there is only one level of relabeltype, the list will be squashable
RelabelType perhaps?
A lot of the tests introduced in v6-0002 are copy-pastes of the
previous ones for IN clauses introduced for the ARRAY cases, with
comments explaining the reasons why lists are squashed or not also
copy-pasted. Perhaps it would make sense to group the ARRAY and IN
clause cases together. For example, group each of the two CoerceViaIO
cases together in a single query on pg_stat_statements, with a single
pg_stat_statements_reset(). That would make it more difficult to miss
the fact that we need to care about IN clauses *and* arrays when
adding more test patterns, if we add some of course.
The cases where IN clauses are rewritten as ArrayExpr are OK kept at
the end.
* 0003:
This fixes the normalization anomalies introduced by
62d712ec (squashing feature) mentioned here [1].
This patch therefore implements the fixes to track
the boundaries of an IN-list, Array expression.
Nice simplifications in the PGSS part in terms of
+ ListWithBoundary *n = $4;
I'd suggest to not use "n" for this one, but a different variable
name, leaving the internals for the SubLink cases minimally touched.
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
Implementation-wise, I would choose a location with a query length
rather than start and end locations. That's what we do for the nested
queries in the DMLs, so on consistency grounds..
* 0004: implements external parameter squashing.
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
[...]
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ RecordExpressionLocation(jstate, expr->location, -1, true);
+
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
Using a custom implementation for Param nodes means that we are going
to apply a location record for all external parameters, not only the
ones in the lists.. Not sure if this is a good idea. Something
smells a bit wrong with this approach. Sorry, I cannot put my finger
on what exactly when typing this paragraph.
While I think we should get all patches in for v18, I definitely
think we need to get the first 3 because they fix existing
bugs.
What do you think?
Patches 0002 and 0003 fix bugs in the squashing logic present only on
HEAD, nothing that impacts older branches already released, right?
--
Michael
* 0001:
You have mentioned the addition of tests, but v6-0001 includes nothing
of the kind. Am I missing something? How much coverage did you
intend to add here? These seem to be included in squashing.sql in
patch v6-0002, but IMO this should be moved somewhere else to work
with the back-branches and make the whole backpatch story more
consistent.
That's my mistake. I added a new file called normalize.sql to test
specific normalization scenarios. Added in v7.
* 0002:
RelabelType perhaps?
Fixed.
A lot of the tests introduced in v6-0002 are copy-pastes of the
previous ones for IN clauses introduced for the ARRAY cases, with
comments explaining the reasons why lists are squashed or not also
copy-pasted. Perhaps it would make sense to group the ARRAY and IN
clause cases together. For example, group each of the two CoerceViaIO
cases together in a single query on pg_stat_statements, with a single
pg_stat_statements_reset(). That would make it more difficult to miss
the fact that we need to care about IN clauses *and* arrays when
adding more test patterns, if we add some of course.
I agree. I reorganized by grouping both for IN and ARRAY tests
together for a specific test area.
I also clarified some comments in the tests, etc.
* 0003:
I'd suggest to not use "n" for this one, but a different variable
name, leaving the internals for the SubLink cases minimally touched.
I agree. Fixed.
Implementation-wise, I would choose a location with a query length
rather than start and end locations. That's what we do for the nested
queries in the DMLs, so on consistency grounds..
This is different because the existing location field is tracking
something a bit different than what we want to track.
The current location field exists to assist things like error
messages, such as the one below, which wants to place the
caret (^) in the proper location, which is at the location of the
"IN".
```
ERROR: operator does not exist: oid = text
LINE 1: select where 1::oid IN (1::text, 2, 3);
^
HINT: No operator matches the given name and argument types. You
might need to add explicit type casts.
test=#
```
What we need for squashing is to track the locations of the outer '(' and ')'
of the expression.
I could use fields that track list_start and list_length instead.
Would that be closer in terms of consistency?
* 0004: implements external parameter squashing.
Using a custom implementation for Param nodes means that we are going
to apply a location record for all external parameters, not only the
ones in the lists.. Not sure if this is a good idea. Something
smells a bit wrong with this approach. Sorry, I cannot put my finger
on what exactly when typing this paragraph.
Actually, only the parameters outside of the squashed lists are
recorded. I added a comment to make that clear. I would really want
to only record parameter locations if we know we have a squashed list,
but it's impossible to know that in advance.
Also, the reason for a custom implementation for Param is to avoid having
to change the signature of JUMBLE_LOCATION because we have a
new bool argument to RecordExpressionLocation to set a location as an
external parameter. We will also need special handling in gen_node_support.pl
for Param to set the new argument. I was not too happy with doing that.
Patches 0002 and 0003 fix bugs in the squashing logic present only on
HEAD, nothing that impacts older branches already released, right?
That is correct.
--
Sami
Attachments:
v7-0003-Fix-Normalization-for-squashed-query-texts.patch (application/octet-stream)
From 76beecdada07c35040cd1966b6fea41e9d40bbc7 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:11:46 -0500
Subject: [PATCH v7 3/4] Fix Normalization for squashed query texts
62d712ec added the ability to squash constants from an
IN list/ArrayExpr for queryId computation purposes. However,
in certain cases, this broke normalization. For example,
"IN (1, 2, int4(1))" is normalized to "IN ($2 /*, ... */))",
which leaves an extra parenthesis at the end of the normalized string.
To correct this, the start and end boundaries of an expr_list are
now tracked by the various nodes used during parsing and are made
available to the ArrayExpr node for query jumbling. Having these
boundaries allows normalization to precisely identify the locations
in the query text that should be squashed.
---
.../pg_stat_statements/expected/normalize.out | 23 +++++
.../pg_stat_statements/expected/squashing.out | 14 ++--
.../pg_stat_statements/pg_stat_statements.c | 76 ++++-------------
contrib/pg_stat_statements/sql/normalize.sql | 7 ++
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 84 ++++++++++---------
src/backend/parser/gram.y | 68 +++++++++++----
src/backend/parser/parse_expr.c | 4 +
src/include/nodes/parsenodes.h | 4 +
src/include/nodes/primnodes.h | 4 +
10 files changed, 165 insertions(+), 121 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/normalize.out b/contrib/pg_stat_statements/expected/normalize.out
index 1e94dbb9b43..2944cff5997 100644
--- a/contrib/pg_stat_statements/expected/normalize.out
+++ b/contrib/pg_stat_statements/expected/normalize.out
@@ -25,3 +25,26 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+-- with the last element being an explicit function call with an argument, ensure
+-- the normalization of the squashing interval is correct.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, int4(1), int4(2));
+--
+(1 row)
+
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2)]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index b8724e3356c..5376700fef8 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -453,7 +453,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 2
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -572,7 +572,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 2
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -604,7 +604,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 2
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -704,9 +704,9 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
--------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */::oid) |
+ ($1 /*, ... */) |
SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
- SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(4 rows)
@@ -773,7 +773,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2 /*, ... */) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -797,7 +797,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int::text) | 2
+ select where $1 IN ($2 /*, ... */) | 2
(2 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index c58f34e9f30..8cadfa2ff21 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2817,7 +2817,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
int num_constants_replaced = 0;
/*
@@ -2832,9 +2831,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
@@ -2856,63 +2852,22 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
/*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
+ * And insert a param symbol in place of the constant token.
+ *
+ * However, If we have a squashable list, insert a comment from the
+ * second value of the list.
*/
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
-
- Assert(len_to_wrt >= 0);
-
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id,
+ (jstate->clocations[i].squashed) ? " /*, ... */" : "");
/* Otherwise the constant is squashed away -- move forward */
quer_loc = off + tok_len;
@@ -3005,6 +2960,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/contrib/pg_stat_statements/sql/normalize.sql b/contrib/pg_stat_statements/sql/normalize.sql
index 1252c9bc53d..b8421aa3379 100644
--- a/contrib/pg_stat_statements/sql/normalize.sql
+++ b/contrib/pg_stat_statements/sql/normalize.sql
@@ -7,4 +7,11 @@
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE '1' IN ('1'::int, '3'::int::text);
SELECT WHERE (1, 2) IN ((1, 2), (2, 3));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- with the last element being an explicit function call with an argument, ensure
+-- the normalization of the squashing interval is correct.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, int4(1), int4(2));
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2)]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
\ No newline at end of file
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 77659b0f760..17ba3696226 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1324,7 +1324,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d1e82a63f09..219023b1173 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -60,10 +60,10 @@ static uint64 DoJumble(JumbleState *jstate, Node *node);
static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
-static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+static void RecordExpressionLocation(JumbleState *jstate,
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -381,7 +381,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordExpressionLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +396,15 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * initialize lengths to -1 to simplify third-party module usage
+ *
+ * If we have a length that is greater than -1, this indicates a
+ * squashable list.
+ */
+ jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
jstate->clocations_count++;
}
}
@@ -413,7 +419,7 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableExpression(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -450,6 +456,7 @@ IsSquashableConst(Node *element)
return true;
}
+
/*
* Subroutine for _jumbleElements: Verify whether the provided list
* can be squashed, meaning it contains only constant expressions.
@@ -461,7 +468,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableExpressionList(List *elements)
{
ListCell *temp;
@@ -474,22 +481,19 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableExpression(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordExpressionLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +521,36 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow constants to be squashed. To normalize such queries, we use
+ * the start and end locations of the list of elements.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableExpressionList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->list_start > 0 && aexpr->list_end > 0)
+ {
+ RecordExpressionLocation(jstate,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..e6f0581fdc9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,17 @@ typedef struct KeyActions
KeyAction *deleteAction;
} KeyActions;
+/*
+ * Track the start and end of a list in an expression, such as an 'IN' list
+ * or Array Expression
+ */
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -184,7 +195,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int end_location);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -269,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
%type <node> stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -523,8 +535,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
+%type <listwithboundary> in_expr
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
%type <boolean> opt_ordinality opt_without_overlaps
@@ -15289,11 +15302,13 @@ a_expr: c_expr { $$ = $1; }
}
| a_expr IN_P in_expr
{
+ ListWithBoundary *l = $3;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
+ if (IsA(l->expr, SubLink))
{
/* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
+ SubLink *n = (SubLink *) l->expr;
n->subLinkType = ANY_SUBLINK;
n->subLinkId = 0;
@@ -15305,17 +15320,23 @@ a_expr: c_expr { $$ = $1; }
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *n = makeSimpleA_Expr(AEXPR_IN, "=", $1, l->expr, @2);
+
+ n->rexpr_list_start = $3->start;
+ n->rexpr_list_end = $3->end;
+ $$ = (Node *) n;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
{
+ ListWithBoundary *l = $4;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
+ if (IsA(l->expr, SubLink))
{
/* generate NOT (foo = ANY (subquery)) */
/* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ SubLink *n = (SubLink *) l->expr;
n->subLinkType = ANY_SUBLINK;
n->subLinkId = 0;
@@ -15328,7 +15349,11 @@ a_expr: c_expr { $$ = $1; }
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *n = makeSimpleA_Expr(AEXPR_IN, "<>", $1, l->expr, @2);
+
+ n->rexpr_list_start = $4->start;
+ n->rexpr_list_end = $4->end;
+ $$ = (Node *) n;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16764,15 +16789,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16897,12 +16922,25 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
in_expr: select_with_parens
{
SubLink *n = makeNode(SubLink);
+ ListWithBoundary *l = palloc(sizeof(ListWithBoundary));
n->subselect = $1;
/* other fields will be filled later */
- $$ = (Node *) n;
+
+ l->expr = (Node *) n;
+ l->start = -1;
+ l->end = -1;
+ $$ = l;
+ }
+ | '(' expr_list ')'
+ {
+ ListWithBoundary *l = palloc(sizeof(ListWithBoundary));
+
+ l->expr = (Node *) $2;
+ l->start = @1;
+ l->end = @3;
+ $$ = l;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
;
/*
@@ -19300,12 +19338,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int location_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->list_start = location;
+ n->list_end = location_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..7347c989e11 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1224,6 +1224,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ newa->list_start = a->rexpr_list_start;
+ newa->list_end = a->rexpr_list_end;
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2166,6 +2168,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ newa->list_start = a->list_start;
+ newa->list_end = a->list_end;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4610fc61293..2f078887d06 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -347,6 +347,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc rexpr_list_start; /* location of the start of a rexpr list */
+ ParseLoc rexpr_list_end; /* location of the end of a rexpr list */
} A_Expr;
/*
@@ -502,6 +504,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc list_start; /* location of the start of the elements list */
+ ParseLoc list_end; /* location of the end of the elements list */
} A_ArrayExpr;
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..773cdd880aa 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,10 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+ /* location of the start of the elements list */
+ ParseLoc list_start;
+ /* location of the end of the elements list */
+ ParseLoc list_end;
} ArrayExpr;
/*
--
2.39.5 (Apple Git-154)
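The boundary arithmetic used by _jumbleElements in the v1 patch above can be sketched in isolation. This is a minimal illustration, not code from the patch: the ListBounds struct and the squash_span function are hypothetical names standing in for the list_start/list_end fields and the RecordExpressionLocation call. It only assumes, as the patch does, that list_start and list_end are the offsets of the opening and closing delimiters.

```c
/*
 * Hypothetical stand-in for the list_start/list_end fields that the patch
 * adds to ArrayExpr. Not part of PostgreSQL.
 */
typedef struct ListBounds
{
	int			list_start;		/* offset of the opening '(' or '[' */
	int			list_end;		/* offset of the closing ')' or ']' */
} ListBounds;

/*
 * Compute the location and length of the text strictly between the two
 * delimiters, mirroring the arithmetic in the patch:
 *     location = list_start + 1
 *     length   = (list_end - list_start) - 1
 *
 * Returns 1 and fills in *location and *length when both boundaries are
 * known (greater than 0, the same check the patch uses), otherwise 0.
 */
int
squash_span(const ListBounds *b, int *location, int *length)
{
	if (b->list_start <= 0 || b->list_end <= 0)
		return 0;

	*location = b->list_start + 1;
	*length = (b->list_end - b->list_start) - 1;
	return 1;
}
```

For the query text "select from t where id in (1, 2, 3)" the parentheses sit at offsets 26 and 34, so the span starts at 27 and is 7 bytes long, which is exactly "1, 2, 3". Recording a single location plus length, rather than the first and last element locations, is what lets the whole list be replaced regardless of what kind of expression each element is.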
v7-0001-Fix-broken-normalization-due-to-duplicate-constan.patch (application/octet-stream)
From 4e80ed16ba0a6554052246bd19f5509b13fb1514 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 21:41:17 -0500
Subject: [PATCH v7 1/4] Fix broken normalization due to duplicate constant
locations
pg_stat_statements anticipates that certain constant
locations may be recorded multiple times and attempts
to avoid calculating a length for these locations in
fill_in_constant_lengths.
However, during generate_normalized_query, these
locations are not excluded from consideration and
will increment the $n counter for every recorded
occurrence of such a location. In practice, this can
lead to incorrect normalization in certain cases.
select where '1' IN ('2'::int, '3'::int::text)
would be normalized to:
select where $1 IN ($3, $4)
instead of the correct:
select where $1 IN ($2, $3)
This is because the left-hand expression, '1', is used
as an argument in the OpExpr generated for every
element in the IN clause, so its location is recorded
once per element.
To correct this, track the number of constants replaced
with an $n by a separate counter instead of the
iterator used to loop through the list of locations.
---
contrib/pg_stat_statements/Makefile | 2 +-
.../pg_stat_statements/expected/normalize.out | 27 +++++++++++++++++++
contrib/pg_stat_statements/meson.build | 1 +
.../pg_stat_statements/pg_stat_statements.c | 10 +++----
contrib/pg_stat_statements/sql/normalize.sql | 10 +++++++
5 files changed, 42 insertions(+), 8 deletions(-)
create mode 100644 contrib/pg_stat_statements/expected/normalize.out
create mode 100644 contrib/pg_stat_statements/sql/normalize.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index b2bd8794d2a..f08280bdcf7 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
REGRESS = select dml cursors utility level_tracking planning \
user_activity wal entry_timestamp privileges extended \
- parallel cleanup oldextversions squashing
+ parallel cleanup oldextversions squashing normalize
# Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
# which typical installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/normalize.out b/contrib/pg_stat_statements/expected/normalize.out
new file mode 100644
index 00000000000..1e94dbb9b43
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/normalize.out
@@ -0,0 +1,27 @@
+--
+-- Validate normalization of constants
+--
+-- Ensure that there are no gaps in the generated $n parameters. The following
+-- queries will record some constant location one or more times.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE '1' IN ('1'::int, '3'::int::text);
+--
+(1 row)
+
+SELECT WHERE (1, 2) IN ((1, 2), (2, 3));
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::int, $3::int::text) | 1
+ SELECT WHERE ($1, $2) IN (($3, $4), ($5, $6)) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 01a6cbdcf61..931a5b29427 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -57,6 +57,7 @@ tests += {
'cleanup',
'oldextversions',
'squashing',
+ 'normalize',
],
'regress_args': ['--temp-config', files('pg_stat_statements.conf')],
# Disabled because these tests require
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index d8fdf42df79..c58f34e9f30 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2818,9 +2818,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
bool in_squashed = false; /* in a run of squashed consts? */
- int skipped_constants = 0; /* Position adjustment of later
- * constants after squashed ones */
-
+ int num_constants_replaced = 0;
/*
* Get constants' lengths (core system only gives us locations). Note
@@ -2878,7 +2876,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then a param symbol replacing the constant itself */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
/* In case previous constants were merged away, stop doing that */
in_squashed = false;
@@ -2902,12 +2900,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
/* ... and then start a run of squashed constants */
n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- i + 1 + jstate->highest_extern_param_id - skipped_constants);
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
/* The next location will match the block below, to end the run */
in_squashed = true;
-
- skipped_constants++;
}
else
{
diff --git a/contrib/pg_stat_statements/sql/normalize.sql b/contrib/pg_stat_statements/sql/normalize.sql
new file mode 100644
index 00000000000..1252c9bc53d
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/normalize.sql
@@ -0,0 +1,10 @@
+--
+-- Validate normalization of constants
+--
+
+-- Ensure that there are no gaps in the generated $n parameters. The following
+-- queries will record some constant location one or more times.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE '1' IN ('1'::int, '3'::int::text);
+SELECT WHERE (1, 2) IN ((1, 2), (2, 3));
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
\ No newline at end of file
--
2.39.5 (Apple Git-154)
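The numbering problem fixed by the v7-0001 patch above can be reproduced with a small standalone sketch. The ConstLoc struct and assign_param_numbers function are hypothetical simplifications, not pg_stat_statements code. The sketch assumes that a duplicate location is marked with a length of -1 and skipped, and it ignores squashing entirely.

```c
/* Hypothetical, simplified stand-in for a recorded constant location. */
typedef struct ConstLoc
{
	int			location;		/* byte offset of the constant in the query */
	int			length;			/* token length, or -1 for a duplicate */
} ConstLoc;

/*
 * Assign a $n number to every non-duplicate location.
 *
 * use_counter = 0: number by loop index, which leaves gaps whenever a
 *                  duplicate entry is skipped (the behavior the commit
 *                  message describes as broken).
 * use_counter = 1: number by a separate counter that only advances when a
 *                  constant is actually replaced (the approach of the fix).
 *
 * The assigned numbers are written to out[]; the return value is how many
 * were written.
 */
int
assign_param_numbers(const ConstLoc *locs, int nlocs,
					 int highest_extern_param_id, int use_counter, int *out)
{
	int			num_constants_replaced = 0;
	int			nout = 0;

	for (int i = 0; i < nlocs; i++)
	{
		if (locs[i].length < 0)
			continue;			/* duplicate location, nothing to replace */

		if (use_counter)
			out[nout++] = num_constants_replaced++ + 1 + highest_extern_param_id;
		else
			out[nout++] = i + 1 + highest_extern_param_id;
	}
	return nout;
}
```

With the four sorted entries {13, 3}, {13, -1}, {21, 3}, {31, 3}, which model '1' being recorded twice in the commit message's example, index-based numbering yields $1, $3, $4, while the separate counter yields $1, $2, $3.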
v7-0002-Enhanced-query-jumbling-squashing-tests.patch (application/octet-stream)
From 86b316be782008db815557a3d0aa7f555c820b9e Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v7 2/4] Enhanced query jumbling squashing tests
Test coverage for ARRAY expressions is insufficient. Add more test
cases, similar to the existing ones. Also, enhance tests for the
negative cases of RelabelType, CoerceViaIO and FuncExpr. While at it,
reorganize some parts of the tests and correct minor spacing issues.
---
.../pg_stat_statements/expected/squashing.out | 528 +++++++++++++++---
contrib/pg_stat_statements/sql/squashing.sql | 186 +++++-
2 files changed, 613 insertions(+), 101 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..b8724e3356c 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -2,9 +2,11 @@
-- Const squashing functionality
--
CREATE EXTENSION pg_stat_statements;
+--
+--Simple Lists
+--
CREATE TABLE test_squash (id int, data int);
--- IN queries
--- Normal scenario, too many simple constants for an IN query
+-- single element will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -16,42 +18,149 @@ SELECT * FROM test_squash WHERE id IN (1);
----+------
(0 rows)
+SELECT ARRAY[1];
+ array
+-------
+ {1}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1) | 1
+ SELECT ARRAY[$1] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- more than 1 element in a list will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
id | data
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5);
+ id | data
+----+------
+(0 rows)
+
+SELECT ARRAY[1, 2, 3];
+ array
+---------
+ {1,2,3}
+(1 row)
+
+SELECT ARRAY[1, 2, 3, 4];
+ array
+-----------
+ {1,2,3,4}
+(1 row)
+
+SELECT ARRAY[1, 2, 3, 4, 5];
+ array
+-------------
+ {1,2,3,4,5}
+(1 row)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
- SELECT * FROM test_squash WHERE id IN ($1) | 1
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 3
+ SELECT ARRAY[$1 /*, ... */] | 3
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+-- built-in functions will be squashed
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- external parameters will not be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind 1 2 3 4 5
+;
id | data
----+------
(0 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- neither are prepared statements
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5);
+EXECUTE p1(1, 2, 3, 4, 5);
id | data
----+------
(0 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+DEALLOCATE p1;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[$1, $2, $3, $4, $5]);
+EXECUTE p1(1, 2, 3, 4, 5);
id | data
----+------
(0 rows)
+DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 4
- SELECT * FROM test_squash WHERE id IN ($1) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 1
-(4 rows)
+ query | calls
+------------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- More conditions in the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -75,10 +184,25 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND da
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
---------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 6
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -107,24 +231,46 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 6
AND data IN ($2 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- No constants simplification for OpExpr
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
+-- No constants squashing for OpExpr
+-- The IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT * FROM test_squash WHERE id IN
(1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9);
id | data
@@ -137,19 +283,35 @@ SELECT * FROM test_squash WHERE id IN
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9']);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN +| 1
+ SELECT * FROM test_squash WHERE id IN +| 2
($1 + $2, $3 + $4, $5 + $6, $7 + $8, $9 + $10, $11 + $12, $13 + $14, $15 + $16, $17 + $18) |
- SELECT * FROM test_squash WHERE id IN +| 1
+ SELECT * FROM test_squash WHERE id IN +| 2
(@ $1, @ $2, @ $3, @ $4, @ $5, @ $6, @ $7, @ $8, @ $9) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+--
-- FuncExpr
+--
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -181,12 +343,38 @@ SELECT data FROM test_float WHERE data IN (1.0, 1.0);
------
(0 rows)
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1'::double precision, '2'::double precision]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1.0::double precision, 1.0::double precision]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, 2]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, '2']);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1', 2]);
+ data
+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------+-------
- SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 5
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 3
+ SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 7
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
@@ -201,12 +389,18 @@ SELECT * FROM test_squash_numeric WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
----+------
(0 rows)
+SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------------+-------
- SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Bigint, implicit cast is squashed
CREATE TABLE test_squash_bigint (id int, data bigint);
@@ -221,14 +415,20 @@ SELECT * FROM test_squash_bigint WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -242,15 +442,22 @@ SELECT * FROM test_squash_bigint WHERE data IN
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN +| 1
+ SELECT * FROM test_squash_bigint WHERE data IN +| 2
($1 /*, ... */::bigint) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- Bigint, long tokens with parenthesis
+-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -264,44 +471,47 @@ SELECT * FROM test_squash_bigint WHERE id IN
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE id = ANY(ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE id IN +| 1
+ SELECT * FROM test_squash_bigint WHERE id IN +| 2
(abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
abs($8), abs($9), abs($10), ((abs($11)))) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
- id | data
-----+------
-(0 rows)
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
+--
+(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
- (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
- (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
- (SELECT $10)::jsonb) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+--
-- CoerceViaIO
+--
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
CREATE FUNCTION casttesttype_in(cstring)
@@ -349,15 +559,25 @@ SELECT * FROM test_squash_cast WHERE data IN
----+------
(0 rows)
+SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY
+ [1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_cast WHERE data IN +| 1
+ SELECT * FROM test_squash_cast WHERE data IN +| 2
($1 /*, ... */::int4::casttesttype) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Some casting expression are simplified to Const
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -366,8 +586,16 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY
+ [('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb]);
id | data
----+------
(0 rows)
@@ -375,28 +603,144 @@ SELECT * FROM test_squash_jsonb WHERE data IN
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 2
(($1 /*, ... */)::jsonb) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_jsonb WHERE data = ANY(ARRAY
+ [(SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb]);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 2
+ ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY(ARRAY[1::text::int::text::int, 1::text::int::text::int]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::text::int::text::int, $3::text::int::text::int) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+--
-- RelabelType
+--
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is only one level of RelabelType, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+-- if there is at least one element with multiple levels of RelabelType,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[1::oid, 2::oid::int::oid]);
id | data
----+------
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1 /*, ... */::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
+ SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(4 rows)
+
+--
+-- edge cases
+--
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- for nested arrays, only constants are squashed
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -409,23 +753,59 @@ FROM cte;
--------
(0 rows)
--- Simple array would be squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
- array
-------------------------
- {1,2,3,4,5,6,7,8,9,10}
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+--
+(1 row)
+
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+--
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2::int, $3::int::text) | 1
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+--
+(1 row)
+
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int::text) | 2
(2 rows)
+--
+-- cleanup
+--
+DROP TABLE test_squash;
+DROP TABLE test_float;
+DROP TABLE test_squash_numeric;
+DROP TABLE test_squash_bigint;
+DROP TABLE test_squash_cast CASCADE;
+DROP TABLE test_squash_jsonb;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8..85aae152da8 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -3,101 +3,160 @@
--
CREATE EXTENSION pg_stat_statements;
-CREATE TABLE test_squash (id int, data int);
+--
+--Simple Lists
+--
--- IN queries
+CREATE TABLE test_squash (id int, data int);
--- Normal scenario, too many simple constants for an IN query
+-- single element will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1);
+SELECT ARRAY[1];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- more than 1 element in a list will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4);
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5);
+SELECT ARRAY[1, 2, 3];
+SELECT ARRAY[1, 2, 3, 4];
+SELECT ARRAY[1, 2, 3, 4, 5];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+-- built-in functions will be squashed
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- More conditions in the query
+-- external parameters will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
+;
+SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind 1 2 3 4 5
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- neither are prepared statements
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5);
+EXECUTE p1(1, 2, 3, 4, 5);
+DEALLOCATE p1;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[$1, $2, $3, $4, $5]);
+EXECUTE p1(1, 2, 3, 4, 5);
+DEALLOCATE p1;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- More conditions in the query
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) AND data = 2;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-
--- No constants simplification for OpExpr
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
+-- No constants squashing for OpExpr
+-- The IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN
(1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9);
SELECT * FROM test_squash WHERE id IN
(@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9');
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9]);
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9']);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
-- FuncExpr
+--
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT data FROM test_float WHERE data IN (1, 2);
SELECT data FROM test_float WHERE data IN (1, '2');
SELECT data FROM test_float WHERE data IN ('1', 2);
SELECT data FROM test_float WHERE data IN ('1', '2');
SELECT data FROM test_float WHERE data IN (1.0, 1.0);
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1'::double precision, '2'::double precision]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1.0::double precision, 1.0::double precision]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, 2]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, '2']);
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1', 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_numeric WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Bigint, implicit cast is squashed
CREATE TABLE test_squash_bigint (id int, data bigint);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE data IN
(1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint);
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- Bigint, long tokens with parenthesis
+-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE id IN
(abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
abs(800), abs(900), abs(1000), ((abs(1100))));
+SELECT * FROM test_squash_bigint WHERE id = ANY(ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
-- CoerceViaIO
+--
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
@@ -141,19 +200,73 @@ SELECT * FROM test_squash_cast WHERE data IN
4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
10::int4::casttesttype, 11::int4::casttesttype);
+SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY
+ [1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Some casting expression are simplified to Const
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY
+ [('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+SELECT * FROM test_squash_jsonb WHERE data = ANY(ARRAY
+ [(SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+SELECT WHERE 1 = ANY(ARRAY[1::text::int::text::int, 1::text::int::text::int]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
-- RelabelType
+--
+
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is only one level of RelabelType, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+-- if there is at least one element with multiple levels of RelabelType,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[1::oid, 2::oid::int::oid]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- edge cases
+--
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- for nested arrays, only constants are squashed
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -163,7 +276,26 @@ WITH cte AS (
SELECT ARRAY['a', 'b', 'c', const::varchar] AS result
FROM cte;
--- Simple array would be squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- cleanup
+--
+DROP TABLE test_squash;
+DROP TABLE test_float;
+DROP TABLE test_squash_numeric;
+DROP TABLE test_squash_bigint;
+DROP TABLE test_squash_cast CASCADE;
+DROP TABLE test_squash_jsonb;
\ No newline at end of file
--
2.39.5 (Apple Git-154)
v7-0004-Support-Squashing-of-External-Parameters.patch (application/octet-stream)
From 3822a9ee6f882cf5ed51dcbbef87742debd88160 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:17:24 -0500
Subject: [PATCH v7 4/4] Support Squashing of External Parameters
62d712ec introduced the concept of element squashing for
query normalization purposes. However, it did not account for
external parameters passed to a list of elements. This adds
support for these types of values and simplifies the squashing
logic further.
Discussion: https://www.postgresql.org/message-id/flat/202505021256.4yaa24s3sytm%40alvherre.pgsql#1195a340edca50cc3b7389a2ba8b0467
---
.../pg_stat_statements/expected/squashing.out | 24 ++--
.../pg_stat_statements/pg_stat_statements.c | 7 +
contrib/pg_stat_statements/sql/squashing.sql | 4 +-
src/backend/nodes/queryjumblefuncs.c | 126 +++++++++++-------
src/include/nodes/primnodes.h | 6 +-
src/include/nodes/queryjumble.h | 3 +
6 files changed, 109 insertions(+), 61 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 5376700fef8..ad282ac2b83 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -103,7 +103,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -123,14 +123,14 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
----------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
- SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -155,11 +155,11 @@ EXECUTE p1(1, 2, 3, 4, 5);
DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- DEALLOCATE $1 | 2
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 2
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- More conditions in the query
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8cadfa2ff21..69d69db2289 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2834,6 +2834,9 @@ generate_normalized_query(JumbleState *jstate, const char *query,
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
+
/* Allocate result buffer */
norm_query = palloc(norm_query_buflen + 1);
@@ -2842,6 +2845,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
+ if (jstate->clocations[i].extern_param &&
+ !jstate->has_squashed_lists)
+ continue;
+
off = jstate->clocations[i].location;
/* Adjust recorded location if we're dealing with partial string */
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 85aae152da8..4efd412be9b 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -32,7 +32,7 @@ SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
;
@@ -40,7 +40,7 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
PREPARE p1(int, int, int, int, int) AS
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 219023b1173..fdd6ef38f08 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,7 +61,7 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordExpressionLocation(JumbleState *jstate,
- int location, int len);
+ int location, int len, bool extern_param);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
@@ -70,6 +70,7 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static void _jumbleParam(JumbleState *jstate, Node *node);
/*
* Given a possibly multi-statement source string, confine our attention to the
@@ -185,6 +186,7 @@ InitJumble(void)
jstate->clocations_count = 0;
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
+ jstate->has_squashed_lists = false;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -381,7 +383,7 @@ FlushPendingNulls(JumbleState *jstate)
* element contributes nothing to the jumble hash.
*/
static void
-RecordExpressionLocation(JumbleState *jstate, int location, int len)
+RecordExpressionLocation(JumbleState *jstate, int location, int len, bool extern_param)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -405,10 +407,29 @@ RecordExpressionLocation(JumbleState *jstate, int location, int len)
*/
jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
+ jstate->clocations[jstate->clocations_count].extern_param = extern_param;
jstate->clocations_count++;
}
}
+/*
+ * Subroutine for IsSquashableExpression to check if a node is a
+ * constant or External Parameter.
+ */
+static bool
+IsConstOrExternalParam(Node *node)
+{
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ return ((Param *) node)->paramkind == PARAM_EXTERN;
+ default:
+ return false;
+ }
+}
+
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
* deduce that the expression is a constant:
@@ -421,42 +442,48 @@ RecordExpressionLocation(JumbleState *jstate, int location, int len)
static bool
IsSquashableExpression(Node *element)
{
+ ListCell *temp;
+
+ /* Unwrap RelabelType and CoerceViaIO layers */
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
if (IsA(element, CoerceViaIO))
element = (Node *) ((CoerceViaIO *) element)->arg;
- if (IsA(element, FuncExpr))
+ switch (nodeTag(element))
{
- FuncExpr *func = (FuncExpr *) element;
- ListCell *temp;
+ case T_FuncExpr:
+ {
+ FuncExpr *func = (FuncExpr *) element;
- if (func->funcformat != COERCE_IMPLICIT_CAST &&
- func->funcformat != COERCE_EXPLICIT_CAST)
- return false;
+ /*
+ * Only implicit/explicit casts on built-in functions are
+ * squashable
+ */
+ if (func->funcformat != COERCE_IMPLICIT_CAST &&
+ func->funcformat != COERCE_EXPLICIT_CAST)
+ return false;
- if (func->funcid > FirstGenbkiObjectId)
- return false;
+ if (func->funcid > FirstGenbkiObjectId)
+ return false;
- foreach(temp, func->args)
- {
- Node *arg = lfirst(temp);
+ /* All arguments must be constants or external parameters */
+ foreach(temp, func->args)
+ {
+ Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
- return false;
- }
+ if (!IsConstOrExternalParam(arg))
+ return false;
+ }
- return true;
+ return true;
+ }
+ default:
+ return IsConstOrExternalParam(element);
}
-
- if (!IsA(element, Const))
- return false;
-
- return true;
}
-
/*
* Subroutine for _jumbleElements: Verify whether the provided list
* can be squashed, meaning it contains only constant expressions.
@@ -493,7 +520,7 @@ IsSquashableExpressionList(List *elements)
#define JUMBLE_ELEMENTS(list, node) \
_jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordExpressionLocation(jstate, expr->location, -1)
+ RecordExpressionLocation(jstate, expr->location, -1, false)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -544,8 +571,9 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
RecordExpressionLocation(jstate,
aexpr->list_start + 1,
- (aexpr->list_end - aexpr->list_start) - 1);
+ (aexpr->list_end - aexpr->list_start) - 1, false);
normalize_list = true;
+ jstate->has_squashed_lists = true;
}
}
}
@@ -597,26 +625,6 @@ _jumbleNode(JumbleState *jstate, Node *node)
break;
}
- /* Special cases to handle outside the automated code */
- switch (nodeTag(expr))
- {
- case T_Param:
- {
- Param *p = (Param *) node;
-
- /*
- * Update the highest Param id seen, in order to start
- * normalization correctly.
- */
- if (p->paramkind == PARAM_EXTERN &&
- p->paramid > jstate->highest_extern_param_id)
- jstate->highest_extern_param_id = p->paramid;
- }
- break;
- default:
- break;
- }
-
/* Ensure we added something to the jumble buffer */
Assert(jstate->total_jumble_len > prev_jumble_len);
}
@@ -719,3 +727,31 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
*/
JUMBLE_STRING(aliasname);
}
+
+/*
+ * Custom query jumble function for Param.
+ *
+ * Only external parameter locations outside of squashable lists are
+ * handled.
+ */
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
+ Param *expr = (Param *) node;
+
+ JUMBLE_FIELD(paramkind);
+ JUMBLE_FIELD(paramid);
+ JUMBLE_FIELD(paramtype);
+
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ RecordExpressionLocation(jstate, expr->location, -1, true);
+
+ /*
+ * Update the highest Param id seen, in order to start normalization
+ * correctly.
+ */
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 773cdd880aa..99d2c019c4b 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -389,14 +389,16 @@ typedef enum ParamKind
typedef struct Param
{
+ pg_node_attr(custom_query_jumble)
+
Expr xpr;
ParamKind paramkind; /* kind of parameter. See above */
int paramid; /* numeric ID for parameter */
Oid paramtype; /* pg_type OID of parameter's datatype */
/* typmod value, if known */
- int32 paramtypmod pg_node_attr(query_jumble_ignore);
+ int32 paramtypmod;
/* OID of collation, or InvalidOid if none */
- Oid paramcollid pg_node_attr(query_jumble_ignore);
+ Oid paramcollid;
/* token location, or -1 if unknown */
ParseLoc location;
} Param;
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e..a1a99023e42 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -29,6 +29,7 @@ typedef struct LocationLen
* of squashed constants.
*/
bool squashed;
+ bool extern_param;
} LocationLen;
/*
@@ -62,6 +63,8 @@ typedef struct JumbleState
*/
unsigned int pending_nulls;
+ bool has_squashed_lists;
+
#ifdef USE_ASSERT_CHECKING
/* The total number of bytes added to the jumble buffer */
Size total_jumble_len;
--
2.39.5 (Apple Git-154)
On Wed, May 28, 2025 at 04:05:03PM -0500, Sami Imseih wrote:
That's my mistake. I added a new file called normalize.sql to test
specific normalization scenarios. Added in v7
Thanks. I was not sure that a new file was worth having for these
tests, knowing that select.sql has similar coverage. I have grouped
the new tests into select.sql at the end, and added a few more
scenarios for extended queries with \bind in extended.sql, where I've
reproduced the same issue while playing with external parameters.
Applied the result down to v13, down to where the problem exists for
supported branches.
I still need to review the rest of the patch series..
--
Michael
On Thu, May 29, 2025 at 11:30:12AM +0900, Michael Paquier wrote:
I still need to review the rest of the patch series..
The test additions done in v7-0002 look sensible here.
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
In v7-0002, this comment is removed, but it still applies, doesn't it?
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
Interesting distinction that explains the differences in counts. Yes
it's a good idea to track this kind of behavior in the tests.
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
Seems incorrect with 0002 taken in isolation. The last cast is still
present in the normalization. It's not after v7-0003.
Already mentioned upthread, but applying only v7-0003 on top of
v7-0002 (not v7-0004) leads to various regression failures in dml.sql
and squashing.sql. The failures persist with v7-0004 applied. Please
see these as per the attached, the IN lists do not get squashed, the
array elements are. Just to make sure that I am not missing
something, I've rebuilt from scratch with no success.
IsSquashableExpressionList() includes this comment, which is outdated,
probably because squashing was originally optional behind a GUC and
the parameter has been removed while the comment has not been
refreshed:
/*
* If squashing is disabled, or the list is too short, we don't try to
* squash it.
*/
RecordExpressionLocation()'s top comment needs a refresh, talking
about constants. The simplifications gained in pgss.c's normalization
are pretty cool.
+ bool has_squashed_lists;
[...]
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
This new flag in JumbleState needs to be documented, explaining why
it needs to be here. I have to admit that it is strange to see
highest_extern_param_id, one value in JumbleState be forced to zero in
the PGSS normalization code if has_squashed_lists is set to true.
This seems like a layer violation to me: JumbleState should only be
set while in the jumbling code, not forced to something else
afterwards while in the extension.
--
Michael
Attachments:
regression.diffs (text/plain; charset=us-ascii)
diff -u /home/user/git/postgres/contrib/pg_stat_statements/expected/dml.out /home/user/git/postgres/contrib/pg_stat_statements/results/dml.out
--- /home/user/git/postgres/contrib/pg_stat_statements/expected/dml.out 2025-05-29 10:49:32.394190162 +0900
+++ /home/user/git/postgres/contrib/pg_stat_statements/results/dml.out 2025-05-30 13:34:54.742674059 +0900
@@ -80,7 +80,7 @@
1 | 10 | INSERT INTO pgss_dml_tab VALUES(generate_series($1, $2), $3)
1 | 12 | SELECT * FROM pgss_dml_tab ORDER BY a
2 | 4 | SELECT * FROM pgss_dml_tab WHERE a > $1 ORDER BY a
- 1 | 8 | SELECT * FROM pgss_dml_tab WHERE a IN ($1 /*, ... */)
+ 1 | 8 | SELECT * FROM pgss_dml_tab WHERE a IN ($1, $2, $3, $4, $5)
1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
1 | 0 | SET pg_stat_statements.track_utility = $1
6 | 6 | UPDATE pgss_dml_tab SET b = $1 WHERE a = $2
diff -u /home/user/git/postgres/contrib/pg_stat_statements/expected/squashing.out /home/user/git/postgres/contrib/pg_stat_statements/results/squashing.out
--- /home/user/git/postgres/contrib/pg_stat_statements/expected/squashing.out 2025-05-30 13:30:39.208783894 +0900
+++ /home/user/git/postgres/contrib/pg_stat_statements/results/squashing.out 2025-05-30 13:34:55.710681261 +0900
@@ -73,12 +73,14 @@
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
--------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 3
- SELECT ARRAY[$1 /*, ... */] | 3
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(3 rows)
+ query | calls
+------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3) | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4) | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
+ SELECT ARRAY[$1 /*, ... */] | 3
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(5 rows)
-- built-in functions will be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
@@ -99,9 +101,10 @@
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT WHERE $1 = ANY (ARRAY[$2 /*, ... */]) | 1
+ SELECT WHERE $1 IN ($2, int4($3), int4($4), $5) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+(3 rows)
-- external parameters will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -200,11 +203,14 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
----------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 6
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+-----------------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id = ANY (ARRAY[$1 /*, ... */]) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) AND data = $10 | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) AND data = $11 | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) AND data = $12 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(5 rows)
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -250,12 +256,18 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
--------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 6
- AND data IN ($2 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id = ANY (ARRAY[$1 /*, ... */]) +| 3
+ AND data = ANY (ARRAY[$2 /*, ... */]) |
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) +| 1
+ AND data IN ($10, $11, $12, $13, $14, $15, $16, $17, $18) |
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) +| 1
+ AND data IN ($11, $12, $13, $14, $15, $16, $17, $18, $19, $20) |
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)+| 1
+ AND data IN ($12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(5 rows)
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -372,9 +384,14 @@
query | calls
--------------------------------------------------------------------+-------
SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 3
- SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 7
+ SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 2
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(3 rows)
+(8 rows)
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
@@ -395,11 +412,11 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
- SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+------------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_numeric WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- Bigint, implicit cast is squashed
@@ -421,11 +438,11 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
--------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
- SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- Bigint, explicit cast is squashed
@@ -450,12 +467,14 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN +| 2
- ($1 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+----------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN +| 1
+ ($1::bigint, $2::bigint, $3::bigint, $4::bigint, $5::bigint, $6::bigint,+|
+ $7::bigint, $8::bigint, $9::bigint, $10::bigint, $11::bigint) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -569,12 +588,17 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT * FROM test_squash_cast WHERE data IN +| 2
- ($1 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+----------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY +| 1
+ [$1 /*, ... */]) |
+ SELECT * FROM test_squash_cast WHERE data IN +| 1
+ ($1::int4::casttesttype, $2::int4::casttesttype, $3::int4::casttesttype,+|
+ $4::int4::casttesttype, $5::int4::casttesttype, $6::int4::casttesttype,+|
+ $7::int4::casttesttype, $8::int4::casttesttype, $9::int4::casttesttype,+|
+ $10::int4::casttesttype, $11::int4::casttesttype) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Some casting expression are simplified to Const
CREATE TABLE test_squash_jsonb (id int, data jsonb);
@@ -601,12 +625,16 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 2
- ($1 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY +| 1
+ [$1 /*, ... */]) |
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ (($1)::jsonb, ($2)::jsonb, ($3)::jsonb, ($4)::jsonb,+|
+ ($5)::jsonb, ($6)::jsonb, ($7)::jsonb, ($8)::jsonb,+|
+ ($9)::jsonb, ($10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- CoerceViaIO, SubLink instead of a Const. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -701,13 +729,13 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */) |
- SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
- SELECT ARRAY[$1 /*, ... */] | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1::oid, $2::oid, $3::oid, $4::oid, $5::oid, $6::oid, $7::oid, $8::oid, $9::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(4 rows)
--
@@ -773,7 +801,7 @@
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */) | 1
+ select where $1 IN ($2::int, $3::int) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -797,8 +825,9 @@
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */) | 2
-(2 rows)
+ select where $1 = ANY (array[$2 /*, ... */]) | 1
+ select where $1 IN ($2::int::text, $3::int::text) | 1
+(3 rows)
--
-- cleanup
The test additions done in v7-0002 look sensible here.
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
In v7-0002, this comment is removed, but it still applies, doesn't it?
No, the comment is wrong/misleading even in HEAD. As the output below
shows, the normalized queries are not the same, and neither list is
squashed.
```
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN +| 2
($1 + $2, $3 + $4, $5 + $6, $7 + $8, $9 + $10, $11 + $12, $13 + $14, $15 + $16, $17 + $18) |
SELECT * FROM test_squash WHERE id IN +| 2
(@ $1, @ $2, @ $3, @ $4, @ $5, @ $6, @ $7, @ $8, @ $9) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
```
so I felt a much simpler and more appropriate comment is
```
-- No constants squashing for OpExpr
```
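To illustrate why the (+) and (@) lists above stay unsquashed, here is a minimal, self-contained sketch of the squashability rule as I understand it from this thread. The `Toy*` types and function names are hypothetical stand-ins, not PostgreSQL's node types, and the minimum-length check on the list is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node kinds; these are not PostgreSQL's NodeTag values. */
typedef enum ToyTag
{
	TOY_CONST,					/* a literal constant */
	TOY_EXTERN_PARAM,			/* $n supplied by the client */
	TOY_CAST_FUNC,				/* a built-in cast function call */
	TOY_OP_EXPR					/* an operator expression such as $1 + $2 */
} ToyTag;

typedef struct ToyNode
{
	ToyTag		tag;
	const struct ToyNode *args; /* arguments of a cast or operator */
	size_t		nargs;
} ToyNode;

/* A leaf is acceptable if it is a constant or an external parameter. */
static bool
toy_is_const_or_extern_param(const ToyNode *n)
{
	return n->tag == TOY_CONST || n->tag == TOY_EXTERN_PARAM;
}

/* An element is squashable if it is such a leaf, or a cast over such leaves. */
static bool
toy_is_squashable(const ToyNode *n)
{
	assert(n != NULL);

	if (n->tag == TOY_CAST_FUNC)
	{
		for (size_t i = 0; i < n->nargs; i++)
		{
			if (!toy_is_const_or_extern_param(&n->args[i]))
				return false;
		}
		return true;
	}

	/* Operator expressions fall through here and are rejected. */
	return toy_is_const_or_extern_param(n);
}

/* A list is squashable only if every element is. */
static bool
toy_list_is_squashable(const ToyNode *elems, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (!toy_is_squashable(&elems[i]))
			return false;
	}
	return true;
}
```

In this model a single operator expression anywhere in the list is enough to keep the whole list unsquashed, which matches the output shown above.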
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
Seems incorrect with 0002 taken in isolation. The last cast is still
present in the normalization. It's not after v7-0003.
In 0002, this is still a squashed list, even though the "::bigint" still
appears at the end of the string. Squashing replaces the elements with
"$1 /*, ... */"
```
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 2
($1 /*, ... */::bigint)
```
0003 improves/fixes this by truly squashing between the entire
boundary of the list.
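A rough sketch of what squashing across the entire boundary means for the normalized text, assuming the recorded span runs from just after the opening bracket to just before the closing one (which is how I read the `list_start`/`list_end` arithmetic in the patch). The function name and signature are hypothetical; this is not the pg_stat_statements code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Marker text that replaces a squashed list, as seen in the outputs above.
static const char TOY_SQUASH_MARKER[] = " /*, ... */";

// Rewrite everything between the brackets at list_start and list_end
// (both are byte offsets of the bracket characters themselves) as
// "$<param_no>" followed by the marker. Returns what snprintf returns.
static int
toy_squash_list(const char *query, int list_start, int list_end,
				int param_no, char *out, size_t outsz)
{
	assert(list_start >= 0 && list_start < list_end);
	assert((size_t) list_end < strlen(query));

	return snprintf(out, outsz, "%.*s$%d%s%s",
					list_start + 1, query,	/* prefix, including the opening bracket */
					param_no, TOY_SQUASH_MARKER,
					query + list_end);		/* closing bracket and the rest */
}
```

Because the replaced span ends at the closing bracket rather than at the last constant, a trailing cast such as "::bigint" on the final element is swallowed along with the rest of the list.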
Already mentioned upthread, but applying only v7-0003 on top of
v7-0002 (not v7-0004) leads to various regression failures in dml.sql
and squashing.sql. The failures persist with v7-0004 applied. Please
see these as per the attached, the IN lists do not get squashed, the
array elements are. Just to make sure that I am not missing
something, I've rebuilt from scratch with no success.
I cannot reproduce this. I applied each patch (v7-0002, 0003)
in order and ran "make check" on pg_stat_statements after every apply,
and saw no failures. Not sure what you and I are doing differently?
I could not reproduce it with the v8 patch series either.
IsSquashableExpressionList() includes this comment, which is outdated,
probably because squashing was originally optional behind a GUC and
the parameter has been removed while the comment has not been
refreshed:
/*
* If squashing is disabled, or the list is too short, we don't try to
* squash it.
*/
Thanks for reminding me about this. I remember seeing it, but missed
fixing it. Corrected now.
RecordExpressionLocation()'s top comment needs a refresh, talking
about constants. The simplifications gained in pgss.c's normalization
are pretty cool.
Yes, missed this also. Done.
+ bool has_squashed_lists;
[...]
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
This new flag in JumbleState needs to be documented, explaining why
it needs to be here.
Done. I felt that combining highest_extern_param_id and has_squashed_lists
in the same comment made the most sense, as they are closely related.
- /* highest Param id we've seen, in order to start normalization correctly */
+ /*
+ * Highest Param id we've seen, in order to start normalization correctly.
+ * However, if the jumble contains at least one squashed list, we
+ * disregard the highest_extern_param_id value because parameters can
+ * exist within the squashed list and are no longer considered for
+ * normalization.
+ */
int highest_extern_param_id;
+ bool has_squashed_lists;
I have to admit that it is strange to see
highest_extern_param_id, one value in JumbleState be forced to zero in
the PGSS normalization code if has_squashed_lists is set to true.
This seems like a layer violation to me
Yeah, that's silly of me. This should be done in DoJumble after
_jumbleNode. Fixed.
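A minimal sketch of that layering, under the assumption that the normalization code numbers its first generated placeholder as highest_extern_param_id + 1. The `Toy*` names are hypothetical and only model the two fields discussed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two JumbleState fields under discussion. */
typedef struct ToyJumbleState
{
	int			highest_extern_param_id;
	bool		has_squashed_lists;
} ToyJumbleState;

/* Jumbling side: remember the highest external parameter id seen. */
static void
toy_note_extern_param(ToyJumbleState *js, int paramid)
{
	assert(paramid > 0);
	if (paramid > js->highest_extern_param_id)
		js->highest_extern_param_id = paramid;
}

/*
 * Jumbling side, last step: once a list has been squashed, parameters
 * inside it are no longer tracked individually, so numbering restarts.
 * Doing the reset here keeps the extension from modifying the state.
 */
static void
toy_finish_jumble(ToyJumbleState *js)
{
	if (js->has_squashed_lists)
		js->highest_extern_param_id = 0;
}

/* Extension side: read-only use of the state (assumed numbering rule). */
static int
toy_first_placeholder(const ToyJumbleState *js)
{
	return js->highest_extern_param_id + 1;
}
```

With this split, the consumer only ever reads the state: a query with `$1..$3` and no squashed list would start its generated placeholders at `$4`, while one with a squashed list starts again at `$1`.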
I also reorganized the tests in extended.out to make them more readable,
namely I wanted to show separate outputs for what is tested
for "-- Unique query IDs with parameter numbers switched." and what is tested
for "-- Two groups of two queries with the same query ID."
I also added a comment for
```
bool extern_param;
```
v8 addresses the above.
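For readers following along, here is a small sketch of how the extern_param flag is meant to be consumed during normalization, modeled on the check the patch adds to generate_normalized_query. The `Toy*` struct and the counting helper are hypothetical simplifications; the real code rewrites the query text rather than counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced version of a recorded location. */
typedef struct ToyLocation
{
	int			location;		/* byte offset in the query text */
	bool		squashed;		/* start of a squashed list? */
	bool		extern_param;	/* location of an external parameter? */
} ToyLocation;

/*
 * Count how many recorded locations would be rewritten. External
 * parameters are left verbatim unless the query contains at least one
 * squashed list, in which case every location is renumbered.
 */
static int
toy_count_rewritten(const ToyLocation *locs, int n, bool has_squashed_lists)
{
	int			rewritten = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		if (locs[i].extern_param && !has_squashed_lists)
			continue;			/* keep the user's own $n as written */
		rewritten++;
	}
	return rewritten;
}
```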
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v8-0003-Support-Squashing-of-External-Parameters.patch (application/octet-stream)
From 15c4e81cfd8d3fa32615694c4462026d0bac1a96 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Fri, 30 May 2025 11:42:56 -0500
Subject: [PATCH v8 3/3] Support Squashing of External Parameters
62d712ec introduced the concept of element squashing for
query normalization purposes. However, it did not account for
external parameters passed to a list of elements. This adds
support for these types of values and simplifies the squashing
logic further.
---
.../pg_stat_statements/expected/extended.out | 36 +++--
.../pg_stat_statements/expected/squashing.out | 24 +--
.../pg_stat_statements/pg_stat_statements.c | 4 +
contrib/pg_stat_statements/sql/extended.sql | 5 +-
contrib/pg_stat_statements/sql/squashing.sql | 4 +-
src/backend/nodes/queryjumblefuncs.c | 147 +++++++++++-------
src/include/nodes/primnodes.h | 6 +-
src/include/nodes/queryjumble.h | 16 +-
8 files changed, 158 insertions(+), 84 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/extended.out b/contrib/pg_stat_statements/expected/extended.out
index 7da308ba84f4..6f2c231bf2ae 100644
--- a/contrib/pg_stat_statements/expected/extended.out
+++ b/contrib/pg_stat_statements/expected/extended.out
@@ -69,13 +69,13 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
(4 rows)
-- Various parameter numbering patterns
+-- Unique query IDs with parameter numbers switched.
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
--- Unique query IDs with parameter numbers switched.
SELECT WHERE ($1::int, 7) IN ((8, $2::int), ($3::int, 9)) \bind '1' '2' '3' \g
--
(0 rows)
@@ -96,7 +96,24 @@ SELECT WHERE $3::int IN ($1::int, $2::int) \bind '1' '2' '3' \g
--
(0 rows)
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------------------------------------------------+-------
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE ($1::int, $4) IN (($5, $2::int), ($3::int, $6)) | 1
+ SELECT WHERE ($2::int, $4) IN (($5, $3::int), ($1::int, $6)) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(6 rows)
+
-- Two groups of two queries with the same query ID.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT WHERE '1'::int IN ($1::int, '2'::int) \bind '1' \g
--
(1 row)
@@ -114,15 +131,10 @@ SELECT WHERE $2::int IN ($1::int, '2'::int) \bind '3' '4' \g
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------+-------
- SELECT WHERE $1::int IN ($2::int, $3::int) | 1
- SELECT WHERE $2::int IN ($1::int, $3::int) | 2
- SELECT WHERE $2::int IN ($1::int, $3::int) | 2
- SELECT WHERE $2::int IN ($3::int, $1::int) | 1
- SELECT WHERE $3::int IN ($1::int, $2::int) | 1
- SELECT WHERE ($1::int, $4) IN (($5, $2::int), ($3::int, $6)) | 1
- SELECT WHERE ($2::int, $4) IN (($5, $3::int), ($1::int, $6)) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(8 rows)
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 2
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 5376700fef86..ad282ac2b834 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -103,7 +103,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -123,14 +123,14 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
----------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
- SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -155,11 +155,11 @@ EXECUTE p1(1, 2, 3, 4, 5);
DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- DEALLOCATE $1 | 2
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 2
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- More conditions in the query
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index b08985c0051d..47f74b1e9ac4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2842,6 +2842,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
+ if (jstate->clocations[i].extern_param &&
+ !jstate->has_squashed_lists)
+ continue;
+
off = jstate->clocations[i].location;
/* Adjust recorded location if we're dealing with partial string */
diff --git a/contrib/pg_stat_statements/sql/extended.sql b/contrib/pg_stat_statements/sql/extended.sql
index a366658a53a7..ffb5b1628190 100644
--- a/contrib/pg_stat_statements/sql/extended.sql
+++ b/contrib/pg_stat_statements/sql/extended.sql
@@ -21,17 +21,18 @@ SELECT $1 \bind 'unnamed_val1' \g
SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Various parameter numbering patterns
-SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-- Unique query IDs with parameter numbers switched.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE ($1::int, 7) IN ((8, $2::int), ($3::int, 9)) \bind '1' '2' '3' \g
SELECT WHERE ($2::int, 10) IN ((11, $3::int), ($1::int, 12)) \bind '1' '2' '3' \g
SELECT WHERE $1::int IN ($2::int, $3::int) \bind '1' '2' '3' \g
SELECT WHERE $2::int IN ($3::int, $1::int) \bind '1' '2' '3' \g
SELECT WHERE $3::int IN ($1::int, $2::int) \bind '1' '2' '3' \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Two groups of two queries with the same query ID.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE '1'::int IN ($1::int, '2'::int) \bind '1' \g
SELECT WHERE '4'::int IN ($1::int, '5'::int) \bind '2' \g
SELECT WHERE $2::int IN ($1::int, '1'::int) \bind '1' '2' \g
SELECT WHERE $2::int IN ($1::int, '2'::int) \bind '3' '4' \g
-
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 85aae152da8e..4efd412be9bd 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -32,7 +32,7 @@ SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
;
@@ -40,7 +40,7 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
PREPARE p1(int, int, int, int, int) AS
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 35552731bf69..d41e3eff2406 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,7 +61,7 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordExpressionLocation(JumbleState *jstate,
- int location, int len);
+ int location, int len, bool extern_param);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
@@ -70,6 +70,7 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static void _jumbleParam(JumbleState *jstate, Node *node);
/*
* Given a possibly multi-statement source string, confine our attention to the
@@ -185,6 +186,7 @@ InitJumble(void)
jstate->clocations_count = 0;
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
+ jstate->has_squashed_lists = false;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -207,6 +209,10 @@ DoJumble(JumbleState *jstate, Node *node)
if (jstate->pending_nulls > 0)
FlushPendingNulls(jstate);
+ /* Squashed list found, reset highest_extern_param_id */
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
+
/* Process the jumble buffer and produce the hash value */
return DatumGetInt64(hash_any_extended(jstate->jumble,
jstate->jumble_len,
@@ -376,13 +382,13 @@ FlushPendingNulls(JumbleState *jstate)
* Record location of an expression within query string of query tree
* that is currently being walked.
*
- * Recorded locations can either be constants or the starting points of
- * lists of elements to be squashed. In the latter case, a length is
- * provided to determine the end of the squashed list and to mark the
- * location accordingly.
+ * Recorded locations can either be constants, external parameters or
+ * the starting points of lists of elements to be squashed. In the latter
+ * case, a length is provided to determine the end of the squashed list
+ * and to mark the location accordingly.
*/
static void
-RecordExpressionLocation(JumbleState *jstate, int location, int len)
+RecordExpressionLocation(JumbleState *jstate, int location, int len, bool extern_param)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -406,10 +412,29 @@ RecordExpressionLocation(JumbleState *jstate, int location, int len)
*/
jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
+ jstate->clocations[jstate->clocations_count].extern_param = extern_param;
jstate->clocations_count++;
}
}
+/*
+ * Subroutine for IsSquashableExpression to check if a node is a
+ * constant or External Parameter.
+ */
+static bool
+IsConstOrExternalParam(Node *node)
+{
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ return true;
+ case T_Param:
+ return ((Param *) node)->paramkind == PARAM_EXTERN;
+ default:
+ return false;
+ }
+}
+
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
* deduce that the expression is a constant:
@@ -422,42 +447,48 @@ RecordExpressionLocation(JumbleState *jstate, int location, int len)
static bool
IsSquashableExpression(Node *element)
{
+ ListCell *temp;
+
+ /* Unwrap RelabelType and CoerceViaIO layers */
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
if (IsA(element, CoerceViaIO))
element = (Node *) ((CoerceViaIO *) element)->arg;
- if (IsA(element, FuncExpr))
+ switch (nodeTag(element))
{
- FuncExpr *func = (FuncExpr *) element;
- ListCell *temp;
+ case T_FuncExpr:
+ {
+ FuncExpr *func = (FuncExpr *) element;
- if (func->funcformat != COERCE_IMPLICIT_CAST &&
- func->funcformat != COERCE_EXPLICIT_CAST)
- return false;
+ /*
+ * Only implicit/explicit casts on built-in functions are
+ * squashable
+ */
+ if (func->funcformat != COERCE_IMPLICIT_CAST &&
+ func->funcformat != COERCE_EXPLICIT_CAST)
+ return false;
- if (func->funcid > FirstGenbkiObjectId)
- return false;
+ if (func->funcid > FirstGenbkiObjectId)
+ return false;
- foreach(temp, func->args)
- {
- Node *arg = lfirst(temp);
+ /* All arguments must be constants or external parameters */
+ foreach(temp, func->args)
+ {
+ Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
- return false;
- }
+ if (!IsConstOrExternalParam(arg))
+ return false;
+ }
- return true;
+ return true;
+ }
+ default:
+ return IsConstOrExternalParam(element);
}
-
- if (!IsA(element, Const))
- return false;
-
- return true;
}
-
/*
* Subroutine for _jumbleElements: Verify whether the provided list
* can be squashed, meaning it contains only constant expressions.
@@ -473,10 +504,7 @@ IsSquashableExpressionList(List *elements)
{
ListCell *temp;
- /*
- * If squashing is disabled, or the list is too short, we don't try to
- * squash it.
- */
+ /* If the list only has 1 element, don't squash it */
if (list_length(elements) < 2)
return false;
@@ -494,7 +522,7 @@ IsSquashableExpressionList(List *elements)
#define JUMBLE_ELEMENTS(list, node) \
_jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordExpressionLocation(jstate, expr->location, -1)
+ RecordExpressionLocation(jstate, expr->location, -1, false)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -545,8 +573,9 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
RecordExpressionLocation(jstate,
aexpr->list_start + 1,
- (aexpr->list_end - aexpr->list_start) - 1);
+ (aexpr->list_end - aexpr->list_start) - 1, false);
normalize_list = true;
+ jstate->has_squashed_lists = true;
}
}
}
@@ -598,26 +627,6 @@ _jumbleNode(JumbleState *jstate, Node *node)
break;
}
- /* Special cases to handle outside the automated code */
- switch (nodeTag(expr))
- {
- case T_Param:
- {
- Param *p = (Param *) node;
-
- /*
- * Update the highest Param id seen, in order to start
- * normalization correctly.
- */
- if (p->paramkind == PARAM_EXTERN &&
- p->paramid > jstate->highest_extern_param_id)
- jstate->highest_extern_param_id = p->paramid;
- }
- break;
- default:
- break;
- }
-
/* Ensure we added something to the jumble buffer */
Assert(jstate->total_jumble_len > prev_jumble_len);
}
@@ -720,3 +729,35 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
*/
JUMBLE_STRING(aliasname);
}
+
+/*
+ * Custom query jumble function for Param nodes.
+ */
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
+ Param *expr = (Param *) node;
+
+ JUMBLE_FIELD(paramkind);
+ JUMBLE_FIELD(paramid);
+ JUMBLE_FIELD(paramtype);
+
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ /*
+ * At this point, only external parameter locations outside of
+ * squashable lists will be recorded.
+ */
+ RecordExpressionLocation(jstate, expr->location, -1, true);
+
+ /*
+ * Update the highest Param id seen, in order to start normalization
+ * correctly.
+ *
+ * Note: This value is reset at the end of jumbling if there exists a
+ * squashable list. See the comment in the definition of JumbleState.
+ */
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 773cdd880aa8..99d2c019c4bd 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -389,14 +389,16 @@ typedef enum ParamKind
typedef struct Param
{
+ pg_node_attr(custom_query_jumble)
+
Expr xpr;
ParamKind paramkind; /* kind of parameter. See above */
int paramid; /* numeric ID for parameter */
Oid paramtype; /* pg_type OID of parameter's datatype */
/* typmod value, if known */
- int32 paramtypmod pg_node_attr(query_jumble_ignore);
+ int32 paramtypmod;
/* OID of collation, or InvalidOid if none */
- Oid paramcollid pg_node_attr(query_jumble_ignore);
+ Oid paramcollid;
/* token location, or -1 if unknown */
ParseLoc location;
} Param;
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e6..bab971162dc7 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -29,6 +29,13 @@ typedef struct LocationLen
* of squashed constants.
*/
bool squashed;
+
+ /*
+ * Indicates whether a location is that of an external parameter, so it
+ * can be decided during normalization whether the parameter number should
+ * be replaced or kept as is.
+ */
+ bool extern_param;
} LocationLen;
/*
@@ -52,8 +59,15 @@ typedef struct JumbleState
/* Current number of valid entries in clocations array */
int clocations_count;
- /* highest Param id we've seen, in order to start normalization correctly */
+ /*
+ * Highest Param id we've seen, in order to start normalization correctly.
+ * However, if the jumble contains at least one squashed list, we
+ * disregard the highest_extern_param_id value because parameters can
+ * exist within the squashed list and are no longer considered for
+ * normalization.
+ */
int highest_extern_param_id;
+ bool has_squashed_lists;
/*
* Count of the number of NULL nodes seen since last appending a value.
--
2.39.5 (Apple Git-154)
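To make the rule in the patch above easier to follow, here is a minimal, standalone sketch of the squashability check as its comments describe it: a list is squashed only when it has at least two elements and every element is a constant or an external parameter. The `DemoKind` enum and the `demo_*` functions are hypothetical stand-ins invented for illustration; they are not the node types or functions in queryjumblefuncs.c, and the sketch ignores the RelabelType/CoerceViaIO/FuncExpr unwrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element kinds; the real code inspects Node tags instead. */
typedef enum DemoKind
{
	DEMO_CONST,				/* stands in for a Const node */
	DEMO_EXTERN_PARAM,		/* stands in for Param with PARAM_EXTERN */
	DEMO_EXEC_PARAM,		/* stands in for any other Param kind */
	DEMO_COLUMN_REF			/* stands in for any non-constant expression */
} DemoKind;

/* Mirrors the intent of IsConstOrExternalParam() from the patch. */
bool
demo_is_const_or_extern_param(DemoKind kind)
{
	return kind == DEMO_CONST || kind == DEMO_EXTERN_PARAM;
}

/*
 * A list is squashable only if it has two or more elements and every
 * element is a constant or an external parameter.
 */
bool
demo_list_is_squashable(const DemoKind *elems, size_t n)
{
	size_t		i;

	assert(elems != NULL || n == 0);

	/* A single-element list is left alone. */
	if (n < 2)
		return false;

	for (i = 0; i < n; i++)
	{
		/* One non-squashable element disqualifies the whole list. */
		if (!demo_is_const_or_extern_param(elems[i]))
			return false;
	}
	return true;
}
```

Under these assumptions, a list such as `($1, $2, 3)` is squashable, while `(1)` and `(1, some_column)` are not.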
v8-0002-Fix-Normalization-for-squashed-query-texts.patch (application/octet-stream)
From a817200835ef6b18211e97d619da777159c2a3a2 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:11:46 -0500
Subject: [PATCH v8 2/3] Fix Normalization for squashed query texts
62d712ec added the ability to squash constants from an
IN list/ArrayExpr for queryId computation purposes. However,
in certain cases, this broke normalization. For example,
"IN (1, 2, int4(1))" is normalized to "IN ($2 /*, ... */))",
which leaves an extra parenthesis at the end of the normalized string.
To correct this, the start and end boundaries of an expr_list are
now tracked by the various nodes used during parsing and are made
available to the ArrayExpr node for query jumbling. Having these
boundaries allows normalization to precisely identify the locations
in the query text that should be squashed.
---
.../pg_stat_statements/expected/select.out | 30 ++++++
.../pg_stat_statements/expected/squashing.out | 14 +--
.../pg_stat_statements/pg_stat_statements.c | 76 ++++-----------
contrib/pg_stat_statements/sql/select.sql | 8 ++
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 95 ++++++++++---------
src/backend/parser/gram.y | 68 ++++++++++---
src/backend/parser/parse_expr.c | 4 +
src/include/nodes/parsenodes.h | 4 +
src/include/nodes/primnodes.h | 4 +
10 files changed, 179 insertions(+), 126 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 038ae1103645..a57e11ef5b1c 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -267,6 +267,36 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
(4 rows)
+-- with the last element being an explicit function call with an argument, ensure
+-- the normalization of the squashing interval is correct.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, int4(1), int4(2));
+--
+(1 row)
+
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2)]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(3 rows)
+
--
-- queries with locking clauses
--
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index b8724e3356c3..5376700fef86 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -453,7 +453,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 2
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -572,7 +572,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 2
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -604,7 +604,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 2
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -704,9 +704,9 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
--------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */::oid) |
+ ($1 /*, ... */) |
SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
- SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(4 rows)
@@ -773,7 +773,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2 /*, ... */) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -797,7 +797,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int::text) | 2
+ select where $1 IN ($2 /*, ... */) | 2
(2 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 129001c70c81..b08985c0051d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2817,7 +2817,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
int num_constants_replaced = 0;
/*
@@ -2832,9 +2831,6 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
@@ -2856,63 +2852,22 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
/*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
+ * And insert a param symbol in place of the constant token.
+ *
+ * However, if we have a squashable list, insert a comment that starts
+ * from the second value of the list.
*/
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
-
- Assert(len_to_wrt >= 0);
-
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced++ + 1 + jstate->highest_extern_param_id,
+ (jstate->clocations[i].squashed) ? " /*, ... */" : "");
/* Otherwise the constant is squashed away -- move forward */
quer_loc = off + tok_len;
@@ -3005,6 +2960,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index 189d405512fc..11662cde08c9 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -87,6 +87,14 @@ SELECT WHERE (1, 2) IN ((1, 2), (2, 3));
SELECT WHERE (3, 4) IN ((5, 6), (8, 7));
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- with the last element being an explicit function call with an argument, ensure
+-- the normalization of the squashing interval is correct.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, int4(1), int4(2));
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2)]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- queries with locking clauses
--
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index c8595109b0e1..9ecddb142314 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1329,7 +1329,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index ac3cb3d9cafe..35552731bf69 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -60,10 +60,10 @@ static int64 DoJumble(JumbleState *jstate, Node *node);
static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
-static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+static void RecordExpressionLocation(JumbleState *jstate,
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -373,15 +373,16 @@ FlushPendingNulls(JumbleState *jstate)
/*
- * Record location of constant within query string of query tree that is
- * currently being walked.
+ * Record location of an expression within query string of query tree
+ * that is currently being walked.
*
- * 'squashed' signals that the constant represents the first or the last
- * element in a series of merged constants, and everything but the first/last
- * element contributes nothing to the jumble hash.
+ * Recorded locations can either be constants or the starting points of
+ * lists of elements to be squashed. In the latter case, a length is
+ * provided to determine the end of the squashed list and to mark the
+ * location accordingly.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordExpressionLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +397,15 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * initialize lengths to -1 to simplify third-party module usage
+ *
+ * If we have a length that is greater than -1, this indicates a
+ * squashable list.
+ */
+ jstate->clocations[jstate->clocations_count].length = (len > -1) ? len : -1;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1) ? true : false;
jstate->clocations_count++;
}
}
@@ -413,7 +420,7 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableExpression(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -450,6 +457,7 @@ IsSquashableConst(Node *element)
return true;
}
+
/*
* Subroutine for _jumbleElements: Verify whether the provided list
* can be squashed, meaning it contains only constant expressions.
@@ -461,7 +469,7 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableExpressionList(List *elements)
{
ListCell *temp;
@@ -474,22 +482,19 @@ IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableExpression(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordExpressionLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +522,36 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow constants to be squashed. To normalize such queries, we use
+ * the start and end locations of the list of elements in a list.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableExpressionList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->list_start > 0 && aexpr->list_end > 0)
+ {
+ RecordExpressionLocation(jstate,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d11..e6f0581fdc96 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,17 @@ typedef struct KeyActions
KeyAction *deleteAction;
} KeyActions;
+/*
+ * Track the start and end of a list in an expression, such as an 'IN' list
+ * or Array Expression
+ */
+typedef struct ListWithBoundary
+{
+ Node *expr;
+ ParseLoc start;
+ ParseLoc end;
+} ListWithBoundary;
+
/* ConstraintAttributeSpec yields an integer bitmask of these flags: */
#define CAS_NOT_DEFERRABLE 0x01
#define CAS_DEFERRABLE 0x02
@@ -184,7 +195,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int end_location);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -269,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct KeyAction *keyaction;
ReturningClause *retclause;
ReturningOptionKind retoptionkind;
+ struct ListWithBoundary *listwithboundary;
}
%type <node> stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -523,8 +535,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
+%type <listwithboundary> in_expr
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
%type <boolean> opt_ordinality opt_without_overlaps
@@ -15289,11 +15302,13 @@ a_expr: c_expr { $$ = $1; }
}
| a_expr IN_P in_expr
{
+ ListWithBoundary *l = $3;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
+ if (IsA(l->expr, SubLink))
{
/* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
+ SubLink *n = (SubLink *) l->expr;
n->subLinkType = ANY_SUBLINK;
n->subLinkId = 0;
@@ -15305,17 +15320,23 @@ a_expr: c_expr { $$ = $1; }
else
{
/* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
+ A_Expr *n = makeSimpleA_Expr(AEXPR_IN, "=", $1, l->expr, @2);
+
+ n->rexpr_list_start = $3->start;
+ n->rexpr_list_end = $3->end;
+ $$ = (Node *) n;
}
}
| a_expr NOT_LA IN_P in_expr %prec NOT_LA
{
+ ListWithBoundary *l = $4;
+
/* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
+ if (IsA(l->expr, SubLink))
{
/* generate NOT (foo = ANY (subquery)) */
/* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ SubLink *n = (SubLink *) l->expr;
n->subLinkType = ANY_SUBLINK;
n->subLinkId = 0;
@@ -15328,7 +15349,11 @@ a_expr: c_expr { $$ = $1; }
else
{
/* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
+ A_Expr *n = makeSimpleA_Expr(AEXPR_IN, "<>", $1, l->expr, @2);
+
+ n->rexpr_list_start = $4->start;
+ n->rexpr_list_end = $4->end;
+ $$ = (Node *) n;
}
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
@@ -16764,15 +16789,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16897,12 +16922,25 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
in_expr: select_with_parens
{
SubLink *n = makeNode(SubLink);
+ ListWithBoundary *l = palloc(sizeof(ListWithBoundary));
n->subselect = $1;
/* other fields will be filled later */
- $$ = (Node *) n;
+
+ l->expr = (Node *) n;
+ l->start = -1;
+ l->end = -1;
+ $$ = l;
+ }
+ | '(' expr_list ')'
+ {
+ ListWithBoundary *l = palloc(sizeof(ListWithBoundary));
+
+ l->expr = (Node *) $2;
+ l->start = @1;
+ l->end = @3;
+ $$ = l;
}
- | '(' expr_list ')' { $$ = (Node *) $2; }
;
/*
@@ -19300,12 +19338,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int location_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->list_start = location;
+ n->list_end = location_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673d..7347c989e110 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1224,6 +1224,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->elements = aexprs;
newa->multidims = false;
newa->location = -1;
+ newa->list_start = a->rexpr_list_start;
+ newa->list_end = a->rexpr_list_end;
result = (Node *) make_scalar_array_op(pstate,
a->name,
@@ -2166,6 +2168,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
newa->location = a->location;
+ newa->list_start = a->list_start;
+ newa->list_end = a->list_end;
return (Node *) newa;
}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dd00ab420b8a..74ad887d87c3 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -352,6 +352,8 @@ typedef struct A_Expr
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc rexpr_list_start; /* location of the start of a rexpr list */
+ ParseLoc rexpr_list_end; /* location of the end of a rexpr list */
} A_Expr;
/*
@@ -507,6 +509,8 @@ typedef struct A_ArrayExpr
NodeTag type;
List *elements; /* array element expressions */
ParseLoc location; /* token location, or -1 if unknown */
+ ParseLoc list_start; /* location of the start of the elements list */
+ ParseLoc list_end; /* location of the end of the elements list */
} A_ArrayExpr;
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f266..773cdd880aa8 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1399,6 +1399,10 @@ typedef struct ArrayExpr
bool multidims pg_node_attr(query_jumble_ignore);
/* token location, or -1 if unknown */
ParseLoc location;
+ /* location of the start of the elements list */
+ ParseLoc list_start;
+ /* location of the end of the elements list */
+ ParseLoc list_end;
} ArrayExpr;
/*
--
2.39.5 (Apple Git-154)
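The normalization idea in the patch above can be sketched outside the backend. Once the parser supplies the start and end of the list, a squashed list is recorded as a single (location, length) span covering everything between the parentheses, and normalization replaces that whole span with one parameter symbol followed by the `/*, ... */` marker. The sketch below is only an illustration under simplifying assumptions: `DemoLocation` and `demo_normalize` are hypothetical names, the spans are assumed to be sorted and non-overlapping, and constant lengths are passed in directly rather than discovered by re-lexing the query as pg_stat_statements does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for LocationLen; not the real struct. */
typedef struct DemoLocation
{
	int			location;	/* byte offset of a constant or of a list body */
	int			length;		/* byte length of that constant or list body */
	int			squashed;	/* nonzero when the span is a whole squashed list */
} DemoLocation;

/*
 * Write a normalized copy of "query" into "out". Each recorded span becomes
 * "$n"; a squashed span additionally gets the squash marker. Spans must be
 * sorted by location and must not overlap. Returns the output length, or -1
 * if "out" is too small.
 */
int
demo_normalize(const char *query, const DemoLocation *locs, int nlocs,
			   int first_param_id, char *out, size_t outlen)
{
	size_t		query_len = strlen(query);
	size_t		src = 0;		/* next unread byte of query */
	size_t		dst = 0;		/* next free byte of out */
	size_t		tail;
	int			i;

	for (i = 0; i < nlocs; i++)
	{
		size_t		start = (size_t) locs[i].location;
		size_t		chunk;
		int			n;

		assert(locs[i].location >= 0 && locs[i].length >= 0);
		assert(start >= src && start + (size_t) locs[i].length <= query_len);

		/* Copy the text between the previous span and this one verbatim. */
		chunk = start - src;
		if (dst + chunk >= outlen)
			return -1;
		memcpy(out + dst, query + src, chunk);
		dst += chunk;

		/* Emit the placeholder, plus the marker for a squashed list. */
		n = snprintf(out + dst, outlen - dst, "$%d%s", first_param_id + i,
					 locs[i].squashed ? " /*, ... */" : "");
		if (n < 0 || (size_t) n >= outlen - dst)
			return -1;
		dst += (size_t) n;

		/* Skip the original text of the span. */
		src = start + (size_t) locs[i].length;
	}

	/* Copy the remainder of the query, including the terminating NUL. */
	tail = query_len - src;
	if (dst + tail >= outlen)
		return -1;
	memcpy(out + dst, query + src, tail + 1);
	return (int) (dst + tail);
}
```

For `SELECT WHERE 1 IN (1, 2, int4(1))`, the opening parenthesis is at offset 18 and the closing one at 32, so the list body is recorded as location 19 with length 13. Passing the spans {13, 1, 0} and {19, 13, 1} with `first_param_id` set to 1 yields `SELECT WHERE $1 IN ($2 /*, ... */)`, with no stray parenthesis left behind.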
v8-0001-Enhanced-query-jumbling-squashing-tests.patch (application/octet-stream)
From 99d4ae51041b44351eedca0690178e3597505659 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v8 1/3] Enhanced query jumbling squashing tests
Test coverage for ARRAY expressions is insufficient. Add more test
cases, similar to the existing ones. Also, enhance tests for the
negative cases of RelabelType, CoerceViaIO and FuncExpr. While at it,
reorganize some parts of the tests and correct minor spacing issues.
---
.../pg_stat_statements/expected/squashing.out | 528 +++++++++++++++---
contrib/pg_stat_statements/sql/squashing.sql | 186 +++++-
2 files changed, 613 insertions(+), 101 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c9..b8724e3356c3 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -2,9 +2,11 @@
-- Const squashing functionality
--
CREATE EXTENSION pg_stat_statements;
+--
+--Simple Lists
+--
CREATE TABLE test_squash (id int, data int);
--- IN queries
--- Normal scenario, too many simple constants for an IN query
+-- single element will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -16,42 +18,149 @@ SELECT * FROM test_squash WHERE id IN (1);
----+------
(0 rows)
+SELECT ARRAY[1];
+ array
+-------
+ {1}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1) | 1
+ SELECT ARRAY[$1] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- more than 1 element in a list will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
id | data
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5);
+ id | data
+----+------
+(0 rows)
+
+SELECT ARRAY[1, 2, 3];
+ array
+---------
+ {1,2,3}
+(1 row)
+
+SELECT ARRAY[1, 2, 3, 4];
+ array
+-----------
+ {1,2,3,4}
+(1 row)
+
+SELECT ARRAY[1, 2, 3, 4, 5];
+ array
+-------------
+ {1,2,3,4,5}
+(1 row)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
- SELECT * FROM test_squash WHERE id IN ($1) | 1
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 3
+ SELECT ARRAY[$1 /*, ... */] | 3
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+-- built-in functions will be squashed
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- external parameters will not be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
+;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind 1 2 3 4 5
+;
id | data
----+------
(0 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+---------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- neither are prepared statements
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5);
+EXECUTE p1(1, 2, 3, 4, 5);
id | data
----+------
(0 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+DEALLOCATE p1;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[$1, $2, $3, $4, $5]);
+EXECUTE p1(1, 2, 3, 4, 5);
id | data
----+------
(0 rows)
+DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 4
- SELECT * FROM test_squash WHERE id IN ($1) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 1
-(4 rows)
+ query | calls
+------------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- More conditions in the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -75,10 +184,25 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND da
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
---------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 6
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -107,24 +231,46 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 6
AND data IN ($2 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- No constants simplification for OpExpr
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
+-- No constants squashing for OpExpr
+-- The IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT * FROM test_squash WHERE id IN
(1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9);
id | data
@@ -137,19 +283,35 @@ SELECT * FROM test_squash WHERE id IN
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9']);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN +| 1
+ SELECT * FROM test_squash WHERE id IN +| 2
($1 + $2, $3 + $4, $5 + $6, $7 + $8, $9 + $10, $11 + $12, $13 + $14, $15 + $16, $17 + $18) |
- SELECT * FROM test_squash WHERE id IN +| 1
+ SELECT * FROM test_squash WHERE id IN +| 2
(@ $1, @ $2, @ $3, @ $4, @ $5, @ $6, @ $7, @ $8, @ $9) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+--
-- FuncExpr
+--
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -181,12 +343,38 @@ SELECT data FROM test_float WHERE data IN (1.0, 1.0);
------
(0 rows)
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1'::double precision, '2'::double precision]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1.0::double precision, 1.0::double precision]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, 2]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, '2']);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1', 2]);
+ data
+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------+-------
- SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 5
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 3
+ SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 7
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
@@ -201,12 +389,18 @@ SELECT * FROM test_squash_numeric WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
----+------
(0 rows)
+SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------------+-------
- SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Bigint, implicit cast is squashed
CREATE TABLE test_squash_bigint (id int, data bigint);
@@ -221,14 +415,20 @@ SELECT * FROM test_squash_bigint WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -242,15 +442,22 @@ SELECT * FROM test_squash_bigint WHERE data IN
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN +| 1
+ SELECT * FROM test_squash_bigint WHERE data IN +| 2
($1 /*, ... */::bigint) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- Bigint, long tokens with parenthesis
+-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -264,44 +471,47 @@ SELECT * FROM test_squash_bigint WHERE id IN
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE id = ANY(ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE id IN +| 1
+ SELECT * FROM test_squash_bigint WHERE id IN +| 2
(abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
abs($8), abs($9), abs($10), ((abs($11)))) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
- id | data
-----+------
-(0 rows)
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
+--
+(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
- (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
- (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
- (SELECT $10)::jsonb) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+--
-- CoerceViaIO
+--
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
CREATE FUNCTION casttesttype_in(cstring)
@@ -349,15 +559,25 @@ SELECT * FROM test_squash_cast WHERE data IN
----+------
(0 rows)
+SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY
+ [1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_cast WHERE data IN +| 1
+ SELECT * FROM test_squash_cast WHERE data IN +| 2
($1 /*, ... */::int4::casttesttype) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Some casting expression are simplified to Const
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -366,8 +586,16 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY
+ [('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb]);
id | data
----+------
(0 rows)
@@ -375,28 +603,144 @@ SELECT * FROM test_squash_jsonb WHERE data IN
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 2
(($1 /*, ... */)::jsonb) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_jsonb WHERE data = ANY(ARRAY
+ [(SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb]);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 2
+ ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY(ARRAY[1::text::int::text::int, 1::text::int::text::int]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::text::int::text::int, $3::text::int::text::int) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+--
-- RelabelType
+--
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is only one level of RelabelType, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+-- if there is at least one element with multiple levels of RelabelType,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[1::oid, 2::oid::int::oid]);
id | data
----+------
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1 /*, ... */::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
+ SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(4 rows)
+
+--
+-- edge cases
+--
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- for nested arrays, only constants are squashed
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -409,23 +753,59 @@ FROM cte;
--------
(0 rows)
--- Simple array would be squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
- array
-------------------------
- {1,2,3,4,5,6,7,8,9,10}
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+--
+(1 row)
+
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+--
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2::int, $3::int::text) | 1
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+--
+(1 row)
+
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int::text) | 2
(2 rows)
+--
+-- cleanup
+--
+DROP TABLE test_squash;
+DROP TABLE test_float;
+DROP TABLE test_squash_numeric;
+DROP TABLE test_squash_bigint;
+DROP TABLE test_squash_cast CASCADE;
+DROP TABLE test_squash_jsonb;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8e..85aae152da8e 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -3,101 +3,160 @@
--
CREATE EXTENSION pg_stat_statements;
-CREATE TABLE test_squash (id int, data int);
+--
+--Simple Lists
+--
--- IN queries
+CREATE TABLE test_squash (id int, data int);
--- Normal scenario, too many simple constants for an IN query
+-- single element will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1);
+SELECT ARRAY[1];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- more than 1 element in a list will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4);
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5);
+SELECT ARRAY[1, 2, 3];
+SELECT ARRAY[1, 2, 3, 4];
+SELECT ARRAY[1, 2, 3, 4, 5];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+-- built-in functions will be squashed
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- More conditions in the query
+-- external parameters will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
+;
+SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind 1 2 3 4 5
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- neither are prepared statements
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5);
+EXECUTE p1(1, 2, 3, 4, 5);
+DEALLOCATE p1;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[$1, $2, $3, $4, $5]);
+EXECUTE p1(1, 2, 3, 4, 5);
+DEALLOCATE p1;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- More conditions in the query
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) AND data = 2;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-
--- No constants simplification for OpExpr
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
+-- No constants squashing for OpExpr
+-- The IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN
(1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9);
SELECT * FROM test_squash WHERE id IN
(@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9');
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9]);
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9']);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
-- FuncExpr
+--
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT data FROM test_float WHERE data IN (1, 2);
SELECT data FROM test_float WHERE data IN (1, '2');
SELECT data FROM test_float WHERE data IN ('1', 2);
SELECT data FROM test_float WHERE data IN ('1', '2');
SELECT data FROM test_float WHERE data IN (1.0, 1.0);
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1'::double precision, '2'::double precision]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1.0::double precision, 1.0::double precision]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, 2]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, '2']);
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1', 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_numeric WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Bigint, implicit cast is squashed
CREATE TABLE test_squash_bigint (id int, data bigint);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE data IN
(1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint);
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- Bigint, long tokens with parenthesis
+-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE id IN
(abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
abs(800), abs(900), abs(1000), ((abs(1100))));
+SELECT * FROM test_squash_bigint WHERE id = ANY(ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
-- CoerceViaIO
+--
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
@@ -141,19 +200,73 @@ SELECT * FROM test_squash_cast WHERE data IN
4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
10::int4::casttesttype, 11::int4::casttesttype);
+SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY
+ [1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Some casting expression are simplified to Const
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY
+ [('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+SELECT * FROM test_squash_jsonb WHERE data = ANY(ARRAY
+ [(SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+SELECT WHERE 1 = ANY(ARRAY[1::text::int::text::int, 1::text::int::text::int]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
-- RelabelType
+--
+
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+-- if there is only one level of RelabelType, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+-- if there is at least one element with multiple levels of RelabelType,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[1::oid, 2::oid::int::oid]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- edge cases
+--
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- for nested arrays, only constants are squashed
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -163,7 +276,26 @@ WITH cte AS (
SELECT ARRAY['a', 'b', 'c', const::varchar] AS result
FROM cte;
--- Simple array would be squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- cleanup
+--
+DROP TABLE test_squash;
+DROP TABLE test_float;
+DROP TABLE test_squash_numeric;
+DROP TABLE test_squash_bigint;
+DROP TABLE test_squash_cast CASCADE;
+DROP TABLE test_squash_jsonb;
\ No newline at end of file
--
2.39.5 (Apple Git-154)
I realized this thread did not have a CF entry,
so here it is https://commitfest.postgresql.org/patch/5801/
--
Sami
Hello,
I've spent a bunch of time looking at this series and here's my take on
the second one. (The testing patch is unchanged from Sami's.) The
third patch (for PARAM_EXTERNs) should be a mostly trivial rebase on top
of these two.
I realized that the whole in_expr production in gram.y is pointless, and
the whole private struct that was added was unnecessary. A much simpler
solution is to remove in_expr, expand its use in a_expr to the two
possibilities, and with that we can remove the need for a new struct.
I also made IsSquashableExpression call itself recursively. The
check for stack depth can be done without throwing an error; I tested
this by adding stack bloat in that function. I also renamed it to
IsSquashableConstant. This changes one of the tests, because a cast
sequence like 42::int::bigint::int is now considered squashable.
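As a rough illustration of the idea only (this is not the code from the patch), the sketch below shows a recursive check that unwraps nested casts around a constant and gives up quietly when the nesting gets too deep, instead of raising an error. The node kinds, the `Node` struct, `MAX_CAST_DEPTH`, and `is_squashable_constant` are all hypothetical stand-ins; the real function works on PostgreSQL parse nodes and uses the server's own stack depth check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds; these are not PostgreSQL's real node tags. */
typedef enum { NODE_CONST, NODE_CAST, NODE_SUBQUERY } NodeKind;

typedef struct Node {
    NodeKind     kind;
    struct Node *arg;   /* wrapped expression for NODE_CAST, else NULL */
} Node;

/* Arbitrary limit standing in for a real stack-depth check. */
#define MAX_CAST_DEPTH 64

/*
 * Return true if the expression is a constant, possibly wrapped in
 * any number of casts.  If the nesting is too deep, report "not
 * squashable" rather than throwing an error.
 */
bool
is_squashable_constant(const Node *node, int depth)
{
    if (node == NULL || depth > MAX_CAST_DEPTH)
        return false;

    switch (node->kind) {
        case NODE_CONST:
            return true;
        case NODE_CAST:
            /* Recurse into the expression being cast. */
            return is_squashable_constant(node->arg, depth + 1);
        default:
            return false;
    }
}
```

Under these assumptions, a constant wrapped in three casts (the shape of 42::int::bigint::int) is accepted, while a cast around a subquery is rejected, and an overly deep chain simply falls back to "not squashable".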
Other than that, the changes are cosmetic.
Barring objections, I'll push this soon, then look at rebasing 0003 on
top, which I expect to be an easy job.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachments:
v9-0001-Enhanced-query-jumbling-squashing-tests.patch
From 318605685954a86ff61624ec9404fb2a0e780934 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 20 May 2025 16:12:05 +0200
Subject: [PATCH v9 1/2] Enhanced query jumbling squashing tests
Test coverage for ARRAY expressions is insufficient. Add more test
cases, similar to the existing ones. Also, enhance the tests for the
negative cases of RelabelType, CoerceViaIO and FuncExpr. While at it,
reorganize some parts of the tests and correct minor spacing issues.
---
.../pg_stat_statements/expected/squashing.out | 540 +++++++++++++++---
contrib/pg_stat_statements/sql/squashing.sql | 190 +++++-
2 files changed, 621 insertions(+), 109 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b138af098c..b8724e3356c 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -2,9 +2,11 @@
-- Const squashing functionality
--
CREATE EXTENSION pg_stat_statements;
+--
+--Simple Lists
+--
CREATE TABLE test_squash (id int, data int);
--- IN queries
--- Normal scenario, too many simple constants for an IN query
+-- single element will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -16,42 +18,149 @@ SELECT * FROM test_squash WHERE id IN (1);
----+------
(0 rows)
+SELECT ARRAY[1];
+ array
+-------
+ {1}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1) | 1
+ SELECT ARRAY[$1] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- more than 1 element in a list will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT * FROM test_squash WHERE id IN (1, 2, 3);
id | data
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5);
+ id | data
+----+------
+(0 rows)
+
+SELECT ARRAY[1, 2, 3];
+ array
+---------
+ {1,2,3}
+(1 row)
+
+SELECT ARRAY[1, 2, 3, 4];
+ array
+-----------
+ {1,2,3,4}
+(1 row)
+
+SELECT ARRAY[1, 2, 3, 4, 5];
+ array
+-------------
+ {1,2,3,4,5}
+(1 row)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
- SELECT * FROM test_squash WHERE id IN ($1) | 1
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 3
+ SELECT ARRAY[$1 /*, ... */] | 3
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+-- built-in functions will be squashed
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- external parameters will not be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
+;
id | data
----+------
(0 rows)
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
- id | data
-----+------
-(0 rows)
-
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind 1 2 3 4 5
+;
id | data
----+------
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 4
- SELECT * FROM test_squash WHERE id IN ($1) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 1
-(4 rows)
+ query | calls
+---------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- neither are prepared statements
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5);
+EXECUTE p1(1, 2, 3, 4, 5);
+ id | data
+----+------
+(0 rows)
+
+DEALLOCATE p1;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[$1, $2, $3, $4, $5]);
+EXECUTE p1(1, 2, 3, 4, 5);
+ id | data
+----+------
+(0 rows)
+
+DEALLOCATE p1;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- More conditions in the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -75,10 +184,25 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND da
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) AND data = 2;
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
---------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 6
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -107,24 +231,46 @@ SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 3
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 6
AND data IN ($2 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- No constants simplification for OpExpr
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
+-- No constants squashing for OpExpr
+-- The IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT * FROM test_squash WHERE id IN
(1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9);
id | data
@@ -137,19 +283,35 @@ SELECT * FROM test_squash WHERE id IN
----+------
(0 rows)
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9]);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9']);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN +| 1
+ SELECT * FROM test_squash WHERE id IN +| 2
($1 + $2, $3 + $4, $5 + $6, $7 + $8, $9 + $10, $11 + $12, $13 + $14, $15 + $16, $17 + $18) |
- SELECT * FROM test_squash WHERE id IN +| 1
+ SELECT * FROM test_squash WHERE id IN +| 2
(@ $1, @ $2, @ $3, @ $4, @ $5, @ $6, @ $7, @ $8, @ $9) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
+--
-- FuncExpr
+--
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -181,12 +343,38 @@ SELECT data FROM test_float WHERE data IN (1.0, 1.0);
------
(0 rows)
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1'::double precision, '2'::double precision]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1.0::double precision, 1.0::double precision]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, 2]);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, '2']);
+ data
+------
+(0 rows)
+
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1', 2]);
+ data
+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------+-------
- SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 5
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 3
+ SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 7
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
@@ -201,12 +389,18 @@ SELECT * FROM test_squash_numeric WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
----+------
(0 rows)
+SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------------+-------
- SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Bigint, implicit cast is squashed
CREATE TABLE test_squash_bigint (id int, data bigint);
@@ -221,14 +415,20 @@ SELECT * FROM test_squash_bigint WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1
----+------
(0 rows)
-SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
+ id | data
+----+------
+(0 rows)
--- Bigint, explicit cast is not squashed
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+-- Bigint, explicit cast is squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -242,15 +442,22 @@ SELECT * FROM test_squash_bigint WHERE data IN
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN +| 1
+ SELECT * FROM test_squash_bigint WHERE data IN +| 2
($1 /*, ... */::bigint) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- Bigint, long tokens with parenthesis
+-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -264,44 +471,47 @@ SELECT * FROM test_squash_bigint WHERE id IN
----+------
(0 rows)
+SELECT * FROM test_squash_bigint WHERE id = ANY(ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
-------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE id IN +| 1
+ SELECT * FROM test_squash_bigint WHERE id IN +| 2
(abs($1), abs($2), abs($3), abs($4), abs($5), abs($6), abs($7),+|
abs($8), abs($9), abs($10), ((abs($11)))) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
- id | data
-----+------
-(0 rows)
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
+--
+(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
- ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
- (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
- (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
- (SELECT $10)::jsonb) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
+--
-- CoerceViaIO
+--
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
CREATE FUNCTION casttesttype_in(cstring)
@@ -349,15 +559,25 @@ SELECT * FROM test_squash_cast WHERE data IN
----+------
(0 rows)
+SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY
+ [1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype]);
+ id | data
+----+------
+(0 rows)
+
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_cast WHERE data IN +| 1
+ SELECT * FROM test_squash_cast WHERE data IN +| 2
($1 /*, ... */::int4::casttesttype) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Some casting expression are simplified to Const
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -366,8 +586,16 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY
+ [('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb]);
id | data
----+------
(0 rows)
@@ -375,28 +603,144 @@ SELECT * FROM test_squash_jsonb WHERE data IN
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 2
(($1 /*, ... */)::jsonb) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- RelabelType
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash_jsonb WHERE data = ANY(ARRAY
+ [(SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb]);
id | data
----+------
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */::oid) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 2
+ ((SELECT $1)::jsonb, (SELECT $2)::jsonb, (SELECT $3)::jsonb,+|
+ (SELECT $4)::jsonb, (SELECT $5)::jsonb, (SELECT $6)::jsonb,+|
+ (SELECT $7)::jsonb, (SELECT $8)::jsonb, (SELECT $9)::jsonb,+|
+ (SELECT $10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+--
+(1 row)
+
+SELECT WHERE 1 = ANY(ARRAY[1::text::int::text::int, 1::text::int::text::int]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2::text::int::text::int, $3::text::int::text::int) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(2 rows)
+
+--
+-- RelabelType
+--
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- if there is only one level of RelabelType, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+ array
+---------------------
+ {1,2,3,4,5,6,7,8,9}
+(1 row)
+
+-- if there is at least one element with multiple levels of RelabelType,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[1::oid, 2::oid::int::oid]);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1 /*, ... */::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
+ SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(4 rows)
+
+--
+-- edge cases
+--
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- for nested arrays, only constants are squashed
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
+ array
+-----------------------------------------------------------------------------------------------
+ {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}}
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT ARRAY[ +| 1
+ ARRAY[$1 /*, ... */], +|
+ ARRAY[$2 /*, ... */], +|
+ ARRAY[$3 /*, ... */], +|
+ ARRAY[$4 /*, ... */] +|
+ ] |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -409,23 +753,59 @@ FROM cte;
--------
(0 rows)
--- Simple array would be squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
-SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
- array
-------------------------
- {1,2,3,4,5,6,7,8,9,10}
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+--
+(1 row)
+
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
+--
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2::int, $3::int::text) | 1
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+--
+(1 row)
+
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+----------------------------------------------------+-------
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ select where $1 IN ($2 /*, ... */::int::text) | 2
(2 rows)
+--
+-- cleanup
+--
+DROP TABLE test_squash;
+DROP TABLE test_float;
+DROP TABLE test_squash_numeric;
+DROP TABLE test_squash_bigint;
+DROP TABLE test_squash_cast CASCADE;
+DROP TABLE test_squash_jsonb;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 03efd4b40c8..85aae152da8 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -3,101 +3,160 @@
--
CREATE EXTENSION pg_stat_statements;
+--
+--Simple Lists
+--
+
CREATE TABLE test_squash (id int, data int);
--- IN queries
-
--- Normal scenario, too many simple constants for an IN query
+-- single element will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN (1);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3);
+SELECT ARRAY[1];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
-SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+-- more than 1 element in a list will be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash WHERE id IN (1, 2, 3);
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4);
+SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5);
+SELECT ARRAY[1, 2, 3];
+SELECT ARRAY[1, 2, 3, 4];
+SELECT ARRAY[1, 2, 3, 4, 5];
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- built-in functions will be squashed
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- external parameters will not be squashed
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
+;
+SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind 1 2 3 4 5
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- neither are prepared statements
+-- the IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5);
+EXECUTE p1(1, 2, 3, 4, 5);
+DEALLOCATE p1;
+PREPARE p1(int, int, int, int, int) AS
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[$1, $2, $3, $4, $5]);
+EXECUTE p1(1, 2, 3, 4, 5);
+DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- More conditions in the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND data = 2;
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AND data = 2;
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) AND data = 2;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_squash WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
AND data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9]);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+SELECT * FROM test_squash WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
+ AND data = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-
--- No constants simplification for OpExpr
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
--- In the following two queries the operator expressions (+) and (@) have
--- different oppno, and will be given different query_id if squashed, even though
--- the normalized query will be the same
+-- No constants squashing for OpExpr
+-- The IN and ARRAY forms of this statement will have the same queryId
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN
(1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9);
SELECT * FROM test_squash WHERE id IN
(@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9');
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [1 + 1, 2 + 2, 3 + 3, 4 + 4, 5 + 5, 6 + 6, 7 + 7, 8 + 8, 9 + 9]);
+SELECT * FROM test_squash WHERE id = ANY(ARRAY
+ [@ '-1', @ '-2', @ '-3', @ '-4', @ '-5', @ '-6', @ '-7', @ '-8', @ '-9']);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
-- FuncExpr
+--
-- Verify multiple type representation end up with the same query_id
CREATE TABLE test_float (data float);
+-- The casted ARRAY expressions will have the same queryId as the IN clause
+-- form of the query
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT data FROM test_float WHERE data IN (1, 2);
SELECT data FROM test_float WHERE data IN (1, '2');
SELECT data FROM test_float WHERE data IN ('1', 2);
SELECT data FROM test_float WHERE data IN ('1', '2');
SELECT data FROM test_float WHERE data IN (1.0, 1.0);
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1'::double precision, '2'::double precision]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1.0::double precision, 1.0::double precision]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, 2]);
+SELECT data FROM test_float WHERE data = ANY(ARRAY[1, '2']);
+SELECT data FROM test_float WHERE data = ANY(ARRAY['1', 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_numeric WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Bigint, implicit cast is squashed
CREATE TABLE test_squash_bigint (id int, data bigint);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- Bigint, explicit cast is not squashed
+-- Bigint, explicit cast is squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE data IN
(1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint);
+SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[
+ 1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint, 6::bigint,
+ 7::bigint, 8::bigint, 9::bigint, 10::bigint, 11::bigint]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- Bigint, long tokens with parenthesis
+-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_bigint WHERE id IN
(abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
abs(800), abs(900), abs(1000), ((abs(1100))));
+SELECT * FROM test_squash_bigint WHERE id = ANY(ARRAY[
+ abs(100), abs(200), abs(300), abs(400), abs(500), abs(600), abs(700),
+ abs(800), abs(900), abs(1000), ((abs(1100)))]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- CoerceViaIO, SubLink instead of a Const
-CREATE TABLE test_squash_jsonb (id int, data jsonb);
+-- Multiple FuncExpr's. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash_jsonb WHERE data IN
- ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
- (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
- (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
- (SELECT '"10"')::jsonb);
+SELECT WHERE 1 IN (1::int::bigint::int, 2::int::bigint::int);
+SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
-- CoerceViaIO
+--
-- Create some dummy type to force CoerceViaIO
CREATE TYPE casttesttype;
@@ -141,19 +200,73 @@ SELECT * FROM test_squash_cast WHERE data IN
4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
10::int4::casttesttype, 11::int4::casttesttype);
+SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY
+ [1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype,
+ 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype,
+ 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype,
+ 10::int4::casttesttype, 11::int4::casttesttype]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Some casting expression are simplified to Const
+CREATE TABLE test_squash_jsonb (id int, data jsonb);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash_jsonb WHERE data IN
(('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
- ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb,
- ( '"9"')::jsonb, ( '"10"')::jsonb);
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb);
+SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY
+ [('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb,
+ ('"5"')::jsonb, ('"6"')::jsonb, ('"7"')::jsonb, ('"8"')::jsonb,
+ ('"9"')::jsonb, ('"10"')::jsonb]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- RelabelType
+-- CoerceViaIO, SubLink instead of a Const. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT * FROM test_squash_jsonb WHERE data IN
+ ((SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb);
+SELECT * FROM test_squash_jsonb WHERE data = ANY(ARRAY
+ [(SELECT '"1"')::jsonb, (SELECT '"2"')::jsonb, (SELECT '"3"')::jsonb,
+ (SELECT '"4"')::jsonb, (SELECT '"5"')::jsonb, (SELECT '"6"')::jsonb,
+ (SELECT '"7"')::jsonb, (SELECT '"8"')::jsonb, (SELECT '"9"')::jsonb,
+ (SELECT '"10"')::jsonb]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- Multiple CoerceViaIO wrapping a constant. Will not squash
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1::text::int::text::int, 1::text::int::text::int);
+SELECT WHERE 1 = ANY(ARRAY[1::text::int::text::int, 1::text::int::text::int]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- RelabelType
+--
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- if there is only one level of RelabelType, the list will be squashable
+SELECT * FROM test_squash WHERE id IN
+ (1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid);
+SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid];
+-- if there is at least one element with multiple levels of RelabelType,
+-- the list will not be squashable
+SELECT * FROM test_squash WHERE id IN (1::oid, 2::oid::int::oid);
+SELECT * FROM test_squash WHERE id = ANY(ARRAY[1::oid, 2::oid::int::oid]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- edge cases
+--
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- for nested arrays, only constants are squashed
+SELECT ARRAY[
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+ ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ ];
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Test constants evaluation in a CTE, which was causing issues in the past
@@ -163,7 +276,26 @@ WITH cte AS (
SELECT ARRAY['a', 'b', 'c', const::varchar] AS result
FROM cte;
--- Simple array would be squashed as well
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-SELECT ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+-- Rewritten as an OpExpr, so it will not be squashed
+select where '1' IN ('1'::int, '2'::int::text);
+-- Rewritten as an ArrayExpr, so it will be squashed
+select where '1' IN ('1'::int, '2'::int);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- Both of these queries will be rewritten as an ArrayExpr, so they
+-- will be squashed, and have a similar queryId
+select where '1' IN ('1'::int::text, '2'::int::text);
+select where '1' = ANY (array['1'::int::text, '2'::int::text]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- cleanup
+--
+DROP TABLE test_squash;
+DROP TABLE test_float;
+DROP TABLE test_squash_numeric;
+DROP TABLE test_squash_bigint;
+DROP TABLE test_squash_cast CASCADE;
+DROP TABLE test_squash_jsonb;
\ No newline at end of file
--
2.39.5
v9-0002-Fix-Normalization-for-squashed-query-texts.patch
From 76a9d3dde944a41d61bf7e448d1eaf7db5d55f39 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Mon, 26 May 2025 22:11:46 -0500
Subject: [PATCH v9 2/2] Fix Normalization for squashed query texts
62d712ec added the ability to squash constants from an
IN list/ArrayExpr for queryId computation purposes. However,
in certain cases, this broke normalization. For example,
"IN (1, 2, int4(1))" is normalized to "IN ($2 /*, ... */))",
which leaves an extra parenthesis at the end of the normalized string.
To correct this, the start and end boundaries of an expr_list are
now tracked by the various nodes used during parsing and are made
available to the ArrayExpr node for query jumbling. Having these
boundaries allows normalization to precisely identify the locations
in the query text that should be squashed.
---
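For reviewers, here is a small standalone sketch of why knowing the list boundaries helps. It is not the patch's code: `DemoLocation` and `demo_normalize` are hypothetical names made up for illustration, and the real logic lives in generate_normalized_query(). The assumption is that each recorded span carries a byte offset and a byte length, and that a squashed span covers everything between the opening and closing delimiters of the list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a recorded location; not the PostgreSQL struct. */
typedef struct
{
    int location;   /* byte offset of the constant or of the list body */
    int length;     /* byte length of the constant or of the list body */
    int squashed;   /* nonzero if the span is a whole squashable list */
} DemoLocation;

/*
 * Copy 'query' into 'out', replacing every recorded span with a "$n"
 * placeholder. A squashed span additionally gets the squash marker.
 * Spans must be sorted by location and must not overlap.
 */
void
demo_normalize(const char *query, const DemoLocation *locs, int nlocs,
               char *out, size_t outsize)
{
    size_t src = 0;     /* next unread byte of query */
    size_t dst = 0;     /* next free byte of out */

    for (int i = 0; i < nlocs; i++)
    {
        size_t gap = (size_t) locs[i].location - src;

        /* Copy the text that precedes this span. */
        assert(dst + gap + 32 < outsize);
        memcpy(out + dst, query + src, gap);
        dst += gap;

        /* Emit the placeholder, plus the marker for a squashed list. */
        dst += (size_t) snprintf(out + dst, outsize - dst, "$%d%s", i + 1,
                                 locs[i].squashed ? " /*, ... */" : "");

        /* Skip the whole span in the source text. */
        src = (size_t) locs[i].location + (size_t) locs[i].length;
    }

    /* Copy whatever follows the last span, including the terminator. */
    assert(dst + strlen(query + src) < outsize);
    strcpy(out + dst, query + src);
}
```

For "SELECT WHERE 1 IN (1, 2, int4(1))" the opening parenthesis of the list is at offset 18 and the closing one at offset 32, so the squashed span would be recorded as location 18 + 1 = 19 with length (32 - 18) - 1 = 13. Because the span ends right before the list's own closing parenthesis, the one belonging to int4(1) is consumed with the list body and no stray parenthesis is left behind.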
.../pg_stat_statements/expected/select.out | 30 ++++
.../pg_stat_statements/expected/squashing.out | 22 +--
.../pg_stat_statements/pg_stat_statements.c | 83 +++-------
contrib/pg_stat_statements/sql/select.sql | 8 +
contrib/pg_stat_statements/sql/squashing.sql | 2 +-
src/backend/nodes/gen_node_support.pl | 2 +-
src/backend/nodes/queryjumblefuncs.c | 153 ++++++++++--------
src/backend/parser/gram.y | 102 ++++++------
src/backend/parser/parse_expr.c | 4 +
src/include/nodes/parsenodes.h | 10 ++
src/include/nodes/primnodes.h | 4 +
11 files changed, 222 insertions(+), 198 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 038ae110364..a57e11ef5b1 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -267,6 +267,36 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
(4 rows)
+-- with the last element being an explicit function call with an argument, ensure
+-- the normalization of the squashing interval is correct.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT WHERE 1 IN (1, int4(1), int4(2));
+--
+(1 row)
+
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2)]);
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(3 rows)
+
--
-- queries with locking clauses
--
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index b8724e3356c..9cbd308ec43 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -453,7 +453,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_bigint WHERE data IN +| 2
- ($1 /*, ... */::bigint) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -503,10 +503,10 @@ SELECT WHERE 1 = ANY(ARRAY[1::int::bigint::int, 2::int::bigint::int]);
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
------------------------------------------------------------------+-------
- SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 2
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--
@@ -572,7 +572,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_cast WHERE data IN +| 2
- ($1 /*, ... */::int4::casttesttype) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -604,7 +604,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT * FROM test_squash_jsonb WHERE data IN +| 2
- (($1 /*, ... */)::jsonb) |
+ ($1 /*, ... */) |
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
@@ -704,9 +704,9 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
--------------------------------------------------------------------+-------
SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */::oid) |
+ ($1 /*, ... */) |
SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
- SELECT ARRAY[$1 /*, ... */::oid] | 1
+ SELECT ARRAY[$1 /*, ... */] | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(4 rows)
@@ -773,7 +773,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int) | 1
+ select where $1 IN ($2 /*, ... */) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -797,7 +797,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */::int::text) | 2
+ select where $1 IN ($2 /*, ... */) | 2
(2 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 129001c70c8..ecc7f2fb266 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2810,14 +2810,12 @@ generate_normalized_query(JumbleState *jstate, const char *query,
{
char *norm_query;
int query_len = *query_len_p;
- int i,
- norm_query_buflen, /* Space allowed for norm_query */
+ int norm_query_buflen, /* Space allowed for norm_query */
len_to_wrt, /* Length (in bytes) to write */
quer_loc = 0, /* Source query byte location */
n_quer_loc = 0, /* Normalized query byte location */
last_off = 0, /* Offset from start for previous tok */
last_tok_len = 0; /* Length (in bytes) of that tok */
- bool in_squashed = false; /* in a run of squashed consts? */
int num_constants_replaced = 0;
/*
@@ -2832,16 +2830,13 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
* could refine that limit based on the max value of n for the current
* query, but it hardly seems worth any extra effort to do so.
- *
- * Note this also gives enough room for the commented-out ", ..." list
- * syntax used by constant squashing.
*/
norm_query_buflen = query_len + jstate->clocations_count * 10;
/* Allocate result buffer */
norm_query = palloc(norm_query_buflen + 1);
- for (i = 0; i < jstate->clocations_count; i++)
+ for (int i = 0; i < jstate->clocations_count; i++)
{
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
@@ -2856,65 +2851,24 @@ generate_normalized_query(JumbleState *jstate, const char *query,
if (tok_len < 0)
continue; /* ignore any duplicates */
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
/*
- * What to do next depends on whether we're squashing constant lists,
- * and whether we're already in a run of such constants.
+ * And insert a param symbol in place of the constant token; and, if
+ * we have a squashable list, insert a placeholder comment starting
+ * from the list's second value.
*/
- if (!jstate->clocations[i].squashed)
- {
- /*
- * This location corresponds to a constant not to be squashed.
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced + 1 + jstate->highest_extern_param_id,
+ jstate->clocations[i].squashed ? " /*, ... */" : "");
+ num_constants_replaced++;
- Assert(len_to_wrt >= 0);
-
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then a param symbol replacing the constant itself */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* In case previous constants were merged away, stop doing that */
- in_squashed = false;
- }
- else if (!in_squashed)
- {
- /*
- * This location is the start position of a run of constants to be
- * squashed, so we need to print the representation of starting a
- * group of stashed constants.
- *
- * Print what comes before the constant ...
- */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- Assert(i + 1 < jstate->clocations_count);
- Assert(jstate->clocations[i + 1].squashed);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /* ... and then start a run of squashed constants */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d /*, ... */",
- num_constants_replaced++ + 1 + jstate->highest_extern_param_id);
-
- /* The next location will match the block below, to end the run */
- in_squashed = true;
- }
- else
- {
- /*
- * The second location of a run of squashable elements; this
- * indicates its end.
- */
- in_squashed = false;
- }
-
- /* Otherwise the constant is squashed away -- move forward */
+ /* move forward */
quer_loc = off + tok_len;
last_off = off;
last_tok_len = tok_len;
@@ -3005,6 +2959,9 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
+ if (locs[i].squashed)
+ continue; /* squashable list, ignore */
+
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index 189d405512f..11662cde08c 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -87,6 +87,14 @@ SELECT WHERE (1, 2) IN ((1, 2), (2, 3));
SELECT WHERE (3, 4) IN ((5, 6), (8, 7));
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- with the last element being an explicit function call with an argument, ensure
+-- the normalization of the squashing interval is correct.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT WHERE 1 IN (1, int4(1), int4(2));
+SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2)]);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- queries with locking clauses
--
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 85aae152da8..ba0a72b6529 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -298,4 +298,4 @@ DROP TABLE test_float;
DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
-DROP TABLE test_squash_jsonb;
\ No newline at end of file
+DROP TABLE test_squash_jsonb;
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index c8595109b0e..9ecddb14231 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -1329,7 +1329,7 @@ _jumble${n}(JumbleState *jstate, Node *node)
# Node type. Squash constants if requested.
if ($query_jumble_squash)
{
- print $jff "\tJUMBLE_ELEMENTS($f);\n"
+ print $jff "\tJUMBLE_ELEMENTS($f, node);\n"
unless $query_jumble_ignore;
}
else
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index ac3cb3d9caf..fb33e6931ad 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,9 +61,9 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
- int location, bool squashed);
+ int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
-static void _jumbleElements(JumbleState *jstate, List *elements);
+static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
static void _jumbleA_Const(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
@@ -373,15 +373,17 @@ FlushPendingNulls(JumbleState *jstate)
/*
- * Record location of constant within query string of query tree that is
- * currently being walked.
+ * Record the location of some kind of constant within a query string.
+ * These are not only bare constants but also expressions that ultimately
+ * constitute a constant, such as those inside casts and simple function
+ * calls.
*
- * 'squashed' signals that the constant represents the first or the last
- * element in a series of merged constants, and everything but the first/last
- * element contributes nothing to the jumble hash.
+ * If length is -1, it indicates a single such constant element. If
+ * it's a positive integer, it indicates the length of a squashable
+ * list of them.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, bool squashed)
+RecordConstLocation(JumbleState *jstate, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -396,9 +398,14 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
sizeof(LocationLen));
}
jstate->clocations[jstate->clocations_count].location = location;
- /* initialize lengths to -1 to simplify third-party module usage */
- jstate->clocations[jstate->clocations_count].squashed = squashed;
- jstate->clocations[jstate->clocations_count].length = -1;
+
+ /*
+ * Lengths are either positive integers (indicating a squashable
+ * list), or -1.
+ */
+ Assert(len > -1 || len == -1);
+ jstate->clocations[jstate->clocations_count].length = len;
+ jstate->clocations[jstate->clocations_count].squashed = (len > -1);
jstate->clocations_count++;
}
}
@@ -408,12 +415,12 @@ RecordConstLocation(JumbleState *jstate, int location, bool squashed)
* deduce that the expression is a constant:
*
* - Ignore a possible wrapping RelabelType and CoerceViaIO.
- * - If it's a FuncExpr, check that the function is an implicit
+ * - If it's a FuncExpr, check that the function is a builtin
* cast and its arguments are Const.
* - Otherwise test if the expression is a simple Const.
*/
static bool
-IsSquashableConst(Node *element)
+IsSquashableConstant(Node *element)
{
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -421,32 +428,50 @@ IsSquashableConst(Node *element)
if (IsA(element, CoerceViaIO))
element = (Node *) ((CoerceViaIO *) element)->arg;
- if (IsA(element, FuncExpr))
+ switch (nodeTag(element))
{
- FuncExpr *func = (FuncExpr *) element;
- ListCell *temp;
+ case T_FuncExpr:
+ {
+ FuncExpr *func = (FuncExpr *) element;
+ ListCell *temp;
- if (func->funcformat != COERCE_IMPLICIT_CAST &&
- func->funcformat != COERCE_EXPLICIT_CAST)
- return false;
+ if (func->funcformat != COERCE_IMPLICIT_CAST &&
+ func->funcformat != COERCE_EXPLICIT_CAST)
+ return false;
- if (func->funcid > FirstGenbkiObjectId)
- return false;
+ if (func->funcid > FirstGenbkiObjectId)
+ return false;
- foreach(temp, func->args)
- {
- Node *arg = lfirst(temp);
+ /*
+ * We can check function arguments recursively, being careful
+ * about recursing too deep. At each recursion level it's
+ * enough to test the stack on the first element. (Note that
+ * I wasn't able to hit this without bloating the stack
+ * artificially in this function: the parser errors out before
+ * stack size becomes a problem here.)
+ */
+ foreach(temp, func->args)
+ {
+ Node *arg = lfirst(temp);
- if (!IsA(arg, Const)) /* XXX we could recurse here instead */
+ if (!IsA(arg, Const))
+ {
+ if (foreach_current_index(temp) == 0 &&
+ stack_is_too_deep())
+ return false;
+ else if (!IsSquashableConstant(arg))
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ default:
+ if (!IsA(element, Const))
return false;
- }
-
- return true;
}
- if (!IsA(element, Const))
- return false;
-
return true;
}
@@ -461,35 +486,29 @@ IsSquashableConst(Node *element)
* expressions.
*/
static bool
-IsSquashableConstList(List *elements, Node **firstExpr, Node **lastExpr)
+IsSquashableConstantList(List *elements)
{
ListCell *temp;
- /*
- * If squashing is disabled, or the list is too short, we don't try to
- * squash it.
- */
+ /* If the list is too short, we don't try to squash it. */
if (list_length(elements) < 2)
return false;
foreach(temp, elements)
{
- if (!IsSquashableConst(lfirst(temp)))
+ if (!IsSquashableConstant(lfirst(temp)))
return false;
}
- *firstExpr = linitial(elements);
- *lastExpr = llast(elements);
-
return true;
}
#define JUMBLE_NODE(item) \
_jumbleNode(jstate, (Node *) expr->item)
-#define JUMBLE_ELEMENTS(list) \
- _jumbleElements(jstate, (List *) expr->list)
+#define JUMBLE_ELEMENTS(list, node) \
+ _jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, false)
+ RecordConstLocation(jstate, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -517,36 +536,36 @@ do { \
#include "queryjumblefuncs.funcs.c"
/*
- * We jumble lists of constant elements as one individual item regardless
- * of how many elements are in the list. This means different queries
- * jumble to the same query_id, if the only difference is the number of
- * elements in the list.
+ * We try to jumble lists of expressions as one individual item regardless
+ * of how many elements are in the list. This is known as squashing, which
+ * results in different queries jumbling to the same query_id, if the only
+ * difference is the number of elements in the list.
+ *
+ * We allow constants to be squashed. To normalize such queries, we use
+ * the start and end locations of the list of elements in the query text.
*/
static void
-_jumbleElements(JumbleState *jstate, List *elements)
+_jumbleElements(JumbleState *jstate, List *elements, Node *node)
{
- Node *first,
- *last;
+ bool normalize_list = false;
- if (IsSquashableConstList(elements, &first, &last))
+ if (IsSquashableConstantList(elements))
{
- /*
- * If this list of elements is squashable, keep track of the location
- * of its first and last elements. When reading back the locations
- * array, we'll see two consecutive locations with ->squashed set to
- * true, indicating the location of initial and final elements of this
- * list.
- *
- * For the limited set of cases we support now (implicit coerce via
- * FuncExpr, Const) it's fine to use exprLocation of the 'last'
- * expression, but if more complex composite expressions are to be
- * supported (e.g., OpExpr or FuncExpr as an explicit call), more
- * sophisticated tracking will be needed.
- */
- RecordConstLocation(jstate, exprLocation(first), true);
- RecordConstLocation(jstate, exprLocation(last), true);
+ if (IsA(node, ArrayExpr))
+ {
+ ArrayExpr *aexpr = (ArrayExpr *) node;
+
+ if (aexpr->list_start > 0 && aexpr->list_end > 0)
+ {
+ RecordConstLocation(jstate,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ normalize_list = true;
+ }
+ }
}
- else
+
+ if (!normalize_list)
{
_jumbleNode(jstate, (Node *) elements);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..41f03e77149 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -184,7 +184,7 @@ static void doNegateFloat(Float *v);
static Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeOrExpr(Node *lexpr, Node *rexpr, int location);
static Node *makeNotExpr(Node *expr, int location);
-static Node *makeAArrayExpr(List *elements, int location);
+static Node *makeAArrayExpr(List *elements, int location, int end_location);
static Node *makeSQLValueFunction(SQLValueFunctionOp op, int32 typmod,
int location);
static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
@@ -523,7 +523,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> def_elem reloption_elem old_aggr_elem operator_def_elem
%type <node> def_arg columnElem where_clause where_or_current_clause
a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
- columnref in_expr having_clause func_table xmltable array_expr
+ columnref having_clause func_table xmltable array_expr
OptWhereClause operator_def_arg
%type <list> opt_column_and_period_list
%type <list> rowsfrom_item rowsfrom_list opt_col_def_list
@@ -15287,49 +15287,50 @@ a_expr: c_expr { $$ = $1; }
(Node *) list_make2($5, $7),
@2);
}
- | a_expr IN_P in_expr
+ | a_expr IN_P select_with_parens
{
- /* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($3, SubLink))
- {
- /* generate foo = ANY (subquery) */
- SubLink *n = (SubLink *) $3;
+ /* generate foo = ANY (subquery) */
+ SubLink *n = makeNode(SubLink);
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
- $$ = (Node *) n;
- }
- else
- {
- /* generate scalar IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
- }
+ n->subselect = $3;
+ n->subLinkType = ANY_SUBLINK;
+ n->subLinkId = 0;
+ n->testexpr = $1;
+ n->operName = NIL; /* show it's IN not = ANY */
+ n->location = @2;
+ $$ = (Node *) n;
}
- | a_expr NOT_LA IN_P in_expr %prec NOT_LA
+ | a_expr IN_P '(' expr_list ')'
{
- /* in_expr returns a SubLink or a list of a_exprs */
- if (IsA($4, SubLink))
- {
- /* generate NOT (foo = ANY (subquery)) */
- /* Make an = ANY node */
- SubLink *n = (SubLink *) $4;
+ /* generate scalar IN expression */
+ A_Expr *n = makeSimpleA_Expr(AEXPR_IN, "=", $1, (Node *) $4, @2);
- n->subLinkType = ANY_SUBLINK;
- n->subLinkId = 0;
- n->testexpr = $1;
- n->operName = NIL; /* show it's IN not = ANY */
- n->location = @2;
- /* Stick a NOT on top; must have same parse location */
- $$ = makeNotExpr((Node *) n, @2);
- }
- else
- {
- /* generate scalar NOT IN expression */
- $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
- }
+ n->rexpr_list_start = @3;
+ n->rexpr_list_end = @5;
+ $$ = (Node *) n;
+ }
+ | a_expr NOT_LA IN_P select_with_parens %prec NOT_LA
+ {
+ /* generate NOT (foo = ANY (subquery)) */
+ SubLink *n = makeNode(SubLink);
+
+ n->subselect = $4;
+ n->subLinkType = ANY_SUBLINK;
+ n->subLinkId = 0;
+ n->testexpr = $1;
+ n->operName = NIL; /* show it's IN not = ANY */
+ n->location = @2;
+ /* Stick a NOT on top; must have same parse location */
+ $$ = makeNotExpr((Node *) n, @2);
+ }
+ | a_expr NOT_LA IN_P '(' expr_list ')'
+ {
+ /* generate scalar NOT IN expression */
+ A_Expr *n = makeSimpleA_Expr(AEXPR_IN, "<>", $1, (Node *) $5, @2);
+
+ n->rexpr_list_start = @4;
+ n->rexpr_list_end = @6;
+ $$ = (Node *) n;
}
| a_expr subquery_Op sub_type select_with_parens %prec Op
{
@@ -16764,15 +16765,15 @@ type_list: Typename { $$ = list_make1($1); }
array_expr: '[' expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' array_expr_list ']'
{
- $$ = makeAArrayExpr($2, @1);
+ $$ = makeAArrayExpr($2, @1, @3);
}
| '[' ']'
{
- $$ = makeAArrayExpr(NIL, @1);
+ $$ = makeAArrayExpr(NIL, @1, @2);
}
;
@@ -16894,17 +16895,6 @@ trim_list: a_expr FROM expr_list { $$ = lappend($3, $1); }
| expr_list { $$ = $1; }
;
-in_expr: select_with_parens
- {
- SubLink *n = makeNode(SubLink);
-
- n->subselect = $1;
- /* other fields will be filled later */
- $$ = (Node *) n;
- }
- | '(' expr_list ')' { $$ = (Node *) $2; }
- ;
-
/*
* Define SQL-style CASE clause.
* - Full specification
@@ -19300,12 +19290,14 @@ makeNotExpr(Node *expr, int location)
}
static Node *
-makeAArrayExpr(List *elements, int location)
+makeAArrayExpr(List *elements, int location, int location_end)
{
A_ArrayExpr *n = makeNode(A_ArrayExpr);
n->elements = elements;
n->location = location;
+ n->list_start = location;
+ n->list_end = location_end;
return (Node *) n;
}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1f8e2d54673..d66276801c6 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1223,6 +1223,8 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
newa->element_typeid = scalar_type;
newa->elements = aexprs;
newa->multidims = false;
+ newa->list_start = a->rexpr_list_start;
+ newa->list_end = a->rexpr_list_end;
newa->location = -1;
result = (Node *) make_scalar_array_op(pstate,
@@ -2165,6 +2167,8 @@ transformArrayExpr(ParseState *pstate, A_ArrayExpr *a,
/* array_collid will be set by parse_collate.c */
newa->element_typeid = element_type;
newa->elements = newcoercedelems;
+ newa->list_start = a->list_start;
+ newa->list_end = a->list_end;
newa->location = a->location;
return (Node *) newa;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dd00ab420b8..71a9768fe2f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -351,6 +351,14 @@ typedef struct A_Expr
List *name; /* possibly-qualified name of operator */
Node *lexpr; /* left argument, or NULL if none */
Node *rexpr; /* right argument, or NULL if none */
+
+ /*
+ * If rexpr is a list of some kind, we separately track its starting and
+ * ending location; it's not the same as the starting and ending location
+ * of the token itself.
+ */
+ ParseLoc rexpr_list_start;
+ ParseLoc rexpr_list_end;
ParseLoc location; /* token location, or -1 if unknown */
} A_Expr;
@@ -506,6 +514,8 @@ typedef struct A_ArrayExpr
{
NodeTag type;
List *elements; /* array element expressions */
+ ParseLoc list_start; /* start of the element list */
+ ParseLoc list_end; /* end of the elements list */
ParseLoc location; /* token location, or -1 if unknown */
} A_ArrayExpr;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7d3b4198f26..01510b01b64 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1397,6 +1397,10 @@ typedef struct ArrayExpr
List *elements pg_node_attr(query_jumble_squash);
/* true if elements are sub-arrays */
bool multidims pg_node_attr(query_jumble_ignore);
+ /* location of the start of the elements list */
+ ParseLoc list_start;
+ /* location of the end of the elements list */
+ ParseLoc list_end;
/* token location, or -1 if unknown */
ParseLoc location;
} ArrayExpr;
--
2.39.5
I've spent a bunch of time looking at this series and here's my take on
the second one.
Thanks!
I realized that the whole in_expr production in gram.y is pointless, and
the whole private struct that was added was unnecessary. A much simpler
solution is to remove in_expr, expand its use in a_expr to the two
possibilities, and with that we can remove the need for a new struct.
Nice simplification.
I also added a recursive call in IsSquashableExpression to itself. The
I agree with this. I was thinking about a follow-up patch for this based on
the discussion above, but why not just add it now.
Barring objections, I'll push this soon, then look at rebasing 0003 on
top, which I expect to be an easy job.
LGTM.
--
Sami
On Mon, Jun 09, 2025 at 12:44:59PM +0200, Alvaro Herrera wrote:
I also added a recursive call in IsSquashableExpression to itself. The
check for stack depth can be done without throwing an error. I tested
this by adding stack bloat in that function. I also renamed it to
IsSquashableConstant. This changes one of the tests, because a cast
sequence like 42::int::bigint::int is considered squashable.
Other than that, the changes are cosmetic.
Barring objections, I'll push this soon, then look at rebasing 0003 on
top, which I expect to be an easy job.
v9-0002 is failing in the CI for the freebsd task:
https://github.com/michaelpq/postgres/runs/43784034162
Here is the link to the diffs, also attached to this message:
https://api.cirrus-ci.com/v1/artifact/task/5378459897167872/testrun/build/testrun/pg_stat_statements/regress/regression.diffs
I am also able to reproduce these failures locally, FWIW. For
example, an IN clause made of integer constants gets converted to
an ArrayExpr, but in _jumbleElements() we fail to call
RecordConstLocation() and the list is not squashed.
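To make the failure mode concrete, here is a minimal standalone sketch (not PostgreSQL code) of how the recorded span is derived from the list delimiters, following the arithmetic in the patch: the location is list_start + 1 and the length is (list_end - list_start) - 1. The helper name squash_list and the hard-coded "$1" placeholder are assumptions for illustration only. When either location is missing, nothing is recorded and the list stays unsquashed, which matches the regression diffs attached here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: replace the text between
 * the list delimiters with one squashed placeholder.
 *
 * list_start and list_end are byte offsets of the opening and closing
 * delimiter, mirroring ArrayExpr.list_start / list_end in the patch.
 * Returns 0 on success, -1 if the locations are unusable or the output
 * buffer is too small.
 */
static int
squash_list(const char *query, int list_start, int list_end,
			char *out, size_t outlen)
{
	size_t		qlen;
	int			loc;
	int			len;
	int			n;

	assert(query != NULL && out != NULL);
	qlen = strlen(query);

	/* Same guard as the patch: both locations must be known. */
	if (list_start <= 0 || list_end <= 0)
		return -1;
	if (list_end <= list_start || (size_t) list_end >= qlen)
		return -1;

	loc = list_start + 1;				/* first byte after the opener */
	len = (list_end - list_start) - 1;	/* bytes between the delimiters */

	/* Prefix up to loc, the placeholder, then everything after the list. */
	n = snprintf(out, outlen, "%.*s$1 /*, ... */%s",
				 loc, query, query + loc + len);
	return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Calling squash_list() with the offsets of '(' and ')' rewrites only the list interior; calling it with zeroed locations, as after a copy or read/write round trip that drops the fields, returns -1 and leaves the query text untouched.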
I think that this can be reproduced by
-DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST that I always include in my builds.
The freebsd task uses the same with debug_copy_parse_plan_trees=on,
debug_write_read_parse_plan_trees=on and
debug_raw_expression_coverage_test=on.
--
Michael
Attachments:
regression.diffs (text/plain; charset=us-ascii)
diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/select.out /tmp/cirrus-ci-build/build/testrun/pg_stat_statements/regress/results/select.out
--- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/select.out 2025-06-10 06:13:14.867518000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/pg_stat_statements/regress/results/select.out 2025-06-10 06:16:54.005179000 +0000
@@ -292,10 +292,11 @@
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
------------------------------------------------------------------------+-------
- SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT WHERE $1 = ANY (ARRAY[$2 /*, ... */]) | 1
+ SELECT WHERE $1 IN ($2, int4($3), int4($4)) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
-(3 rows)
+(4 rows)
--
-- queries with locking clauses
diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/dml.out /tmp/cirrus-ci-build/build/testrun/pg_stat_statements/regress/results/dml.out
--- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/dml.out 2025-06-10 06:13:14.866052000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/pg_stat_statements/regress/results/dml.out 2025-06-10 06:16:54.106500000 +0000
@@ -80,7 +80,7 @@
1 | 10 | INSERT INTO pgss_dml_tab VALUES(generate_series($1, $2), $3)
1 | 12 | SELECT * FROM pgss_dml_tab ORDER BY a
2 | 4 | SELECT * FROM pgss_dml_tab WHERE a > $1 ORDER BY a
- 1 | 8 | SELECT * FROM pgss_dml_tab WHERE a IN ($1 /*, ... */)
+ 1 | 8 | SELECT * FROM pgss_dml_tab WHERE a IN ($1, $2, $3, $4, $5)
1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
1 | 0 | SET pg_stat_statements.track_utility = $1
6 | 6 | UPDATE pgss_dml_tab SET b = $1 WHERE a = $2
diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/squashing.out /tmp/cirrus-ci-build/build/testrun/pg_stat_statements/regress/results/squashing.out
--- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/squashing.out 2025-06-10 06:13:14.867938000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/pg_stat_statements/regress/results/squashing.out 2025-06-10 06:17:00.844056000 +0000
@@ -73,12 +73,14 @@
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
--------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 3
- SELECT ARRAY[$1 /*, ... */] | 3
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(3 rows)
+ query | calls
+------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3) | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4) | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
+ SELECT ARRAY[$1 /*, ... */] | 3
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(5 rows)
-- built-in functions will be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
@@ -99,9 +101,10 @@
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
----------------------------------------------------+-------
- SELECT WHERE $1 IN ($2 /*, ... */) | 2
+ SELECT WHERE $1 = ANY (ARRAY[$2 /*, ... */]) | 1
+ SELECT WHERE $1 IN ($2, int4($3), int4($4), $5) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+(3 rows)
-- external parameters will not be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -200,11 +203,14 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
----------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) AND data = $2 | 6
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+-----------------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id = ANY (ARRAY[$1 /*, ... */]) AND data = $2 | 3
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) AND data = $10 | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) AND data = $11 | 1
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) AND data = $12 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(5 rows)
-- Multiple squashed intervals
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -250,12 +256,18 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
--------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1 /*, ... */)+| 6
- AND data IN ($2 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id = ANY (ARRAY[$1 /*, ... */]) +| 3
+ AND data = ANY (ARRAY[$2 /*, ... */]) |
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) +| 1
+ AND data IN ($10, $11, $12, $13, $14, $15, $16, $17, $18) |
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) +| 1
+ AND data IN ($11, $12, $13, $14, $15, $16, $17, $18, $19, $20) |
+ SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)+| 1
+ AND data IN ($12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(5 rows)
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -372,9 +384,14 @@
query | calls
--------------------------------------------------------------------+-------
SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 3
- SELECT data FROM test_float WHERE data IN ($1 /*, ... */) | 7
+ SELECT data FROM test_float WHERE data = ANY(ARRAY[$1 /*, ... */]) | 2
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
+ SELECT data FROM test_float WHERE data IN ($1, $2) | 1
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(3 rows)
+(8 rows)
-- Numeric type, implicit cast is squashed
CREATE TABLE test_squash_numeric (id int, data numeric(5, 2));
@@ -395,11 +412,11 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
- SELECT * FROM test_squash_numeric WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+------------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_numeric WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_numeric WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- Bigint, implicit cast is squashed
@@ -421,11 +438,11 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
--------------------------------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
- SELECT * FROM test_squash_bigint WHERE data IN ($1 /*, ... */) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-----------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- Bigint, explicit cast is squashed
@@ -450,12 +467,14 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT * FROM test_squash_bigint WHERE data IN +| 2
- ($1 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+----------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_bigint WHERE data = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT * FROM test_squash_bigint WHERE data IN +| 1
+ ($1::bigint, $2::bigint, $3::bigint, $4::bigint, $5::bigint, $6::bigint,+|
+ $7::bigint, $8::bigint, $9::bigint, $10::bigint, $11::bigint) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Bigint, long tokens with parenthesis, will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -503,11 +522,12 @@
(1 row)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT WHERE $1 IN ($2 /*, ... */) | 2
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+-----------------------------------------------------------------+-------
+ SELECT WHERE $1 = ANY(ARRAY[$2 /*, ... */]) | 1
+ SELECT WHERE $1 IN ($2::int::bigint::int, $3::int::bigint::int) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
--
-- CoerceViaIO
@@ -569,12 +589,17 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT * FROM test_squash_cast WHERE data IN +| 2
- ($1 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+----------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash_cast WHERE data = ANY (ARRAY +| 1
+ [$1 /*, ... */]) |
+ SELECT * FROM test_squash_cast WHERE data IN +| 1
+ ($1::int4::casttesttype, $2::int4::casttesttype, $3::int4::casttesttype,+|
+ $4::int4::casttesttype, $5::int4::casttesttype, $6::int4::casttesttype,+|
+ $7::int4::casttesttype, $8::int4::casttesttype, $9::int4::casttesttype,+|
+ $10::int4::casttesttype, $11::int4::casttesttype) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- Some casting expression are simplified to Const
CREATE TABLE test_squash_jsonb (id int, data jsonb);
@@ -601,12 +626,16 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-----------------------------------------------------+-------
- SELECT * FROM test_squash_jsonb WHERE data IN +| 2
- ($1 /*, ... */) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(2 rows)
+ query | calls
+--------------------------------------------------------------+-------
+ SELECT * FROM test_squash_jsonb WHERE data = ANY (ARRAY +| 1
+ [$1 /*, ... */]) |
+ SELECT * FROM test_squash_jsonb WHERE data IN +| 1
+ (($1)::jsonb, ($2)::jsonb, ($3)::jsonb, ($4)::jsonb,+|
+ ($5)::jsonb, ($6)::jsonb, ($7)::jsonb, ($8)::jsonb,+|
+ ($9)::jsonb, ($10)::jsonb) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
-- CoerceViaIO, SubLink instead of a Const. Will not squash
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
@@ -701,13 +730,13 @@
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN +| 1
- ($1 /*, ... */) |
- SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
- SELECT ARRAY[$1 /*, ... */] | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN +| 1
+ ($1::oid, $2::oid, $3::oid, $4::oid, $5::oid, $6::oid, $7::oid, $8::oid, $9::oid) |
+ SELECT * FROM test_squash WHERE id IN ($1::oid, $2::oid::int::oid) | 2
+ SELECT ARRAY[$1 /*, ... */] | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(4 rows)
--
@@ -773,7 +802,7 @@
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */) | 1
+ select where $1 IN ($2::int, $3::int) | 1
select where $1 IN ($2::int, $3::int::text) | 1
(3 rows)
@@ -797,8 +826,9 @@
query | calls
----------------------------------------------------+-------
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
- select where $1 IN ($2 /*, ... */) | 2
-(2 rows)
+ select where $1 = ANY (array[$2 /*, ... */]) | 1
+ select where $1 IN ($2::int::text, $3::int::text) | 1
+(3 rows)
--
-- cleanup
On 2025-Jun-10, Michael Paquier wrote:
I think that this can be reproduced by
-DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST that I always include in my builds.
The freebsd task uses the same with debug_copy_parse_plan_trees=on,
debug_write_read_parse_plan_trees=on and
debug_raw_expression_coverage_test=on.
Ah, of course, I forgot to add read/write support for A_Expr. Pushed a
new one, running at
https://cirrus-ci.com/build/6249249819590656
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"We’ve narrowed the problem down to the customer’s pants being in a situation
of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)
On Tue, Jun 10, 2025 at 07:25:27PM +0200, Alvaro Herrera wrote:
On 2025-Jun-10, Michael Paquier wrote:
I think that this can be reproduced by
-DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST that I always include in my builds.
The freebsd task uses the same with debug_copy_parse_plan_trees=on,
debug_write_read_parse_plan_trees=on and
debug_raw_expression_coverage_test=on.
Ah, of course, I forgot to add read/write support for A_Expr. Pushed a
new one, running at
https://cirrus-ci.com/build/6249249819590656
Ah, right. I completely forgot that we have a custom read/write
function for this node. Yes, things should be OK once the function is
updated. Thanks.
--
Michael
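As a rough illustration of the problem discussed above, the following toy C sketch (not the real node support code) shows why a round trip that predates the new fields defeats squashing. ToyAExpr, roundtrip_stale, roundtrip_updated and can_squash are hypothetical names; the real A_Expr has more fields and its own custom read/write functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for A_Expr, reduced to the fields relevant here. */
typedef struct ToyAExpr
{
	int			rexpr_list_start;	/* offset of the list opener */
	int			rexpr_list_end;		/* offset of the list closer */
	int			location;			/* token location */
} ToyAExpr;

/* Round trip written before the new fields existed: only location survives. */
static ToyAExpr
roundtrip_stale(const ToyAExpr *src)
{
	ToyAExpr	dst = {0};

	assert(src != NULL);
	dst.location = src->location;
	return dst;
}

/* Round trip updated to carry the list boundaries as well. */
static ToyAExpr
roundtrip_updated(const ToyAExpr *src)
{
	ToyAExpr	dst = {0};

	assert(src != NULL);
	dst.rexpr_list_start = src->rexpr_list_start;
	dst.rexpr_list_end = src->rexpr_list_end;
	dst.location = src->location;
	return dst;
}

/* Mirrors the guard used before recording a squashable list. */
static bool
can_squash(const ToyAExpr *a)
{
	return a->rexpr_list_start > 0 && a->rexpr_list_end > 0;
}
```

With the stale round trip the boundaries come back as zero, so can_squash() is false and the list is jumbled element by element; with the updated one the boundaries survive and the list can be squashed.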
Hello,
I have pushed that now, and here's a rebase of patch 0003 to add support
for PARAM_EXTERN. I'm not really sure about this one yet ...
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"
Attachments:
v10-0003-Support-Squashing-of-External-Parameters.patch (text/x-diff; charset=utf-8)
From 0a836189a2e3af3beeb7e3c55d7d0e4ce99b4e8e Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Fri, 30 May 2025 11:42:56 -0500
Subject: [PATCH v10] Support Squashing of External Parameters
62d712ec introduced the concept of element squashing for
query normalization purposes. However, it did not account for
external parameters passed to a list of elements. This adds
support for these types of values and simplifies the squashing
logic further.
---
.../pg_stat_statements/expected/extended.out | 36 ++++++---
.../pg_stat_statements/expected/squashing.out | 26 +++---
.../pg_stat_statements/pg_stat_statements.c | 4 +
contrib/pg_stat_statements/sql/extended.sql | 5 +-
contrib/pg_stat_statements/sql/squashing.sql | 4 +-
src/backend/nodes/queryjumblefuncs.c | 80 ++++++++++++-------
src/include/nodes/primnodes.h | 6 +-
src/include/nodes/queryjumble.h | 16 +++-
8 files changed, 118 insertions(+), 59 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/extended.out b/contrib/pg_stat_statements/expected/extended.out
index 7da308ba84f..6f2c231bf2a 100644
--- a/contrib/pg_stat_statements/expected/extended.out
+++ b/contrib/pg_stat_statements/expected/extended.out
@@ -69,13 +69,13 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
(4 rows)
-- Various parameter numbering patterns
+-- Unique query IDs with parameter numbers switched.
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
--- Unique query IDs with parameter numbers switched.
SELECT WHERE ($1::int, 7) IN ((8, $2::int), ($3::int, 9)) \bind '1' '2' '3' \g
--
(0 rows)
@@ -96,7 +96,24 @@ SELECT WHERE $3::int IN ($1::int, $2::int) \bind '1' '2' '3' \g
--
(0 rows)
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------------------------------------------------+-------
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE ($1::int, $4) IN (($5, $2::int), ($3::int, $6)) | 1
+ SELECT WHERE ($2::int, $4) IN (($5, $3::int), ($1::int, $6)) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(6 rows)
+
-- Two groups of two queries with the same query ID.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT WHERE '1'::int IN ($1::int, '2'::int) \bind '1' \g
--
(1 row)
@@ -114,15 +131,10 @@ SELECT WHERE $2::int IN ($1::int, '2'::int) \bind '3' '4' \g
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------+-------
- SELECT WHERE $1::int IN ($2::int, $3::int) | 1
- SELECT WHERE $2::int IN ($1::int, $3::int) | 2
- SELECT WHERE $2::int IN ($1::int, $3::int) | 2
- SELECT WHERE $2::int IN ($3::int, $1::int) | 1
- SELECT WHERE $3::int IN ($1::int, $2::int) | 1
- SELECT WHERE ($1::int, $4) IN (($5, $2::int), ($3::int, $6)) | 1
- SELECT WHERE ($2::int, $4) IN (($5, $3::int), ($1::int, $6)) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(8 rows)
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 2
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b935d464ec..e5dd9337165 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -103,7 +103,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -123,14 +123,14 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
----------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
- SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -155,12 +155,12 @@ EXECUTE p1(1, 2, 3, 4, 5);
DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- DEALLOCATE $1 | 2
- PREPARE p1(int, int, int, int, int) AS +| 2
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ PREPARE p1(int, int, int, int, int) AS +| 2
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- More conditions in the query
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index ecc7f2fb266..97b62b635bc 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2841,6 +2841,10 @@ generate_normalized_query(JumbleState *jstate, const char *query,
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
+ if (jstate->clocations[i].extern_param &&
+ !jstate->has_squashed_lists)
+ continue;
+
off = jstate->clocations[i].location;
/* Adjust recorded location if we're dealing with partial string */
diff --git a/contrib/pg_stat_statements/sql/extended.sql b/contrib/pg_stat_statements/sql/extended.sql
index a366658a53a..ffb5b162819 100644
--- a/contrib/pg_stat_statements/sql/extended.sql
+++ b/contrib/pg_stat_statements/sql/extended.sql
@@ -21,17 +21,18 @@ SELECT $1 \bind 'unnamed_val1' \g
SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Various parameter numbering patterns
-SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-- Unique query IDs with parameter numbers switched.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE ($1::int, 7) IN ((8, $2::int), ($3::int, 9)) \bind '1' '2' '3' \g
SELECT WHERE ($2::int, 10) IN ((11, $3::int), ($1::int, 12)) \bind '1' '2' '3' \g
SELECT WHERE $1::int IN ($2::int, $3::int) \bind '1' '2' '3' \g
SELECT WHERE $2::int IN ($3::int, $1::int) \bind '1' '2' '3' \g
SELECT WHERE $3::int IN ($1::int, $2::int) \bind '1' '2' '3' \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Two groups of two queries with the same query ID.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE '1'::int IN ($1::int, '2'::int) \bind '1' \g
SELECT WHERE '4'::int IN ($1::int, '5'::int) \bind '2' \g
SELECT WHERE $2::int IN ($1::int, '1'::int) \bind '1' '2' \g
SELECT WHERE $2::int IN ($1::int, '2'::int) \bind '3' '4' \g
-
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index bd3243ec9cd..1e36708778a 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -32,7 +32,7 @@ SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
;
@@ -40,7 +40,7 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
PREPARE p1(int, int, int, int, int) AS
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index fb33e6931ad..0f81a08704d 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,6 +61,7 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
+ bool extern_param,
int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
@@ -70,6 +71,7 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static void _jumbleParam(JumbleState *jstate, Node *node);
/*
* Given a possibly multi-statement source string, confine our attention to the
@@ -185,6 +187,7 @@ InitJumble(void)
jstate->clocations_count = 0;
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
+ jstate->has_squashed_lists = false;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -207,6 +210,10 @@ DoJumble(JumbleState *jstate, Node *node)
if (jstate->pending_nulls > 0)
FlushPendingNulls(jstate);
+ /* Squashed list found, reset highest_extern_param_id */
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
+
/* Process the jumble buffer and produce the hash value */
return DatumGetInt64(hash_any_extended(jstate->jumble,
jstate->jumble_len,
@@ -376,14 +383,14 @@ FlushPendingNulls(JumbleState *jstate)
* Record the location of some kind of constant within a query string.
* These are not only bare constants but also expressions that ultimately
* constitute a constant, such as those inside casts and simple function
- * calls.
+ * calls; if extern_param, then it corresponds to a PARAM_EXTERN Param.
*
* If length is -1, it indicates a single such constant element. If
* it's a positive integer, it indicates the length of a squashable
* list of them.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, int len)
+RecordConstLocation(JumbleState *jstate, bool extern_param, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -406,6 +413,7 @@ RecordConstLocation(JumbleState *jstate, int location, int len)
Assert(len > -1 || len == -1);
jstate->clocations[jstate->clocations_count].length = len;
jstate->clocations[jstate->clocations_count].squashed = (len > -1);
+ jstate->clocations[jstate->clocations_count].extern_param = extern_param;
jstate->clocations_count++;
}
}
@@ -422,6 +430,7 @@ RecordConstLocation(JumbleState *jstate, int location, int len)
static bool
IsSquashableConstant(Node *element)
{
+ /* Unwrap RelabelType and CoerceViaIO layers */
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -430,6 +439,12 @@ IsSquashableConstant(Node *element)
switch (nodeTag(element))
{
+ case T_Param:
+ return castNode(Param, element)->paramkind == PARAM_EXTERN;
+
+ case T_Const:
+ return true;
+
case T_FuncExpr:
{
FuncExpr *func = (FuncExpr *) element;
@@ -468,11 +483,8 @@ IsSquashableConstant(Node *element)
}
default:
- if (!IsA(element, Const))
- return false;
+ return false;
}
-
- return true;
}
/*
@@ -508,7 +520,7 @@ IsSquashableConstantList(List *elements)
#define JUMBLE_ELEMENTS(list, node) \
_jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, -1)
+ RecordConstLocation(jstate, false, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -558,9 +570,11 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
if (aexpr->list_start > 0 && aexpr->list_end > 0)
{
RecordConstLocation(jstate,
+ false,
aexpr->list_start + 1,
(aexpr->list_end - aexpr->list_start) - 1);
normalize_list = true;
+ jstate->has_squashed_lists = true;
}
}
}
@@ -612,26 +626,6 @@ _jumbleNode(JumbleState *jstate, Node *node)
break;
}
- /* Special cases to handle outside the automated code */
- switch (nodeTag(expr))
- {
- case T_Param:
- {
- Param *p = (Param *) node;
-
- /*
- * Update the highest Param id seen, in order to start
- * normalization correctly.
- */
- if (p->paramkind == PARAM_EXTERN &&
- p->paramid > jstate->highest_extern_param_id)
- jstate->highest_extern_param_id = p->paramid;
- }
- break;
- default:
- break;
- }
-
/* Ensure we added something to the jumble buffer */
Assert(jstate->total_jumble_len > prev_jumble_len);
}
@@ -734,3 +728,35 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
*/
JUMBLE_STRING(aliasname);
}
+
+/*
+ * Custom query jumble function for _jumbleParam.
+ */
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
+ Param *expr = (Param *) node;
+
+ JUMBLE_FIELD(paramkind);
+ JUMBLE_FIELD(paramid);
+ JUMBLE_FIELD(paramtype);
+
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ /*
+ * At this point, only external parameter locations outside of
+ * squashable lists will be recorded.
+ */
+ RecordConstLocation(jstate, true, expr->location, -1);
+
+ /*
+ * Update the highest Param id seen, in order to start normalization
+ * correctly.
+ *
+ * Note: This value is reset at the end of jumbling if there exists a
+ * squashable list. See the comment in the definition of JumbleState.
+ */
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 01510b01b64..6dfca3cb35b 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -389,14 +389,16 @@ typedef enum ParamKind
typedef struct Param
{
+ pg_node_attr(custom_query_jumble)
+
Expr xpr;
ParamKind paramkind; /* kind of parameter. See above */
int paramid; /* numeric ID for parameter */
Oid paramtype; /* pg_type OID of parameter's datatype */
/* typmod value, if known */
- int32 paramtypmod pg_node_attr(query_jumble_ignore);
+ int32 paramtypmod;
/* OID of collation, or InvalidOid if none */
- Oid paramcollid pg_node_attr(query_jumble_ignore);
+ Oid paramcollid;
/* token location, or -1 if unknown */
ParseLoc location;
} Param;
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e..bab971162dc 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -29,6 +29,13 @@ typedef struct LocationLen
* of squashed constants.
*/
bool squashed;
+
+ /*
+ * Indicates whether a location is that of an external parameter, so it
+ * can be decided during normalization whether the parameter number should
+ * be replaced or kept as is.
+ */
+ bool extern_param;
} LocationLen;
/*
@@ -52,8 +59,15 @@ typedef struct JumbleState
/* Current number of valid entries in clocations array */
int clocations_count;
- /* highest Param id we've seen, in order to start normalization correctly */
+ /*
+ * Highest Param id we've seen, in order to start normalization correctly.
+ * However, if the jumble contains at least one squashed list, we
+ * disregard the highest_extern_param_id value because parameters can
+ * exist within the squashed list and are no longer considered for
+ * normalization.
+ */
int highest_extern_param_id;
+ bool has_squashed_lists;
/*
* Count of the number of NULL nodes seen since last appending a value.
--
2.39.5
On Thu, Jun 12, 2025 at 11:32 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
Hello,
I have pushed that now,
thanks!
and here's a rebase of patch 0003 to add support
for PARAM_EXTERN. I'm not really sure about this one yet ...
See v11. I added a missing test to show how external parameter
normalization behaves with and without a squashed list.
I also improved some of the code comments for clarity.
--
Sami
Attachments:
v11-0001-Support-Squashing-of-External-Parameters.patch
From fdfd763cb8121d7deee1fdd2b55b099c2619e120 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Fri, 30 May 2025 11:42:56 -0500
Subject: [PATCH v11 1/1] Support Squashing of External Parameters
62d712ec introduced the concept of element squashing for
query normalization purposes. However, it did not account for
external parameters passed to a list of elements. This adds
support to these types of values and simplifies the squashing
logic further.
---
.../pg_stat_statements/expected/extended.out | 60 +++++++++++---
.../pg_stat_statements/expected/squashing.out | 26 +++---
.../pg_stat_statements/pg_stat_statements.c | 8 ++
contrib/pg_stat_statements/sql/extended.sql | 11 ++-
contrib/pg_stat_statements/sql/squashing.sql | 4 +-
src/backend/nodes/queryjumblefuncs.c | 81 ++++++++++++-------
src/include/nodes/primnodes.h | 6 +-
src/include/nodes/queryjumble.h | 15 +++-
8 files changed, 152 insertions(+), 59 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/extended.out b/contrib/pg_stat_statements/expected/extended.out
index 7da308ba84f4..eec68195b87a 100644
--- a/contrib/pg_stat_statements/expected/extended.out
+++ b/contrib/pg_stat_statements/expected/extended.out
@@ -69,13 +69,13 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
(4 rows)
-- Various parameter numbering patterns
+-- Unique query IDs with parameter numbers switched.
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
t
(1 row)
--- Unique query IDs with parameter numbers switched.
SELECT WHERE ($1::int, 7) IN ((8, $2::int), ($3::int, 9)) \bind '1' '2' '3' \g
--
(0 rows)
@@ -96,7 +96,24 @@ SELECT WHERE $3::int IN ($1::int, $2::int) \bind '1' '2' '3' \g
--
(0 rows)
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------------------------------------------------+-------
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 1
+ SELECT WHERE ($1::int, $4) IN (($5, $2::int), ($3::int, $6)) | 1
+ SELECT WHERE ($2::int, $4) IN (($5, $3::int), ($1::int, $6)) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(6 rows)
+
-- Two groups of two queries with the same query ID.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
SELECT WHERE '1'::int IN ($1::int, '2'::int) \bind '1' \g
--
(1 row)
@@ -114,15 +131,34 @@ SELECT WHERE $2::int IN ($1::int, '2'::int) \bind '3' '4' \g
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
---------------------------------------------------------------+-------
- SELECT WHERE $1::int IN ($2::int, $3::int) | 1
- SELECT WHERE $2::int IN ($1::int, $3::int) | 2
- SELECT WHERE $2::int IN ($1::int, $3::int) | 2
- SELECT WHERE $2::int IN ($3::int, $1::int) | 1
- SELECT WHERE $3::int IN ($1::int, $2::int) | 1
- SELECT WHERE ($1::int, $4) IN (($5, $2::int), ($3::int, $6)) | 1
- SELECT WHERE ($2::int, $4) IN (($5, $3::int), ($1::int, $6)) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
-(8 rows)
+ query | calls
+----------------------------------------------------+-------
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 2
+ SELECT WHERE $1::int IN ($2 /*, ... */) | 2
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- not squashable list, the parameters id's will not be kept as-is
+SELECT WHERE $3 = $1 AND $2 = $4 \bind 1 2 1 2 \g
+--
+(1 row)
+
+-- squashable list, so the parameter IDs will be re-assigned
+SELECT WHERE 1 IN (1, 2, 3) AND $3 = $1 AND $2 = $4 \bind 1 2 1 2 \g
+--
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+------------------------------------------------------------+-------
+ SELECT WHERE $1 IN ($2 /*, ... */) AND $3 = $4 AND $5 = $6 | 1
+ SELECT WHERE $3 = $1 AND $2 = $4 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index 7b935d464ecf..e5dd9337165b 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -103,7 +103,7 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(2 rows)
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
@@ -123,14 +123,14 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
(0 rows)
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
----------------------------------------------------------------------------+-------
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) | 1
- SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) | 1
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+----------------------------------------------------------------------+-------
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) | 1
+ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1 /*, ... */]) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
@@ -155,12 +155,12 @@ EXECUTE p1(1, 2, 3, 4, 5);
DEALLOCATE p1;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls
-------------------------------------------------------------+-------
- DEALLOCATE $1 | 2
- PREPARE p1(int, int, int, int, int) AS +| 2
- SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) |
- SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+ query | calls
+-------------------------------------------------------+-------
+ DEALLOCATE $1 | 2
+ PREPARE p1(int, int, int, int, int) AS +| 2
+ SELECT * FROM test_squash WHERE id IN ($1 /*, ... */) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
(3 rows)
-- More conditions in the query
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index ecc7f2fb2663..9f1dafd157e0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2841,6 +2841,14 @@ generate_normalized_query(JumbleState *jstate, const char *query,
int off, /* Offset from start for cur tok */
tok_len; /* Length (in bytes) of that tok */
+ /*
+ * Skip external parameter ID reassignment if there are no squashed
+ * lists.
+ */
+ if (jstate->clocations[i].extern_param &&
+ !jstate->has_squashed_lists)
+ continue;
+
off = jstate->clocations[i].location;
/* Adjust recorded location if we're dealing with partial string */
diff --git a/contrib/pg_stat_statements/sql/extended.sql b/contrib/pg_stat_statements/sql/extended.sql
index a366658a53a7..cf4cbf7238cb 100644
--- a/contrib/pg_stat_statements/sql/extended.sql
+++ b/contrib/pg_stat_statements/sql/extended.sql
@@ -21,17 +21,24 @@ SELECT $1 \bind 'unnamed_val1' \g
SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Various parameter numbering patterns
-SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-- Unique query IDs with parameter numbers switched.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE ($1::int, 7) IN ((8, $2::int), ($3::int, 9)) \bind '1' '2' '3' \g
SELECT WHERE ($2::int, 10) IN ((11, $3::int), ($1::int, 12)) \bind '1' '2' '3' \g
SELECT WHERE $1::int IN ($2::int, $3::int) \bind '1' '2' '3' \g
SELECT WHERE $2::int IN ($3::int, $1::int) \bind '1' '2' '3' \g
SELECT WHERE $3::int IN ($1::int, $2::int) \bind '1' '2' '3' \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
-- Two groups of two queries with the same query ID.
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT WHERE '1'::int IN ($1::int, '2'::int) \bind '1' \g
SELECT WHERE '4'::int IN ($1::int, '5'::int) \bind '2' \g
SELECT WHERE $2::int IN ($1::int, '1'::int) \bind '1' '2' \g
SELECT WHERE $2::int IN ($1::int, '2'::int) \bind '3' '4' \g
-
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+-- not squashable list, the parameters id's will not be kept as-is
+SELECT WHERE $3 = $1 AND $2 = $4 \bind 1 2 1 2 \g
+-- squashable list, so the parameter IDs will be re-assigned
+SELECT WHERE 1 IN (1, 2, 3) AND $3 = $1 AND $2 = $4 \bind 1 2 1 2 \g
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index bd3243ec9cd8..1e36708778a9 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -32,7 +32,7 @@ SELECT WHERE 1 IN (1, int4(1), int4(2), 2);
SELECT WHERE 1 = ANY (ARRAY[1, int4(1), int4(2), 2]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- external parameters will not be squashed
+-- external parameters will be squashed
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT * FROM test_squash WHERE id IN ($1, $2, $3, $4, $5) \bind 1 2 3 4 5
;
@@ -40,7 +40,7 @@ SELECT * FROM test_squash WHERE id::text = ANY(ARRAY[$1, $2, $3, $4, $5]) \bind
;
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
--- neither are prepared statements
+-- prepared statements will also be squashed
-- the IN and ARRAY forms of this statement will have the same queryId
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
PREPARE p1(int, int, int, int, int) AS
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index fb33e6931ada..965144d0a43b 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -61,6 +61,7 @@ static void AppendJumble(JumbleState *jstate,
const unsigned char *value, Size size);
static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
+ bool extern_param,
int location, int len);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
@@ -70,6 +71,7 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static void _jumbleParam(JumbleState *jstate, Node *node);
/*
* Given a possibly multi-statement source string, confine our attention to the
@@ -185,6 +187,7 @@ InitJumble(void)
jstate->clocations_count = 0;
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
+ jstate->has_squashed_lists = false;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -207,6 +210,10 @@ DoJumble(JumbleState *jstate, Node *node)
if (jstate->pending_nulls > 0)
FlushPendingNulls(jstate);
+ /* Squashed list found, reset highest_extern_param_id */
+ if (jstate->has_squashed_lists)
+ jstate->highest_extern_param_id = 0;
+
/* Process the jumble buffer and produce the hash value */
return DatumGetInt64(hash_any_extended(jstate->jumble,
jstate->jumble_len,
@@ -376,14 +383,15 @@ FlushPendingNulls(JumbleState *jstate)
* Record the location of some kind of constant within a query string.
* These are not only bare constants but also expressions that ultimately
* constitute a constant, such as those inside casts and simple function
- * calls.
+ * calls; if extern_param is true, the location is a param of kind
+ * PARAM_EXTERN.
*
* If length is -1, it indicates a single such constant element. If
* it's a positive integer, it indicates the length of a squashable
* list of them.
*/
static void
-RecordConstLocation(JumbleState *jstate, int location, int len)
+RecordConstLocation(JumbleState *jstate, bool extern_param, int location, int len)
{
/* -1 indicates unknown or undefined location */
if (location >= 0)
@@ -406,6 +414,7 @@ RecordConstLocation(JumbleState *jstate, int location, int len)
Assert(len > -1 || len == -1);
jstate->clocations[jstate->clocations_count].length = len;
jstate->clocations[jstate->clocations_count].squashed = (len > -1);
+ jstate->clocations[jstate->clocations_count].extern_param = extern_param;
jstate->clocations_count++;
}
}
@@ -422,6 +431,7 @@ RecordConstLocation(JumbleState *jstate, int location, int len)
static bool
IsSquashableConstant(Node *element)
{
+ /* Unwrap RelabelType and CoerceViaIO layers */
if (IsA(element, RelabelType))
element = (Node *) ((RelabelType *) element)->arg;
@@ -430,6 +440,12 @@ IsSquashableConstant(Node *element)
switch (nodeTag(element))
{
+ case T_Param:
+ return castNode(Param, element)->paramkind == PARAM_EXTERN;
+
+ case T_Const:
+ return true;
+
case T_FuncExpr:
{
FuncExpr *func = (FuncExpr *) element;
@@ -468,11 +484,8 @@ IsSquashableConstant(Node *element)
}
default:
- if (!IsA(element, Const))
- return false;
+ return false;
}
-
- return true;
}
/*
@@ -508,7 +521,7 @@ IsSquashableConstantList(List *elements)
#define JUMBLE_ELEMENTS(list, node) \
_jumbleElements(jstate, (List *) expr->list, node)
#define JUMBLE_LOCATION(location) \
- RecordConstLocation(jstate, expr->location, -1)
+ RecordConstLocation(jstate, false, expr->location, -1)
#define JUMBLE_FIELD(item) \
do { \
if (sizeof(expr->item) == 8) \
@@ -558,9 +571,11 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
if (aexpr->list_start > 0 && aexpr->list_end > 0)
{
RecordConstLocation(jstate,
+ false,
aexpr->list_start + 1,
(aexpr->list_end - aexpr->list_start) - 1);
normalize_list = true;
+ jstate->has_squashed_lists = true;
}
}
}
@@ -612,26 +627,6 @@ _jumbleNode(JumbleState *jstate, Node *node)
break;
}
- /* Special cases to handle outside the automated code */
- switch (nodeTag(expr))
- {
- case T_Param:
- {
- Param *p = (Param *) node;
-
- /*
- * Update the highest Param id seen, in order to start
- * normalization correctly.
- */
- if (p->paramkind == PARAM_EXTERN &&
- p->paramid > jstate->highest_extern_param_id)
- jstate->highest_extern_param_id = p->paramid;
- }
- break;
- default:
- break;
- }
-
/* Ensure we added something to the jumble buffer */
Assert(jstate->total_jumble_len > prev_jumble_len);
}
@@ -734,3 +729,35 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
*/
JUMBLE_STRING(aliasname);
}
+
+/*
+ * Custom query jumble function for _jumbleParam.
+ */
+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
+ Param *expr = (Param *) node;
+
+ JUMBLE_FIELD(paramkind);
+ JUMBLE_FIELD(paramid);
+ JUMBLE_FIELD(paramtype);
+
+ if (expr->paramkind == PARAM_EXTERN)
+ {
+ /*
+ * At this point, only external parameter locations outside of
+ * squashable lists will be recorded.
+ */
+ RecordConstLocation(jstate, true, expr->location, -1);
+
+ /*
+ * Update the highest Param id seen, in order to start normalization
+ * correctly.
+ *
+ * Note: This value is reset at the end of jumbling if there exists a
+ * squashable list. See the comment in the definition of JumbleState.
+ */
+ if (expr->paramid > jstate->highest_extern_param_id)
+ jstate->highest_extern_param_id = expr->paramid;
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 01510b01b649..6dfca3cb35ba 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -389,14 +389,16 @@ typedef enum ParamKind
typedef struct Param
{
+ pg_node_attr(custom_query_jumble)
+
Expr xpr;
ParamKind paramkind; /* kind of parameter. See above */
int paramid; /* numeric ID for parameter */
Oid paramtype; /* pg_type OID of parameter's datatype */
/* typmod value, if known */
- int32 paramtypmod pg_node_attr(query_jumble_ignore);
+ int32 paramtypmod;
/* OID of collation, or InvalidOid if none */
- Oid paramcollid pg_node_attr(query_jumble_ignore);
+ Oid paramcollid;
/* token location, or -1 if unknown */
ParseLoc location;
} Param;
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index da7c7abed2e6..39d94e2dc2db 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -29,6 +29,13 @@ typedef struct LocationLen
* of squashed constants.
*/
bool squashed;
+
+ /*
+ * Indicates whether a location is that of an external parameter, so it
+ * can be decided during normalization whether the parameter number should
+ * be replaced or kept as is.
+ */
+ bool extern_param;
} LocationLen;
/*
@@ -52,8 +59,14 @@ typedef struct JumbleState
/* Current number of valid entries in clocations array */
int clocations_count;
- /* highest Param id we've seen, in order to start normalization correctly */
+ /*
+ * Highest Param ID we've seen, in order to start normalization correctly.
+ * However, if the jumble contains at least one squashed list, we
+ * disregard highest_extern_param_id, because parameters inside squashed
+ * lists are not retained in the normalized query string.
+ */
int highest_extern_param_id;
+ bool has_squashed_lists;
/*
* Count of the number of NULL nodes seen since last appending a value.
--
2.39.5 (Apple Git-154)
Hello
I spent a much longer time staring at this patch than I wanted to, and
at a point I almost wanted to boot the whole thing to pg19, but because
upthread we already had an agreement that we should get it in for this
cycle, I decided that the best course of action was to just move forward
with it.
My reluctance mostly comes from this bit in generate_normalized_query:
+ /*
+ * If we have an external param at this location, but no lists are
+ * being squashed across the query, then we skip here; this will make
+ * us print print the characters found in the original query that
+ * represent the parameter in the next iteration (or after the loop is
+ * done), which is a bit odd but seems to work okay in most cases.
+ */
+ if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
+ continue;
Sami's patch didn't have a comment here and it was not immediately
obvious what was going on; moreover I was quite surprised at what
happened if I removed it: for example, one query text (in test
level_tracking.sql) changes from
- 2 | 2 | SELECT (i + $2)::INTEGER LIMIT $3
into
+ 2 | 2 | SELECT ($2 + $3)::INTEGER LIMIT $4
and if I understand this correctly, the reason is that the query is
being executed from an SQL function,
CREATE FUNCTION PLUS_ONE(i INTEGER) RETURNS INTEGER AS
$$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
so the 'i' is actually a parameter, so it makes sense that it gets
turned into a parameter in the normalized query. With the 'if' test I
mentioned above, we print it as 'i' literally only because we
'continued' and the next time through the loop we print the text from
the original query. This may be considered not entirely correct ...
note that the constants in the query are shown as parameters, and that
the numbers do not start from 1. (Obviously, a few other queries also
change.)
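To make the effect of that `continue` concrete, here is a toy sketch of the replacement loop. It is not the pg_stat_statements code: `ToyLocation` and `toy_normalize` are hypothetical names, token lengths are supplied by the caller instead of being found by the lexer, and the locations are assumed to be sorted by offset already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a recorded constant/parameter location. */
typedef struct ToyLocation
{
	int			location;		/* byte offset of the token in the query */
	int			length;			/* byte length of the token */
	bool		extern_param;	/* token is an external parameter */
} ToyLocation;

/*
 * Replace each recorded token with $n, numbering from
 * highest_extern_param_id + 1.  External parameters are skipped when no
 * list was squashed, so their original text is copied through untouched
 * together with the gap before the next replaced token.
 */
static void
toy_normalize(const char *query, const ToyLocation *locs, int nlocs,
			  bool has_squashed_lists, int highest_extern_param_id,
			  char *out, size_t outsize)
{
	size_t		query_pos = 0;	/* next unread byte of the query */
	size_t		out_pos = 0;	/* next free byte of the output */
	size_t		rest;
	int			replaced = 0;

	for (int i = 0; i < nlocs; i++)
	{
		size_t		gap;

		/* The check under discussion: leave the parameter text as-is. */
		if (locs[i].extern_param && !has_squashed_lists)
			continue;

		/* Copy the text between the previous token and this one. */
		gap = (size_t) locs[i].location - query_pos;
		assert(out_pos + gap + 12 < outsize);
		memcpy(out + out_pos, query + query_pos, gap);
		out_pos += gap;

		/* Emit the parameter symbol in place of the token. */
		out_pos += (size_t) snprintf(out + out_pos, outsize - out_pos, "$%d",
									 replaced + 1 + highest_extern_param_id);
		replaced++;
		query_pos = (size_t) locs[i].location + (size_t) locs[i].length;
	}

	/* Copy whatever follows the last replaced token, including the NUL. */
	rest = strlen(query + query_pos);
	assert(out_pos + rest < outsize);
	memcpy(out + out_pos, query + query_pos, rest + 1);
}
```

For `SELECT (i + 1.0)::INTEGER LIMIT 1` with locations `{8, 1, true}`, `{12, 3, false}`, `{32, 1, false}` and a starting id of 1, the sketch yields `SELECT (i + $2)::INTEGER LIMIT $3`. Forcing the check off by passing `has_squashed_lists = true` with the same starting id yields `SELECT ($2 + $3)::INTEGER LIMIT $4`, which matches the two texts quoted above.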
I decided to move forward with it anyway because this weirdness seems
more contained and less damaging than the unhelpfulness of the unpatched
behavior. We may want to revisit this in pg19 -- or, if we're really
uncomfortable with this, we could even decide to revert this commit in
pg18 and try again with Param support in pg19. But I'd like to move on
from this and my judgement was that the situation with the patch is
better than without.
I added one more commit, which iterates to peel as many layers of
CoerceViaIO and RelabelType as there are around an expression. I
noticed this lack while reading Sami's patch for that function, which
just simplified the coding therein. For normal cases this would iterate
just once (because repeated layers of casts are likely very unusual), so
I think it's okay to do that. There was some discussion about handling
this via recursion but it died inconclusively; I added recursive
handling of FuncExpr's arguments in 0f65f3eec478, which was different,
but I think handling this case by iterating is better than recursing.
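As a rough illustration of the iterative peeling described above, the following sketch uses toy types, not PostgreSQL's node structures. `ToyNode`, `ToyTag`, `toy_peel_casts` and `toy_is_squashable` are hypothetical names, and the set of squashable leaf kinds is reduced to a constant or an external parameter for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for node tags. */
typedef enum ToyTag
{
	TOY_CONST,
	TOY_PARAM_EXTERN,
	TOY_RELABEL,				/* plays the role of RelabelType */
	TOY_COERCE_VIA_IO,			/* plays the role of CoerceViaIO */
	TOY_OTHER
} ToyTag;

typedef struct ToyNode
{
	ToyTag		tag;
	struct ToyNode *arg;		/* wrapped expression, for cast-like nodes */
} ToyNode;

/*
 * Strip every cast-like wrapper, in whatever order and depth they are
 * nested.  In the common case of a single cast this loops once.
 */
static const ToyNode *
toy_peel_casts(const ToyNode *node)
{
	while (node->tag == TOY_RELABEL || node->tag == TOY_COERCE_VIA_IO)
	{
		assert(node->arg != NULL);	/* a cast always wraps something */
		node = node->arg;
	}
	return node;
}

/* Decide squashability by looking only at what is under the casts. */
static bool
toy_is_squashable(const ToyNode *node)
{
	node = toy_peel_casts(node);
	return node->tag == TOY_CONST || node->tag == TOY_PARAM_EXTERN;
}
```

A loop is enough here because each wrapper has exactly one child, so there is nothing to branch on; recursion is only needed for nodes with several arguments, such as the FuncExpr case mentioned above.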
With these commits, IMO the open item can now be closed. Even if we
ultimately end up reverting any of this, we would probably punt support
of Params to pg19, so the open item would be gone anyway.
Lastly, I decided not to do a catversion bump. As far as I can tell,
changes in the jumbling functions do not need them. I tried an
'installcheck' run with a datadir initdb'd with the original code, and
it works fine. I also tried an 'installcheck' with pg_stat_statements
installed, and was surprised to realize that the Query Id reports in
EXPLAIN make a large number of tests fail. If I take those lines from
the original code into the expected output, and then run the tests with
the new code, I notice that a few queries have changed queryId. I
suppose this was to be expected, and shouldn't harm anything.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there. Did
somebody make percentages logarithmic while I wasn't looking?"
http://smylers.hates-software.com/2005/09/08/1995c749.html
On Tue, Jun 24, 2025 at 07:45:15PM +0200, Alvaro Herrera wrote:
+ /*
+ * If we have an external param at this location, but no lists are
+ * being squashed across the query, then we skip here; this will make
+ * us print print the characters found in the original query that
+ * represent the parameter in the next iteration (or after the loop is
+ * done), which is a bit odd but seems to work okay in most cases.
+ */
+ if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
+ continue;
+ * us print print the characters found in the original query that
The final commit includes this comment, with a s/print print/print/
required.
and if I understand this correctly, the reason is that the query is
being executed from an SQL function,
CREATE FUNCTION PLUS_ONE(i INTEGER) RETURNS INTEGER AS
$$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
Right, with two executions of PLUS_ONE() making it a single entry with
calls=2.
so the 'i' is actually a parameter, so it makes sense that it gets
turned into a parameter in the normalized query. With the 'if' test I
mentioned above, we print it as 'i' literally only because we
'continued' and the next time through the loop we print the text from
the original query. This may be considered not entirely correct ...
note that the constants in the query are shown as parameters, and that
the numbers do not start from 1. (Obviously, a few other queries also
change.)
What you have committed is also consistent with the decision in v17
and older branches. The current result looks OK to me for v18.
I added one more commit, which iterates to peel as many layers of
CoerceViaIO and RelabelType as there are around an expression. I
noticed this lack while reading Sami's patch for that function, which
just simplified the coding therein. For normal cases this would iterate
just once (because repeated layers of casts are likely very unusual), so
I think it's okay to do that. There was some discussion about handling
this via recursion but it died inconclusively; I added recursive
handling of FuncExpr's arguments in 0f65f3eec478, which was different,
but I think handling this case by iterating is better than recursing.
Agreed. I was actually wondering about the logic of
IsSquashableConstant() when it came to RelabelType and CoerceViaIO.
The order of the scans was expected but just going back to the top of
IsSquashableConstant() for these two nodes makes the code easier to
follow. So agreed that your change is an improvement.
With these commits, IMO the open item can now be closed. Even if we
ultimately end up reverting any of this, we would probably punt support
of Params to pg19, so the open item would be gone anyway.
Yes.
Lastly, I decided not to do a catversion bump. As far as I can tell,
changes in the jumbling functions do not need them. I tried an
'installcheck' run with a datadir initdb'd with the original code, and
it works fine.
This reminds me of 4c7cd07aa62a and this thread:
/messages/by-id/1364409.1727673407@sss.pgh.pa.us
Doesn't the change in the Param structure actually require one because
it can change the representation of some SQL functions? I am not
completely sure.
I also tried an 'installcheck' with pg_stat_statements
installed, and was surprised to realize that the Query Id reports in
EXPLAIN make a large number of tests fail. If I take those lines from
the original code into the expected output, and then run the tests with
the new code, I notice that a few queries have changed queryId. I
suppose this was to be expected, and shouldn't harm anything.
I don't think we've put a lot of work in trying to make installcheck
work with PGSS (never tried it TBH), so I am not surprised to see some
failures when one tries this mode.
--
Michael
On 2025-Jun-25, Michael Paquier wrote:
On Tue, Jun 24, 2025 at 07:45:15PM +0200, Alvaro Herrera wrote:
+    /*
+     * If we have an external param at this location, but no lists are
+     * being squashed across the query, then we skip here; this will make
+     * us print print the characters found in the original query that
+     * represent the parameter in the next iteration (or after the loop is
+     * done), which is a bit odd but seems to work okay in most cases.
+     */
+    if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
+        continue;

+     * us print print the characters found in the original query that
The final commit includes this comment, with a s/print print/print/
required.
Ugh. Fixed, thanks for noticing that.
Lastly, I decided not to do a catversion bump. As far as I can tell,
changes in the jumbling functions do not need them. I tried an
'installcheck' run with a datadir initdb'd with the original code, and
it works fine.
This reminds me of 4c7cd07aa62a and this thread:
/messages/by-id/1364409.1727673407@sss.pgh.pa.us
Doesn't the change in the Param structure actually require one because
it can change the representation of some SQL functions? I am not
completely sure.
Hmm, but the Param structure didn't actually change; only its jumbling
function did (and others that rely on LocationLen). So I think what
could happen here if there's no catversion is that somebody has a
pg_stat_statements populated with query Ids for some queries that get
different queryIds when computed with the new code. So they're going to
have duplicates in pg_stat_statements. I think this is a pretty minor
problem, so I'm not inclined to do a catversion bump for it.
Anyway we have one due to 0cd69b3d7ef3, so it's moot now. (But it's a
good discussion to have, for the future.)
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Niemand ist mehr Sklave, als der sich für frei hält, ohne es zu sein."
Nadie está tan esclavizado como el que se cree libre no siéndolo
(Johann Wolfgang von Goethe)