queryId constant squashing does not support prepared statements
62d712ec added the ability to squash constants from an IN LIST
for queryId computation purposes. This means that a similar
queryId will be generated for the same queries that only
different on the number of values in the IN-LIST.
The patch lacks the ability to apply this optimization to values
passed in as parameters ( i.e. parameter kind = PARAM_EXTERN )
which will be the case for SQL prepared statements and protocol level
prepared statements, i.e.
```
select from t where id in (1, 2, 3) \bind
```
or
```
prepare prp(int, int, int) as select from t where id in ($1, $2, $3);
```
Here is the current state,
```
postgres=# create table t (id int);
CREATE TABLE
postgres=# prepare prp(int, int, int) as select from t where id in ($1, $2, $3);
PREPARE
postgres=# execute prp(1, 2, 3);
postgres=# select from t where id in (1, 2, 3);
--
(0 rows)
postgres=# SELECT query, calls FROM pg_stat_statements ORDER BY query
COLLATE "C";
query
| calls
-----------------------------------------------------------------------------------------------------------+-------
...
....
select from t where id in ($1 /*, ... */)
| 1
select from t where id in ($1, $2, $3)
| 1 <<- prepared statement
(6 rows)
```
but with the attached patch, the optimization applies.
```
create table t (id int)
| 1
select from t where id in ($1 /*, ... */)
| 2
(3 rows)
```
I think this is a pretty big gap as many of the common drivers such as JDBC,
which use extended query protocol, will not be able to take advantage of
the optimization in 18, which will be very disappointing.
Thoughts?
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v1-0001-Allow-query-jumble-to-squash-a-list-external-para.patchapplication/octet-stream; name=v1-0001-Allow-query-jumble-to-squash-a-list-external-para.patchDownload+60-5
On Wed, Apr 30, 2025 at 04:52:06PM -0500, Sami Imseih wrote:
62d712ec added the ability to squash constants from an IN LIST
for queryId computation purposes. This means that a similar
queryId will be generated for the same queries that only
different on the number of values in the IN-LIST.The patch lacks the ability to apply this optimization to values
passed in as parameters ( i.e. parameter kind = PARAM_EXTERN )
which will be the case for SQL prepared statements and protocol level
prepared statements, i.e.I think this is a pretty big gap as many of the common drivers such as JDBC,
which use extended query protocol, will not be able to take advantage of
the optimization in 18, which will be very disappointing.Thoughts?
Yes. Long IN/ANY clauses are as far as a more common pattern caused
by ORMs, and I'd like to think that application developers would not
hardcode such clauses in their right minds (well, okay, I'm likely
wrong about this assumption, feel free to counter-argue). These also
like relying on the extended query protocol. Not taking into account
JDBC in that is a bummer, and it is very popular.
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kinds of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.
Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.
To be honest, the situation of HEAD makes me question whether we are
using the right approach for this facility. I did mention a couple of
months ago about an alternative, but it comes down to accept that any
expressions would be normalized, unfortunately I never got down to
study it in details as this touches around expr_list in the parser: we
could detect in the parser the start and end locations of an
expression list in a query string, then group all of them together
based on their location in the string. This would be also cheaper
than going through all the elements in the list, tweaking things when
dealing with a subquery..
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
--
Michael
On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kinds of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
In fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.
At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.
To be honest, the situation of HEAD makes me question whether we are
using the right approach for this facility. I did mention a couple of
months ago about an alternative, but it comes down to accept that any
expressions would be normalized, unfortunately I never got down to
study it in details as this touches around expr_list in the parser: we
could detect in the parser the start and end locations of an
expression list in a query string, then group all of them together
based on their location in the string. This would be also cheaper
than going through all the elements in the list, tweaking things when
dealing with a subquery..
Not entirely sure how that would work exactly, but after my experiments
with the squashing patch I found it could be very useful to be able to
identify the end location of an expression list in the parser.
I spent a few hours looking into this today and to your points below:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kinds of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.comIn fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.
I don't think limiting this feature to Const only will suffice.
I think what we should really allow the broader scope of expressions that
are allowed via prepared statements, and this will make this implementation
consistent between prepared vs non-prepared statements. I don't see why
not. In fact, when we are examining the ArrayExpr, I think the only
thing we should
not squash is if we find a Sublink ( i.e. SELECT statement inside the array ).
To be honest, the situation of HEAD makes me question whether we are
using the right approach for this facility. I did mention a couple of
months ago about an alternative, but it comes down to accept that any
expressions would be normalized, unfortunately I never got down to
study it in details as this touches around expr_list in the parser: we
could detect in the parser the start and end locations of an
expression list in a query string, then group all of them together
based on their location in the string. This would be also cheaper
than going through all the elements in the list, tweaking things when
dealing with a subquery..Not entirely sure how that would work exactly, but after my experiments
with the squashing patch I found it could be very useful to be able to
identify the end location of an expression list in the parser.
I also came to the same conclusion, that we should track the start '('
and end ')'
location of a expression list to allow us to hide the fields. But, I
will look into
other approaches as well.
I am really leaning towards that we should revert this feature as the
limitation we have now with parameters is a rather large one and I think
we need to go back and address this issue.
--
Sami
On Thu, May 01, 2025 at 03:57:16PM -0500, Sami Imseih wrote:
I think what we should really allow the broader scope of expressions that
are allowed via prepared statements, and this will make this implementation
consistent between prepared vs non-prepared statements. I don't see why
not. In fact, when we are examining the ArrayExpr, I think the only
thing we should
not squash is if we find a Sublink ( i.e. SELECT statement inside the array ).
Likely so. I don't have anything else than Sublink in mind that would
be worth a special case..
I am really leaning towards that we should revert this feature as the
limitation we have now with parameters is a rather large one and I think
we need to go back and address this issue.
I am wondering if this would not be the best move to do on HEAD.
Let's see where the discussion drives us.
--
Michael
On Fri, May 02, 2025 at 07:10:19AM GMT, Michael Paquier wrote:
I am really leaning towards that we should revert this feature as the
limitation we have now with parameters is a rather large one and I think
we need to go back and address this issue.I am wondering if this would not be the best move to do on HEAD.
Let's see where the discussion drives us.
Squashing constants was ment to be a first step towards doing the same
for other types of queries (params, rte_values), reverting it to
implement everything at once makes very little sense to me.
On Fri, May 02, 2025 at 09:13:39AM +0200, Dmitry Dolgov wrote:
Squashing constants was ment to be a first step towards doing the same
for other types of queries (params, rte_values), reverting it to
implement everything at once makes very little sense to me.
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning the
facility from the ground, so the old approach would not be really
relevant..
--
Michael
On Fri, May 02, 2025 at 04:18:37PM GMT, Michael Paquier wrote:
On Fri, May 02, 2025 at 09:13:39AM +0200, Dmitry Dolgov wrote:Squashing constants was ment to be a first step towards doing the same
for other types of queries (params, rte_values), reverting it to
implement everything at once makes very little sense to me.That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning the
facility from the ground, so the old approach would not be really
relevant..
If I understand you correctly, changing the way how element list is
identified is not going to address the question whether or not to squash
parameters, right?
On 2025-May-02, Michael Paquier wrote:
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning the
facility from the ground, so the old approach would not be really
relevant..
I disagree that a revert is warranted for this reason. If you want to
change the implementation later, that's fine, as long as the user
interface doesn't change.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)
On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote:
On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kinds of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.comIn fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.
Returning to the topic of whether to squash list of Params.
Originally squashing of Params wasn't included into the squashing patch due to
concerns from reviewers about treating quite different queries as the same for
the purposes of query identification. E.g. there is some assumption somewhere,
which will be broken if we treat query with a list of integer parameters same
as a query with a list of float parameters. For the sake of making progress
I've decided to postpone answering this question and concentrate on more simple
scenario. Now, as the patch was applied, I think it's a good moment to reflect
on those concerns. It's not enough to say that we don't see any problems with
squashing of Param, some more sound argumentation is needed. So, what will
happen if parameters are squashed as constants?
1. One obvious impact is that more queries, that were considered distinct
before, will have the same normalized query and hence the entry in
pg_stat_statements. Since a Param could be pretty much anything, this can lead
to a situation when two queries with quiet different performance profiles (e.g.
one contains a parameter value, which is a heavy function, another one doesn't)
are matched to one entry, making it less useful.
But at the same time this already can happen if those two queries have the same
number of parameters, since query parametrizing is intrinsically lossy in this
sense. The only thing we do by squashing such queries is we loose information
about the number of parameters, not properties of the parameters themselves.
2. Another tricky scenario is when queryId is used by some extension, which in
turn makes assumption about it that are going to be rendered incorrect by
squashing. The only type of assumptions I can imagine falling into this
category is anything about equivalence of queries. For example, an extension
can capture two queries, which have the same normalized entry in pgss, and
assume all properties of those queries are the same.
It's worth noting that normalized query is not transitive, i.e. if a query1 has
the normalized version query_norm, and a query2 has the same normalized version
query_norm, it doesn't mean query1 is equivalent query2 in all senses (e.g.
they could have list of parameter values with different type and the same
size). That means that such assumptions are already faulty, and could work most
of the time only because it takes queries with a list of the same size to break
the assumption. Squashing such queries will make them wrong more often.
One can argue that we might want to be friendly to such extensions, and do not
"break" them even further. But I don't think it's worth it, as number of such
extensions is most likely low, if any. One more extreme case would be when an
extension assumes that queries with the same entry in pgss have the same number
of parameters, but I don't see how such assumption could be useful.
3. More annoying is the consequence that parameters are going to be treated as
constants in pg_stat_statements. While mostly harmless, that would mean they're
going to be replaced in the same way as constants. This means that the
parameter order is going to be lost, e.g.:
SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4)
AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
AND id IN ($2 /*, ... */)
This representation could be confusing of course. It could be either explained
in the documentation, or LocationLen has to be extended to carry information
about whether it's a constant or a parameter, and do not replace the latter. In
any case, anything more than the first parameter number will be lost, but it's
probably not so dramatic.
At the end of the day, I think the value of squashing for parameters outweighs
the problems described above. As long as there is an agreement about that, it's
fine by me. I've attached the more complete version of the patch (but without
modifying LocationLen to not replace Param yet) in case if such agreemeng will
be achieved.
Attachments:
v2-0001-Allow-query-jumble-to-squash-a-list-of-external-p.patchtext/plain; charset=us-asciiDownload+190-45
On Fri, 2 May 2025 14:56:56 +0200
Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-May-02, Michael Paquier wrote:
That depends. If we conclude that tracking this information through
the parser based on the start and end positions in a query string
for a set of values is more relevant, then we would be redesigning
the facility from the ground, so the old approach would not be
really relevant..I disagree that a revert is warranted for this reason. If you want to
change the implementation later, that's fine, as long as the user
interface doesn't change.
FWIW, i'm +1 on leaving it in pg18. Prepared statements often look a
little different in other ways, and there are a bunch of other quirks
in how queryid's are calculated too. Didn't there used to be something
with CALL being handled as a utility statement making stored procs look
different from functions?
--
To know the thoughts and deeds that have marked man's progress is to
feel the great heart throbs of humanity through the centuries; and if
one does not feel in these pulsations a heavenward striving, one must
indeed be deaf to the harmonies of life.
Helen Keller. Let Us Have Faith. Doubleday, Doran & Company, 1940.
Michael Paquier <michael@paquier.xyz> writes:
On Thu, May 01, 2025 at 03:57:16PM -0500, Sami Imseih wrote:
I think what we should really allow the broader scope of expressions that
are allowed via prepared statements, and this will make this implementation
consistent between prepared vs non-prepared statements. I don't see why
not. In fact, when we are examining the ArrayExpr, I think the only
thing we should
not squash is if we find a Sublink ( i.e. SELECT statement inside the array ).
Likely so. I don't have anything else than Sublink in mind that would
be worth a special case..
I think this is completely wrong. As simple examples, there is
nothing even a little bit comparable between the behaviors of
t1.x IN (1, 2, 3)
t1.x IN (1, 2, t2.y)
t1.x IN (1, 2, random())
Squashing these to look the same would be doing nobody any favors.
I do agree that treating PARAM_EXTERN Params the same as constants
for this purpose is a reasonable thing to do, on three arguments:
1. A PARAM_EXTERN Param actually behaves largely the same as a Const
so far as a query is concerned: it does not change value across
the execution of the query. (This is not true of other kinds of
Params.)
2. It's very much dependent on the client-side stack whether a given
value that is constant in the mind of the application will be passed
to the backend as a Const or a Param. (This is okay because #1.)
3. Even if the value is passed as a Param, the planner might replace
it by a Const by means of generating a custom query plan.
So the boundary between PARAM_EXTERN Params and Consts is actually
mighty squishy, and thus I think it makes sense for pg_stat_statements
to mash them together. But this logic does not extend to Vars or
function calls or much of anything else.
Maybe in the future we could have a discussion about whether
expressions involving only Params, Consts, and immutable functions
(say, "$1 + 1") could be mashed as though they were constants, on
the grounds that they'd have been reduced to a single constant if the
planner had chosen to generate a custom plan. But I think it's too
late to consider that for v18. I'd be okay with the rule "treat any
list of Consts and PARAM_EXTERN Params the same as any other" for v18.
I also agree with Alvaro that this discussion doesn't justify a
revert. If the pre-v18 behavior wasn't chiseled on stone tablets,
the new behavior isn't either. We can improve it some more later.
regards, tom lane
Hi Dmitry,
On Sun, May 4, 2025 at 6:19 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote:
On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
I agree that the current solution we have in the tree feels incomplete
because we are not taking into account the most common cases that
users would care about. Now, allowing PARAM_EXTERN means that we
allow any expression to be detected as a squashable thing, and this
kinds of breaks the assumption of IsSquashableConst() where we want
only constants to be allowed because EXECUTE parameters can be any
kind of Expr nodes. At least that's the intention of the code on
HEAD.Now, I am not actually objecting about PARAM_EXTERN included or not if
there's a consensus behind it and my arguments are considered as not
relevant. The patch is written so as it claims that a PARAM_EXTERN
implies the expression to be a Const, but it may not be so depending
on what the execution path is given for the parameter. Or at least
the patch could be clearer and rename the parts about the "Const"
squashable APIs around queryjumblefuncs.c.[...]
The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
btw:
/messages/by-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.comIn fact, this has been discussed much earlier in the thread above, as
essentially the same implementation with T_Params, which is submitted
here, was part of the original patch. The concern was always whether or
not it will break any assumption about query identification, because
this way much broader scope of expressions will be considered equivalent
for query id computation purposes.At the same time after thinking about this concern more, I presume it
already happens at a smaller scale -- when two queries happen to have
the same number of parameters, they will be indistinguishable even if
parameters are different in some way.Returning to the topic of whether to squash list of Params.
Originally squashing of Params wasn't included into the squashing patch due to
concerns from reviewers about treating quite different queries as the same for
the purposes of query identification. E.g. there is some assumption somewhere,
which will be broken if we treat query with a list of integer parameters same
as a query with a list of float parameters. For the sake of making progress
I've decided to postpone answering this question and concentrate on more simple
scenario. Now, as the patch was applied, I think it's a good moment to reflect
on those concerns. It's not enough to say that we don't see any problems with
squashing of Param, some more sound argumentation is needed. So, what will
happen if parameters are squashed as constants?1. One obvious impact is that more queries, that were considered distinct
before, will have the same normalized query and hence the entry in
pg_stat_statements. Since a Param could be pretty much anything, this can lead
to a situation when two queries with quiet different performance profiles (e.g.
one contains a parameter value, which is a heavy function, another one doesn't)
are matched to one entry, making it less useful.But at the same time this already can happen if those two queries have the same
number of parameters, since query parametrizing is intrinsically lossy in this
sense. The only thing we do by squashing such queries is we loose information
about the number of parameters, not properties of the parameters themselves.2. Another tricky scenario is when queryId is used by some extension, which in
turn makes assumption about it that are going to be rendered incorrect by
squashing. The only type of assumptions I can imagine falling into this
category is anything about equivalence of queries. For example, an extension
can capture two queries, which have the same normalized entry in pgss, and
assume all properties of those queries are the same.It's worth noting that normalized query is not transitive, i.e. if a query1 has
the normalized version query_norm, and a query2 has the same normalized version
query_norm, it doesn't mean query1 is equivalent query2 in all senses (e.g.
they could have list of parameter values with different type and the same
size). That means that such assumptions are already faulty, and could work most
of the time only because it takes queries with a list of the same size to break
the assumption. Squashing such queries will make them wrong more often.One can argue that we might want to be friendly to such extensions, and do not
"break" them even further. But I don't think it's worth it, as number of such
extensions is most likely low, if any. One more extreme case would be when an
extension assumes that queries with the same entry in pgss have the same number
of parameters, but I don't see how such assumption could be useful.3. More annoying is the consequence that parameters are going to be treated as
constants in pg_stat_statements. While mostly harmless, that would mean they're
going to be replaced in the same way as constants. This means that the
parameter order is going to be lost, e.g.:SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4)
AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
AND id IN ($2 /*, ... */)This representation could be confusing of course. It could be either explained
in the documentation, or LocationLen has to be extended to carry information
about whether it's a constant or a parameter, and do not replace the latter. In
any case, anything more than the first parameter number will be lost, but it's
probably not so dramatic.At the end of the day, I think the value of squashing for parameters outweighs
the problems described above. As long as there is an agreement about that, it's
fine by me. I've attached the more complete version of the patch (but without
modifying LocationLen to not replace Param yet) in case if such agreemeng will
be achieved.
Would it make sense to rename `RecordConstLocation` to something like
`RecordExpressionLocation` instead?
- /* Array of locations of constants that should be removed */
+ /* Array of locations of constants that should be removed and parameters */
LocationLen *clocations;
should be
+ /* Array of locations of constants and parameters that should be removed */
You could also consider renaming `clocations` to `elocations`, this
may introduce
some additional churn though.
--
Regards
Junwang Zhao
On Tue, May 06, 2025 at 11:50:07PM GMT, Junwang Zhao wrote:
Would it make sense to rename `RecordConstLocation` to something like
`RecordExpressionLocation` instead?
Yeah, naming is hard. RecordExpressionLocation is somehow more vague,
but I see what you mean, maybe something along these lines would be
indeed a better fit.
- /* Array of locations of constants that should be removed */ + /* Array of locations of constants that should be removed and parameters */ LocationLen *clocations;should be
+ /* Array of locations of constants and parameters that should be removed */
That was clumsy but intentional, because contrary to constants
parameters do not need to be removed. I guess I have to change the
wording a bit to make it clear.
I also agree with Alvaro that this discussion doesn't justify a
revert. If the pre-v18 behavior wasn't chiseled on stone tablets,
the new behavior isn't either. We can improve it some more later.
As I was looking further into what we currently have in v18 and HEAD
the normalization could break if we pass a function.
For example,
"""
select where 1 in (1, 2, int4(1));
"""
the normalized string is,
"""
select where $1 in ($2 /*, ... */))
"""
Notice the extra close parenthesis that is added after the comment. This is
because although int4(1) is a function call it is rewritten as a Const
and that breaks the assumptions being made by the location of the
last expression.
Also, something like:
"""
select where 1 in (1, 2, cast(4 as int));
"""
is normalized as:
"""
select where $1 in ($2 /*, ... */ as int))
"""
I don't think the current state is acceptable, if it results in pg_s_s
storing an invalid normalized version of the sql.
Now, with the attached v2 supporting external params, we see other normalization
anomalies such as
"""
postgres=# select where $1 in ($3, $2) and 1 in ($4, cast($5 as int))
\bind 0 1 2 3 4
postgres-# ;
--
(0 rows)
postgres=# select toplevel, query, calls from pg_stat_statements;
toplevel | query
| calls
----------+-------------------------------------------------------------------------+-------
t | select where $1 in ($2 /*, ... */) and $3 in ($4 /*, ...
*/($5 as int)) | 1
(1 row)
"""
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.
thoughts?
--
Sami Imseih
Amazon Web Services (AWS)
On Tue, May 06, 2025 at 01:32:48PM GMT, Sami Imseih wrote:
I also agree with Alvaro that this discussion doesn't justify a
revert. If the pre-v18 behavior wasn't chiseled on stone tablets,
the new behavior isn't either. We can improve it some more later.As I was looking further into what we currently have in v18 and HEAD
the normalization could break if we pass a function.[...]
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.
I don't think having the end location in this case would help -- when it
comes to ParseFuncOrColumn, looks like for coerce functions it just
replaces the original FuncCall with the argument expression. Meaning
that when jumbling we have only the coerce argument expression (Const),
which ends before the closing brace, not the parent expression.
Maybe it would be possible to address thins in not too complicated way
in fill_in_constant_lengths, since it already operates with parsed
tokens.
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.I don't think having the end location in this case would help -- when it
comes to ParseFuncOrColumn, looks like for coerce functions it just
replaces the original FuncCall with the argument expression. Meaning
that when jumbling we have only the coerce argument expression (Const),
which ends before the closing brace, not the parent expression.
If we are picking up the start and end points from gram.c and we add these
positions to A_Expr or A_ArrayExpr and then make them available to ArrayExpr,
then we know the exact boundary of the IN list. Even if a function
call is simplified down
to a constant, it will not really matter because we are going to normalize
between the original opening and closing parentheses of the IN list.
(Actually, we can even track the actual textual starting and end point of a List
as well)
Attached ( not in patch form ) is the idea for this.
```
postgres=# select where 1 in (1, int4(1));
--
(1 row)
postgres=# select where 1 in (1, int4($1::int)) \bind 1
postgres-# ;
--
(1 row)
postgres=# select toplevel, query, calls from pg_stat_statements;
toplevel | query | calls
----------+------------------------------------+-------
t | select where $1 in ($2 /*, ... */) | 2
(1 row)
```
What do you think?
--
Sami Imseih
Attachments:
Sami-WIP-Allow-query-jumble-to-squash-a-list-external-para.txttext/plain; charset=US-ASCII; name=Sami-WIP-Allow-query-jumble-to-squash-a-list-external-para.txtDownload+139-129
On Tue, May 06, 2025 at 01:32:48PM -0500, Sami Imseih wrote:
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.
FWIW, this is exactly the kind of issues we have spent time on when
improving the location detection of sub-queries for some DDL patterns,
and the parser is the only path I am aware of where this can be done
sanely because the extra characters that may, or may not, be included
in some of the expressions would be naturally discarded based on the @
locations we retrieve.
For reference, this story is part of 499edb09741b, more precisely
around this part of the thread:
/messages/by-id/CACJufxF9hqyfmKEdpiG=PbrGdKVNP2BQjHFJh4q6639sV7NmvQ@mail.gmail.com
(FWIW, I've seen assumptions around the detection of specific
locations done outside the parser in pg_hint_plan as well, that did
not finish well because the code makes assumptions that natural
parsers are just better at because they're designed to detect such
cases.)
--
Michael
On Tue, May 06, 2025 at 03:01:32PM GMT, Sami Imseih wrote:
Without properly accounting for the boundaries of the list of expressions, i.e.,
the start and end positions of '(' and ')' or '[' and ']' and normalizing the
expressions in between, it will be very difficult for the normalization to
behave sanely.I don't think having the end location in this case would help -- when it
comes to ParseFuncOrColumn, looks like for coerce functions it just
replaces the original FuncCall with the argument expression. Meaning
that when jumbling we have only the coerce argument expression (Const),
which ends before the closing brace, not the parent expression.If we are picking up the start and end points from gram.c and we add these
positions to A_Expr or A_ArrayExpr and then make them available to ArrayExpr,
then we know the exact boundary of the IN list. Even if a function
call is simplified down
to a constant, it will not really matter because we are going to normalize
between the original opening and closing parentheses of the IN list.
What do you think?
Ah, I see what you mean. I think the idea is fine, it will simplify
certain things as well as address the issue. But I'm afraid adding
start/end location to A_Expr is a bit too invasive, as it's being used
for many other purposes. How about introducing a new expression for this
purposes, and use it only in in_expr/array_expr, and wrap the
corresponding expressions into it? This way the change could be applied
in a more targeted fashion.
On Wed, May 07, 2025 at 10:41:22AM +0200, Dmitry Dolgov wrote:
Ah, I see what you mean. I think the idea is fine, it will simplify
certain things as well as address the issue. But I'm afraid adding
start/end location to A_Expr is a bit too invasive, as it's being used
for many other purposes. How about introducing a new expression for this
purposes, and use it only in in_expr/array_expr, and wrap the
corresponding expressions into it? This way the change could be applied
in a more targeted fashion.
Yes, that feels invasive. The use of two static variables to track
the start and the end positions in an expression list can also be a
bit unstable to rely on, I think. It seems to me that this part
could be handled in a new Node that takes care of tracking the two
positions, instead, be it a start/end couple or a location/length
couple? That seems necessary to have when going through
jumbleElements().
--
Michael