[[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

Started by Sami Imseih3 months ago11 messageshackers
Jump to latest
#1Sami Imseih
samimseih@gmail.com

Hi

I was made aware of a crash in pg_stat_statements when processing
IN clauses that contain both constants and variable expressions.

The original case was more complex, but I constructed a minimal repro case:

```
CREATE TABLE test_squash (id int, data int);
SELECT * FROM test_squash a, test_squash b WHERE a.id IN (1, 2, 3,
b.id, b.id + 1);
```

With asserts enabled:
```
TRAP: failed Assert("len_to_wrt >= 0"), File:
"../contrib/pg_stat_statements/pg_stat_statements.c", Line: 2906, PID:
299892
postgres: ubuntu regression_pg_stat_statements [local]
SELECT(ExceptionalCondition+0x74)[0x5bf8a012dc44]
```

The crash occurs during query normalization in pg_stat_statements and
started with
0f65f3eec4. The execution plan shows what's happening:

```
postgres=# EXPLAIN SELECT * FROM test_squash a, test_squash b WHERE
a.id IN (1, 2, 3, b.id, b.id + 1);
QUERY PLAN
----------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..121376.35 rows=126798 width=16)
Join Filter: ((a.id = ANY ('{1,2,3}'::integer[])) OR (a.id = b.id)
OR (a.id = (b.id + 1)))
-> Seq Scan on test_squash a (cost=0.00..32.60 rows=2260 width=8)
-> Materialize (cost=0.00..43.90 rows=2260 width=8)
-> Seq Scan on test_squash b (cost=0.00..32.60 rows=2260 width=8)
(5 rows)
```

When IN clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.

This transformation can create overlapping token locations that cause
pg_stat_statements query normalization to crash.

To fix, disable squashing for mixed IN expressions by detecting when
both variables and constants are present during parse transformation,
and setting the array's list_start/list_end to -1 to prevent squashing.

I also thought about fixing this in `generate_normalized_query` where
instead of the assert:

```
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
```

do something like this

```
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
if (len_to_wrt < 0)
continue;
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
```
to find an overlapping location. But this would normalize the statement to:

```
SELECT * FROM test_squash a, test_squash b WHERE a.id IN ($1 /*, ... */);
```

Which is not correct as it will squash the variables and potentially
introduce some
silent regressions in normalizations.

So, It is better to skip squashing altogether in this case. Attached is a patch.

If there is agreement to this fix, I think it should be backpacthed to
all versions that
support squashing of IN-lists.

Thoughts?

--
Sami Imseih
Amazon Web Services (AWS)

Attachments:

v1-0001-pg_stat_statements-Fix-crash-in-with-mixed-IN-cla.patchapplication/octet-stream; name=v1-0001-pg_stat_statements-Fix-crash-in-with-mixed-IN-cla.patchDownload+31-4
#2Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#1)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

On Mon, Jan 12, 2026 at 10:29:51PM -0600, Sami Imseih wrote:

With asserts enabled:
```
TRAP: failed Assert("len_to_wrt >= 0"), File:
"../contrib/pg_stat_statements/pg_stat_statements.c", Line: 2906, PID:
299892
postgres: ubuntu regression_pg_stat_statements [local]
SELECT(ExceptionalCondition+0x74)[0x5bf8a012dc44]
```

Oops.

If there is agreement to this fix, I think it should be backpacthed to
all versions that
support squashing of IN-lists.

It's pretty clear to me that this is a bug and that we are going to
backpatch something. I am running out of battery today, will look
again at that later except if somebody beats me to it.
--
Michael

#3zengman
zengman@halodbtech.com
In reply to: Michael Paquier (#2)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

If there is agreement to this fix, I think it should be backpacthed to
all versions that
support squashing of IN-lists.

Hi,

Thank you for providing the test case. I happened to have both versions 16 and 19 on hand — it stably triggers the issue on version 19, while version 16 works perfectly fine.
Additionally, I noticed that "ars" in the comment of your patch should be "Vars" (this seems to be a typo).

```
postgres@zxm-VMware-Virtual-Platform:~/code/postgres$ psql
psql (19devel)
Type "help" for help.

postgres=# SELECT * FROM test_squash a, test_squash b WHERE a.id IN (1, 2, 3,
b.id, b.id + 1);
TRAP: failed Assert("len_to_wrt >= 0"), File: "pg_stat_statements.c", Line: 2906, PID: 52564
postgres: postgres postgres [local] SELECT(ExceptionalCondition+0xbb)[0x59d71bc1d631]
```

```
postgres@zxm-VMware-Virtual-Platform:~$ psql
psql (16.11)
Type "help" for help.

postgres=# SELECT * FROM test_squash a, test_squash b WHERE a.id IN (1, 2, 3,
b.id, b.id + 1);
id | data | id | data
----+------+----+------
(0 rows)

postgres=# SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls
------------------------------------------------------------------------+-------
SELECT * FROM test_squash a, test_squash b WHERE a.id IN ($1, $2, $3, +| 2
b.id, b.id + $4) |
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 1
select pg_backend_pid() | 1
(3 rows)
```

--
Regards,
Man Zeng
www.openhalo.org

#4Sami Imseih
samimseih@gmail.com
In reply to: zengman (#3)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

Hi,

Thanks for looking!

Additionally, I noticed that "ars" in the comment of your patch should be "Vars"
(this seems to be a typo).

oops. Thanks for spotting that. I was not too happy with the original comment
altogether, and changed it to something more concise.

--
Sami Imseih
Amazon Web Services (AWS)

Attachments:

v2-0001-pg_stat_statements-Fix-nested-tracking-for-implic.patchapplication/octet-stream; name=v2-0001-pg_stat_statements-Fix-nested-tracking-for-implic.patchDownload+31-4
#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#1)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

On Mon, Jan 12, 2026 at 10:29:51PM -0600, Sami Imseih wrote:
So, It is better to skip squashing altogether in this case. Attached is a patch.

If there is agreement to this fix, I think it should be backpacthed to
all versions that
support squashing of IN-lists.

Thoughts?

I think there are few important points:

* From what I understand the transformation that's causing this is
exactly the one from transformAExprIn, and from that perspective
refusing to squash if Vars are present is a reasonable approach (if
they won't be splitted off, such Array will not pass
IsSquashableConstantList anyway).

* I think one subtle thing, which we're doing wrong in transformAExprIn
is splitting elements into rvars and rnonvars, but still using end
location of the whole expression a->rexpr_list_end as the end location
of the new array. This, together with an attempt to record a constant
form the same range is ultimately causing the problem. Probably it
would be a good idea to adjust newa->list_end with this in mind.

* Independently from that, it sounds like a good idea to have protection
from overlapping constants when generating normalized query. It could
be done in the same way as the previous bug was fixed. In
fill_in_constant_lengths we currently check for duplicated constants:

locs[i].location == locs[i - 1].location

then set length = -1 for those. I think it's worth extending it to
check for overlapping with the previous constant, something like:

(locs[i].location == locs[i - 1].location ||
locs[i].location <= locs[i - 1].location + locs[i - 1].length)

#6Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#5)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

Thanks for the comments!

* From what I understand the transformation that's causing this is
exactly the one from transformAExprIn, and from that perspective
refusing to squash if Vars are present is a reasonable approach (if
they won't be splitted off, such Array will not pass
IsSquashableConstantList anyway).

Correct. If a list had Vars exclusively, it will not be eligible for squashing.

* I think one subtle thing, which we're doing wrong in transformAExprIn
is splitting elements into rvars and rnonvars, but still using end
location of the whole expression a->rexpr_list_end as the end location
of the new array. This, together with an attempt to record a constant
form the same range is ultimately causing the problem. Probably it
would be a good idea to adjust newa->list_end with this in mind.

That crossed my mind, but I am not sure how we can actually do this
since we can have Vars and nonVars in different orders in the list.

```
SELECT * FROM test_squash a, test_squash b WHERE a.id IN (b.id + 1, 1,
b.id + 1, 3,
b.id, b.id + 1);
```

* Independently from that, it sounds like a good idea to have protection
from overlapping constants when generating normalized query. It could
be done in the same way as the previous bug was fixed. In
fill_in_constant_lengths we currently check for duplicated constants:

locs[i].location == locs[i - 1].location

then set length = -1 for those. I think it's worth extending it to
check for overlapping with the previous constant, something like:

(locs[i].location == locs[i - 1].location ||
locs[i].location <= locs[i - 1].location + locs[i - 1].length)

Yeah, this may be a good defensive check to add, but it's
not going to be useful for this issue.

--
Sami Imseih
Amazon Web Services (AWS)

#7Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Sami Imseih (#6)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

On Wed, Jan 14, 2026 at 04:13:48PM -0600, Sami Imseih wrote:
That crossed my mind, but I am not sure how we can actually do this
since we can have Vars and nonVars in different orders in the list.

It has less to do with the order, the problem is that we have the
expression end location only for arrays. As far as I recall it was an
intentional design decision, and to adjust newa->list_end in such
scenarios we need to solve the original problem of finding an end
location of any arbitrary expression, which turned out to be
complicated. That's why I mention that tracking vars is more practical.

Nevertheless it makes sense to comment that in the code, just in case if
we're out of luck, and similar situation will appear somewhere else.

```
SELECT * FROM test_squash a, test_squash b WHERE a.id IN (b.id + 1, 1,
b.id + 1, 3,
b.id, b.id + 1);
```

* Independently from that, it sounds like a good idea to have protection
from overlapping constants when generating normalized query. It could
be done in the same way as the previous bug was fixed. In
fill_in_constant_lengths we currently check for duplicated constants:

locs[i].location == locs[i - 1].location

then set length = -1 for those. I think it's worth extending it to
check for overlapping with the previous constant, something like:

(locs[i].location == locs[i - 1].location ||
locs[i].location <= locs[i - 1].location + locs[i - 1].length)

Yeah, this may be a good defensive check to add, but it's
not going to be useful for this issue.

It is, the proposed change helps to avoid the crash pointed out in this
thread. It might not be the right solution for the reasons discussed so
far, but as a general defensive mechanism makes total sense to me (maybe
with an assert to highlight any potential future problem).

#8Sami Imseih
samimseih@gmail.com
In reply to: Dmitry Dolgov (#7)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

As far as I recall it was an
intentional design decision, and to adjust newa->list_end in such
scenarios we need to solve the original problem of finding an end
location of any arbitrary expression,

I'm not sure I follow. How would we adjust newa->list_end?

* Independently from that, it sounds like a good idea to have protection
from overlapping constants when generating normalized query. It could
be done in the same way as the previous bug was fixed. In
fill_in_constant_lengths we currently check for duplicated constants:

locs[i].location == locs[i - 1].location

then set length = -1 for those. I think it's worth extending it to
check for overlapping with the previous constant, something like:

(locs[i].location == locs[i - 1].location ||
locs[i].location <= locs[i - 1].location + locs[i - 1].length)

Yeah, this may be a good defensive check to add, but it's
not going to be useful for this issue.

It is, the proposed change helps to avoid the crash pointed out in this
thread. It might not be the right solution for the reasons discussed so
far, but as a general defensive mechanism makes total sense to me (maybe
with an assert to highlight any potential future problem).

Yeah, we should not assume this is not expected behavior. Here is one
from the regress tests in which overlaps can occur:
```
CREATE TABLE shighway (
surface text
) INHERITS (road);

INSERT INTO shighway
SELECT a.*, 'asphalt'
FROM road a, road b
WHERE a.name ~ 'State Hwy.*';
```

--
Sami Imseih
Amazon Web Services (AWS)

#9Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#8)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

On Thu, Jan 15, 2026 at 02:53:20PM -0600, Sami Imseih wrote:

As far as I recall it was an
intentional design decision, and to adjust newa->list_end in such
scenarios we need to solve the original problem of finding an end
location of any arbitrary expression,

I'm not sure I follow. How would we adjust newa->list_end?

I may be missing something, of course, but Dmitry's message reads as
follows to me:
"It was a design choice to track the start and end locations because
we needed that for arrays, and resetting the locations if we have Vars
in the list like you are suggesting is OK by me."

I am going to re-read a bit more the area and think about a couple of
patterns to test, but I think that your suggested solution should be
OK to disable the list squashing as you are suggesting in this case:
there is nothing we can really do, and IMO that would still be OK for
most users because squashing has been designed for non-Var lists as
far as I know. This is a rare pattern for long lists in IN arrays, as
well. I'll sleep on it for now.
--
Michael

#10Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#9)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

On Mon, Jan 19, 2026 at 05:14:46PM +0900, Michael Paquier wrote:
On Thu, Jan 15, 2026 at 02:53:20PM -0600, Sami Imseih wrote:

As far as I recall it was an
intentional design decision, and to adjust newa->list_end in such
scenarios we need to solve the original problem of finding an end
location of any arbitrary expression,

I'm not sure I follow. How would we adjust newa->list_end?

I may be missing something, of course, but Dmitry's message reads as
follows to me:
"It was a design choice to track the start and end locations because
we needed that for arrays, and resetting the locations if we have Vars
in the list like you are suggesting is OK by me."

Yep, I concur the current approach is fine. What I was saying about
adjusting newa->list_end is just an ideal and at the moment only
hypothetical scenario, where we could deduce end location of the new
list based on its nested elements. Currently there are no mechanism to
achieve that, so maybe in the future.

#11Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#10)
Re: [[BUG] pg_stat_statements crashes with var and non-var expressions in IN clause

On Mon, Jan 19, 2026 at 10:24:41AM +0100, Dmitry Dolgov wrote:

Yep, I concur the current approach is fine. What I was saying about
adjusting newa->list_end is just an ideal and at the moment only
hypothetical scenario, where we could deduce end location of the new
list based on its nested elements. Currently there are no mechanism to
achieve that, so maybe in the future.

Okay, applied that down to v18 after a second lookup and a tweak to
the proposed comment.
--
Michael