Bug in pg_stat_statements
Hi hackers.
In PG18 gthere is bug either in pg_stat_statements.c, either in
queryjumblefuncs.c.
Repro (you need Postgres build with asserts and pg_stat_statements
included in shared_preload_librarties:
create function f(a integer[], out x integer, out y integer) returns
record as $$ begin
x = a[1];
y = a[2];
end;
$$ language plpgsql;
create table t(x integer);
select ((f(array[1,2]))).* from t;
The problems is caused by ((...)).* which cause to include the same
array literal twice in JumbleQuery.
But pg_stat_stataement assumes that any occurrence of constant is unique
and it cause assertion failure here:
/* Copy next chunk (what precedes the next constant) */
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);
So we should either exclude duplicates in RecordConstLocation
(queryjumblefuncs.c)
either in generate_normalized_query in pg_stat_statements.c
What is considered to be more correct?
In PG18 gthere is bug either in pg_stat_statements.c, either in
queryjumblefuncs.c.Repro (you need Postgres build with asserts and pg_stat_statements
included in shared_preload_librarties:create function f(a integer[], out x integer, out y integer) returns
record as $$ begin
x = a[1];
y = a[2];
end;
$$ language plpgsql;create table t(x integer);
select ((f(array[1,2]))).* from t;The problems is caused by ((...)).* which cause to include the same
array literal twice in JumbleQuery.
But pg_stat_stataement assumes that any occurrence of constant is unique
and it cause assertion failure here:/* Copy next chunk (what precedes the next constant) */
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);So we should either exclude duplicates in RecordConstLocation
(queryjumblefuncs.c)
You are correct in the analysis that this is because we are
recording the same constant multiple times, this causes the
len_to_write to be 0 on the second occurance of the duplicated
location. I confirmed that this was introduced in the constant list
squashing per 0f65f3eec.
The same location being recorded twice will occur because of
row expansion, therefore the same location will be jumbled twice,
as observed in the explain output.
```
postgres=# explain verbose select ((f(array[1,2]))).* from t;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on public.t (cost=0.00..1310.50 rows=2550 width=8)
Output: (f('{1,2}'::integer[])).x, (f('{1,2}'::integer[])).y
Query Identifier: 5305868219688930530
(3 rows)
```
Before 0f65f3eec, this was not an issue, because we recorded
all the locations.
I believe the correct answer is to fix this in queryjumblefuncs.c
and more specifically inside _jumbleElements. We should not
record the same location twice. We can do this by tracking the
start location of the last recorded constant location, and we
can skip recording it if we see it again. See the attached
which also includes test cases.
Also, inside the loop of generate_normalized_query in
pg_stat_statements we are asserting len_to_wrt >= 0, so we
are OK with memcpy of 0 bytes. This is valid, but does
not seem correct.
```
Assert(len_to_wrt > 0);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
```
Should we continue the loop and skip the memcpy
if (len_to_wrt == 0) instead?
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
0001-Fix-jumbling-of-squashed-lists-with-row-expansion.patchapplication/octet-stream; name=0001-Fix-jumbling-of-squashed-lists-with-row-expansion.patchDownload
From 667d50be65dd1f4307e2a46811695d4da92aa16c Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-46-230.ec2.internal>
Date: Fri, 24 Oct 2025 00:34:54 +0000
Subject: [PATCH 1/1] Fix jumbling of squashed lists with row expansion
Commit 0f65f3eec introduced squashing of constant lists, but did
not handle row expansion of composite values correctly. As a
result, the same location could be recorded multiple times,
leading to assertion failures in pg_stat_statements during
generate_normalized_query.
This fix tracks the start position of the last recorded constant
and skips it if encountered again during _jumbleElements.
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
---
.../pg_stat_statements/expected/squashing.out | 43 +++++++++++++++++++
contrib/pg_stat_statements/sql/squashing.sql | 18 ++++++++
src/backend/nodes/queryjumblefuncs.c | 18 ++++++--
src/include/nodes/queryjumble.h | 3 ++
4 files changed, 78 insertions(+), 4 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f952f47ef7b..af54ee39ba1 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -809,6 +809,47 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
select where $1 IN ($2 /*, ... */) | 2
(2 rows)
+-- composite functions with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+ x | y | ?column? | ?column? | ?column? | x | y | ?column? | ?column?
+---+---+----------+----------+----------+---+---+----------+----------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------------------------------------------+-------
+ SELECT ((composite_f(array[$1 /*, ... */]))).* FROM test_composite | 2
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2, $3, $4, ((composite_f(array[$5 /*, ... */]))).*, $6, $7+| 1
+ FROM test_composite +|
+ WHERE x IN ($8 /*, ... */) |
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(3 rows)
+
--
-- cleanup
--
@@ -818,3 +859,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 53138d125a9..6fc9e0e56b2 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -291,6 +291,22 @@ select where '1' IN ('1'::int::text, '2'::int::text);
select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- composite functions with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- cleanup
--
@@ -300,3 +316,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 31f97151977..c8bad6f3c12 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -193,6 +193,7 @@ InitJumble(void)
jstate->highest_extern_param_id = 0;
jstate->pending_nulls = 0;
jstate->has_squashed_lists = false;
+ jstate->last_start_location = 0;
#ifdef USE_ASSERT_CHECKING
jstate->total_jumble_len = 0;
#endif
@@ -656,10 +657,19 @@ _jumbleElements(JumbleState *jstate, List *elements, Node *node)
if (aexpr->list_start > 0 && aexpr->list_end > 0)
{
- RecordConstLocation(jstate,
- false,
- aexpr->list_start + 1,
- (aexpr->list_end - aexpr->list_start) - 1);
+ /*
+ * There are cases where the same location could be reached by
+ * jumbling multiple times. In that case, we don't want to
+ * record it multiple times.
+ */
+ if (aexpr->list_start != jstate->last_start_location)
+ {
+ RecordConstLocation(jstate,
+ false,
+ aexpr->list_start + 1,
+ (aexpr->list_end - aexpr->list_start) - 1);
+ jstate->last_start_location = aexpr->list_start;
+ }
normalize_list = true;
jstate->has_squashed_lists = true;
}
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index dcb36dcb44f..06bd17984cb 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -64,6 +64,9 @@ typedef struct JumbleState
/* Whether squashable lists are present */
bool has_squashed_lists;
+ /* The last start location recorded */
+ int last_start_location;
+
/*
* Count of the number of NULL nodes seen since last appending a value.
* These are flushed out to the jumble buffer before subsequent appends
--
2.43.0
On Thu, Oct 23, 2025 at 07:49:55PM -0500, Sami Imseih wrote:
I believe the correct answer is to fix this in queryjumblefuncs.c
and more specifically inside _jumbleElements. We should not
record the same location twice. We can do this by tracking the
start location of the last recorded constant location, and we
can skip recording it if we see it again. See the attached
which also includes test cases.
I think there is another option. Before squashing implementation, duplicated
constants were dealt with in fill_in_constant_lengths -- after processing their
length was -1 and such constants were skipped further down the line. After
0f65f3eec, if such a constant is supposed to be squashed we now ignore it, but
set the length anyway in RecordConstLocation:
if (locs[i].squashed)
continue; /* squashable list, ignore */
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
This sort of short cut the verification for duplicated constants. Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_lengths just before skipping squashed constants and reset
the length if needed.
This sort of short cut the verification for duplicated constants. Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_lengths just before skipping squashed constants and reset
the length if needed.
Hi,
yes, I was not convinced my initial idea is good, and I ended up finding
another case that still breaks:
```
SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
```
Here we have multiple squashable lists, and simply checking that the
location we are trying to record != the last location does not work. Also,
checking that the location we are trying to record is ahead of the last
location also does not work because the locations of squashable
lists in the predicate will be recorded first. So, we can't rely on the
location we want to record being higher than previous locations.
For example:
```
SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).* FROM t WHERE x
IN (1, 2, 3);
```
Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_length
Yes, I started thinking along these lines as well, where we check for
duplicates
in fill_in_constant_length; or after jumbling, we de-duplicate locations if we
have a squashable list, which is what I have attached with updated test cases.
This means we do need to scan the locations one extra time during jumbling,
but I don't see that as a problem. Maybe there is another better way?
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v2-0001-Fix-jumbling-of-squashed-lists-with-row-expansion.patchapplication/octet-stream; name=v2-0001-Fix-jumbling-of-squashed-lists-with-row-expansion.patchDownload
From 7c0ef8405df8154a939bf05290aeaf16a8c300c9 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-46-230.ec2.internal>
Date: Fri, 24 Oct 2025 00:34:54 +0000
Subject: [PATCH v2 1/1] Fix jumbling of squashed lists with row expansion
Commit 0f65f3eec introduced squashing of constant lists, but did
not handle row expansion of composite values correctly. As a
result, the same location could be recorded multiple times,
leading to assertion failures in pg_stat_statements during
generate_normalized_query.
The fix is to de-duplicate the locations at the end of jumbling,
only if we have squashable lists.
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
---
.../pg_stat_statements/expected/squashing.out | 80 +++++++++++++++++++
contrib/pg_stat_statements/sql/squashing.sql | 26 ++++++
src/backend/nodes/queryjumblefuncs.c | 42 ++++++++++
3 files changed, 148 insertions(+)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f952f47ef7b..d5bb67c7222 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -809,6 +809,84 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
select where $1 IN ($2 /*, ... */) | 2
(2 rows)
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+ x | y | ?column? | ?column? | ?column? | x | y | ?column? | ?column?
+---+---+----------+----------+----------+---+---+----------+----------
+(0 rows)
+
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+ x | y | ?column?
+---+---+----------
+(0 rows)
+
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+ f1
+-------
+ {1,2}
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+ f1 | f2
+-------+---------
+ {1,2} | {1,2,3}
+(1 row)
+
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+ ?column? | ?column? | f1 | f2 | ?column? | ?column?
+----------+----------+-------+---------+----------+----------
+ 1 | 2 | {1,2} | {1,2,3} | 3 | 4
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+ f1 | f2 | ?column?
+-------+---------+----------
+ {1,2} | {1,1,3} | 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------------------------------------------+-------
+ SELECT $1, $2, (ROW(ARRAY[$3 /*, ... */], ARRAY[$4 /*, ... */])).*, $5, $6 | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).* FROM test_composite | 2
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2 FROM test_composite | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2, $3, $4, ((composite_f(array[$5 /*, ... */]))).*, $6, $7+| 1
+ FROM test_composite +|
+ WHERE x IN ($8 /*, ... */) |
+ SELECT (ROW(ARRAY[$1 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).*, $3 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(8 rows)
+
--
-- cleanup
--
@@ -818,3 +896,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 53138d125a9..03b0515f872 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -291,6 +291,30 @@ select where '1' IN ('1'::int::text, '2'::int::text);
select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- cleanup
--
@@ -300,3 +324,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 31f97151977..8aba59105da 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -68,6 +68,7 @@ static void FlushPendingNulls(JumbleState *jstate);
static void RecordConstLocation(JumbleState *jstate,
bool extern_param,
int location, int len);
+static void CleanupConstLocations(JumbleState *jstate);
static void _jumbleNode(JumbleState *jstate, Node *node);
static void _jumbleList(JumbleState *jstate, Node *node);
static void _jumbleElements(JumbleState *jstate, List *elements, Node *node);
@@ -217,7 +218,10 @@ DoJumble(JumbleState *jstate, Node *node)
/* Squashed list found, reset highest_extern_param_id */
if (jstate->has_squashed_lists)
+ {
jstate->highest_extern_param_id = 0;
+ CleanupConstLocations(jstate);
+ }
/* Process the jumble buffer and produce the hash value */
return DatumGetInt64(hash_any_extended(jstate->jumble,
@@ -423,6 +427,44 @@ RecordConstLocation(JumbleState *jstate, bool extern_param, int location, int le
}
}
+/*
+ * Squashed lists may record a location more than once, as is the
+ * case with row expansion of an expression that contains a squashable
+ * list. In that case, we remove duplicate locations at the end of
+ * jumbling.
+ */
+static void
+CleanupConstLocations(JumbleState *jstate)
+{
+ int i,
+ j,
+ k = 0;
+
+ Assert(jstate->has_squashed_lists);
+
+ if (jstate->clocations_count <= 1)
+ return; /* nothing to do */
+
+ for (i = 0; i < jstate->clocations_count; i++)
+ {
+ for (j = i + 1; j < jstate->clocations_count;)
+ {
+ if (jstate->clocations[i].location == jstate->clocations[j].location)
+ {
+ /* remove duplicate */
+ for (k = j; k < jstate->clocations_count - 1; k++)
+ jstate->clocations[k] = jstate->clocations[k + 1];
+
+ jstate->clocations_count--; /* resize the array */
+ }
+ else
+ j++;
+ }
+ }
+
+ /* XXX: we could resize the array here, but not strictly needed */
+}
+
/*
* Subroutine for _jumbleElements: Verify a few simple cases where we can
* deduce that the expression is a constant:
--
2.43.0
On Fri, Oct 24, 2025 at 09:17:22AM -0500, Sami Imseih wrote:
Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_lengthYes, I started thinking along these lines as well, where we check for
duplicates
in fill_in_constant_length; or after jumbling, we de-duplicate locations if we
have a squashable list, which is what I have attached with updated test cases.This means we do need to scan the locations one extra time during jumbling,
but I don't see that as a problem. Maybe there is another better way?
Why? What I hand in mind is something like this, after a quick test it seems to
be able to address both the original case and the one you've discovered.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f2187167..f17a2b79 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -3008,11 +3008,17 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
- if (locs[i].squashed)
- continue; /* squashable list, ignore */
-
if (loc <= last_loc)
+ {
+ locs[i].length = -1;
continue; /* Duplicate constant, ignore */
+ }
+
+ if (locs[i].squashed)
+ {
+ last_loc = loc;
+ continue; /* squashable list, ignore */
+ }
/* Lex tokens until we find the desired constant */
for (;;)
Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_lengthYes, I started thinking along these lines as well, where we check for
duplicates
in fill_in_constant_length; or after jumbling, we de-duplicate locations if we
have a squashable list, which is what I have attached with updated test cases.This means we do need to scan the locations one extra time during jumbling,
but I don't see that as a problem. Maybe there is another better way?Why? What I hand in mind is something like this, after a quick test it seems to
be able to address both the original case and the one you've discovered.
ahh, what you have works because clocations is sorted by location before
the loop starts in `fill_in_constant_lengths` , so comparing locations
makes sense in this case. I am OK with this.
```
/*
* Sort the records by location so that we can process them in order while
* scanning the query text.
*/
if (jstate->clocations_count > 1)
qsort(jstate->clocations, jstate->clocations_count,
sizeof(LocationLen), comp_location);
```
I was really hoping that the fix could be inside of query jumbling, as I think
pg_stat_statements or any consumers of query jumbling don't need to
care more about squashing ( or other internals of jumbling ).
So now I also wonder if we should also move:
```
static char *generate_normalized_query(JumbleState *jstate, const char *query,
int query_loc, int *query_len_p);
static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
int query_loc);
```
from pg_stat_statements.c to queryjumblefuncs.c?
--
Sami Imseih
Amazon Web Services (AWS)
Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_lengthYes, I started thinking along these lines as well, where we check for
duplicates
in fill_in_constant_length; or after jumbling, we de-duplicate locations if we
have a squashable list, which is what I have attached with updated test cases.This means we do need to scan the locations one extra time during jumbling,
but I don't see that as a problem. Maybe there is another better way?Why? What I hand in mind is something like this, after a quick test it seems to
be able to address both the original case and the one you've discovered.ahh, what you have works because clocations is sorted by location before
the loop starts in `fill_in_constant_lengths` , so comparing locations
makes sense in this case. I am OK with this.
v3-0001 does what Dimiti suggests with some slight tweaks:
1/ Moving "last_loc = loc;" earlier in the loop
2/ removing a comment for fill_in_constant_lengths that was
an oversight from 0f65f3eec, as we no longer assume that
the constant location is -1 when entering that routine. Also
added the test cases that were discovered, besides the
one from the original report.
I was really hoping that the fix could be inside of query jumbling, as I think
pg_stat_statements or any consumers of query jumbling don't need to
care more about squashing ( or other internals of jumbling ).
So now I also wonder if we should also move:```
static char *generate_normalized_query(JumbleState *jstate, const char *query,
int query_loc, int *query_len_p);static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
int query_loc);
```from pg_stat_statements.c to queryjumblefuncs.c?
v3-0002 moved both generate_normalized_query and fill_in_constant_lengths
to queryjumblefuncs.c, as these routines deal with the internal of jumbling
and should not be a concern of pg_stat_statements. I think this is a good
cleanup, but others may have different opinions on this.
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v3-0001-pg_stat_statements-Fix-duplicate-constant-locatio.patchapplication/octet-stream; name=v3-0001-pg_stat_statements-Fix-duplicate-constant-locatio.patchDownload
From 8eddb0684f57c2b3e9cd818766e4af384e55d668 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-46-230.ec2.internal>
Date: Fri, 24 Oct 2025 19:12:01 +0000
Subject: [PATCH v3 1/2] pg_stat_statements: Fix duplicate constant locations
caused by composite row expansion
Fix duplicate constant locations caused by composite row expansion
Commit 0f65f3eec introduced squashing of constant lists, but failed to
handle row expansion of composite values correctly. This could result
in the same location being recorded multiple times, leading to
assertion failures in pg_stat_statements during
generate_normalized_query(), when attempting to normalize a duplicated
constant that does not actually exist in the query text.
This patch updates fill_in_constant_lengths() to check for squashed
constants only after skipping duplicates, and correctly updates
last_loc. Each location is now processed once while squashable
constants are still ignored.
An oversight in 0f65f3eec also meant duplicate locations were not
correctly set to -1. Prior to 0f65f3eec, RecordConstLocation always
set the length to -1 and delegated fill_in_constant_lengths() to
populate the lengths. Commit 0f65f3eec changed thi
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
---
.../pg_stat_statements/expected/squashing.out | 80 +++++++++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 18 +++--
contrib/pg_stat_statements/sql/squashing.sql | 26 ++++++
3 files changed, 117 insertions(+), 7 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f952f47ef7b..d5bb67c7222 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -809,6 +809,84 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
select where $1 IN ($2 /*, ... */) | 2
(2 rows)
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+ x | y | ?column? | ?column? | ?column? | x | y | ?column? | ?column?
+---+---+----------+----------+----------+---+---+----------+----------
+(0 rows)
+
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+ x | y | ?column?
+---+---+----------
+(0 rows)
+
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+ f1
+-------
+ {1,2}
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+ f1 | f2
+-------+---------
+ {1,2} | {1,2,3}
+(1 row)
+
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+ ?column? | ?column? | f1 | f2 | ?column? | ?column?
+----------+----------+-------+---------+----------+----------
+ 1 | 2 | {1,2} | {1,2,3} | 3 | 4
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+ f1 | f2 | ?column?
+-------+---------+----------
+ {1,2} | {1,1,3} | 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------------------------------------------+-------
+ SELECT $1, $2, (ROW(ARRAY[$3 /*, ... */], ARRAY[$4 /*, ... */])).*, $5, $6 | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).* FROM test_composite | 2
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2 FROM test_composite | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2, $3, $4, ((composite_f(array[$5 /*, ... */]))).*, $6, $7+| 1
+ FROM test_composite +|
+ WHERE x IN ($8 /*, ... */) |
+ SELECT (ROW(ARRAY[$1 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).*, $3 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(8 rows)
+
--
-- cleanup
--
@@ -818,3 +896,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f2187167c5c..6c9e49b471b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2955,8 +2955,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* a problem.
*
* Duplicate constant pointers are possible, and will have their lengths
- * marked as '-1', so that they are later ignored. (Actually, we assume the
- * lengths were initialized as -1 to start with, and don't change them here.)
+ * marked as '-1', so that they are later ignored.
*
* If query_loc > 0, then "query" has been advanced by that much compared to
* the original string start, so we need to translate the provided locations
@@ -3002,17 +3001,24 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
{
int loc = locs[i].location;
int tok;
+ bool squashed = locs[i].squashed;
/* Adjust recorded location if we're dealing with partial string */
loc -= query_loc;
Assert(loc >= 0);
- if (locs[i].squashed)
- continue; /* squashable list, ignore */
-
if (loc <= last_loc)
+ {
+ locs[i].length = -1;
continue; /* Duplicate constant, ignore */
+ }
+
+ /* We have the next valid location, save the current location */
+ last_loc = loc;
+
+ if (squashed)
+ continue; /* squashable list, ignore */
/* Lex tokens until we find the desired constant */
for (;;)
@@ -3060,8 +3066,6 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
/* If we hit end-of-string, give up, leaving remaining lengths -1 */
if (tok == 0)
break;
-
- last_loc = loc;
}
scanner_finish(yyscanner);
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 53138d125a9..03b0515f872 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -291,6 +291,30 @@ select where '1' IN ('1'::int::text, '2'::int::text);
select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- cleanup
--
@@ -300,3 +324,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
--
2.43.0
v3-0002-pg_stat_statements-move-out-jumble-specific-routi.patchapplication/octet-stream; name=v3-0002-pg_stat_statements-move-out-jumble-specific-routi.patchDownload
From 110bf28c56218846d8a75180e868f97608b0cf80 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-46-230.ec2.internal>
Date: Fri, 24 Oct 2025 20:42:30 +0000
Subject: [PATCH v3 2/2] pg_stat_statements: move out jumble-specific routines
During discussion of commit 8eddb06, it was noted that the
fill_in_constant_lengths and generate_normalized_query routines
depend on intricate jumbling knowledge. Implementing this code
within pg_stat_statements does not make sense; it belongs in the
core jumbling module, queryjumblefuncs.c. pg_stat_statements can
then consume these routines without needing to know the details.
This separation became especially apparent with the squashing of
constant lists, but it has been a general issue all along.
In addition to reducing the pg_stat_statements codebase, this
change makes these routines available to other extensions
performing similar operations.
---
.../pg_stat_statements/pg_stat_statements.c | 274 +-----------------
src/backend/nodes/queryjumblefuncs.c | 258 +++++++++++++++++
src/include/nodes/queryjumble.h | 4 +
3 files changed, 269 insertions(+), 267 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6c9e49b471b..95293d3b594 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -50,7 +50,6 @@
#include "access/htup_details.h"
#include "access/parallel.h"
#include "catalog/pg_authid.h"
-#include "common/int.h"
#include "executor/instrument.h"
#include "funcapi.h"
#include "jit/jit.h"
@@ -59,7 +58,6 @@
#include "nodes/queryjumble.h"
#include "optimizer/planner.h"
#include "parser/analyze.h"
-#include "parser/scanner.h"
#include "pgstat.h"
#include "storage/fd.h"
#include "storage/ipc.h"
@@ -377,12 +375,6 @@ static char *qtext_fetch(Size query_offset, int query_len,
static bool need_gc_qtexts(void);
static void gc_qtexts(void);
static TimestampTz entry_reset(Oid userid, Oid dbid, int64 queryid, bool minmax_only);
-static char *generate_normalized_query(JumbleState *jstate, const char *query,
- int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
- int query_loc);
-static int comp_location(const void *a, const void *b);
-
/*
* Module load callback
@@ -1359,6 +1351,13 @@ pgss_store(const char *query, int64 queryId,
if (jstate)
{
LWLockRelease(pgss->lock);
+
+ /*
+ * generate the normalized query. Note that the normalized
+ * representation may well vary depending on just which
+ * "equivalent" query is used to create the hashtable entry. We
+ * assume this is OK.
+ */
norm_query = generate_normalized_query(jstate, query,
query_location,
&query_len);
@@ -2823,262 +2822,3 @@ release_lock:
return stats_reset;
}
-
-/*
- * Generate a normalized version of the query string that will be used to
- * represent all similar queries.
- *
- * Note that the normalized representation may well vary depending on
- * just which "equivalent" query is used to create the hashtable entry.
- * We assume this is OK.
- *
- * If query_loc > 0, then "query" has been advanced by that much compared to
- * the original string start, so we need to translate the provided locations
- * to compensate. (This lets us avoid re-scanning statements before the one
- * of interest, so it's worth doing.)
- *
- * *query_len_p contains the input string length, and is updated with
- * the result string length on exit. The resulting string might be longer
- * or shorter depending on what happens with replacement of constants.
- *
- * Returns a palloc'd string.
- */
-static char *
-generate_normalized_query(JumbleState *jstate, const char *query,
- int query_loc, int *query_len_p)
-{
- char *norm_query;
- int query_len = *query_len_p;
- int norm_query_buflen, /* Space allowed for norm_query */
- len_to_wrt, /* Length (in bytes) to write */
- quer_loc = 0, /* Source query byte location */
- n_quer_loc = 0, /* Normalized query byte location */
- last_off = 0, /* Offset from start for previous tok */
- last_tok_len = 0; /* Length (in bytes) of that tok */
- int num_constants_replaced = 0;
-
- /*
- * Get constants' lengths (core system only gives us locations). Note
- * this also ensures the items are sorted by location.
- */
- fill_in_constant_lengths(jstate, query, query_loc);
-
- /*
- * Allow for $n symbols to be longer than the constants they replace.
- * Constants must take at least one byte in text form, while a $n symbol
- * certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
- * could refine that limit based on the max value of n for the current
- * query, but it hardly seems worth any extra effort to do so.
- */
- norm_query_buflen = query_len + jstate->clocations_count * 10;
-
- /* Allocate result buffer */
- norm_query = palloc(norm_query_buflen + 1);
-
- for (int i = 0; i < jstate->clocations_count; i++)
- {
- int off, /* Offset from start for cur tok */
- tok_len; /* Length (in bytes) of that tok */
-
- /*
- * If we have an external param at this location, but no lists are
- * being squashed across the query, then we skip here; this will make
- * us print the characters found in the original query that represent
- * the parameter in the next iteration (or after the loop is done),
- * which is a bit odd but seems to work okay in most cases.
- */
- if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
- continue;
-
- off = jstate->clocations[i].location;
-
- /* Adjust recorded location if we're dealing with partial string */
- off -= query_loc;
-
- tok_len = jstate->clocations[i].length;
-
- if (tok_len < 0)
- continue; /* ignore any duplicates */
-
- /* Copy next chunk (what precedes the next constant) */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /*
- * And insert a param symbol in place of the constant token; and, if
- * we have a squashable list, insert a placeholder comment starting
- * from the list's second value.
- */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
- num_constants_replaced + 1 + jstate->highest_extern_param_id,
- jstate->clocations[i].squashed ? " /*, ... */" : "");
- num_constants_replaced++;
-
- /* move forward */
- quer_loc = off + tok_len;
- last_off = off;
- last_tok_len = tok_len;
- }
-
- /*
- * We've copied up until the last ignorable constant. Copy over the
- * remaining bytes of the original query string.
- */
- len_to_wrt = query_len - quer_loc;
-
- Assert(len_to_wrt >= 0);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- Assert(n_quer_loc <= norm_query_buflen);
- norm_query[n_quer_loc] = '\0';
-
- *query_len_p = n_quer_loc;
- return norm_query;
-}
-
-/*
- * Given a valid SQL string and an array of constant-location records,
- * fill in the textual lengths of those constants.
- *
- * The constants may use any allowed constant syntax, such as float literals,
- * bit-strings, single-quoted strings and dollar-quoted strings. This is
- * accomplished by using the public API for the core scanner.
- *
- * It is the caller's job to ensure that the string is a valid SQL statement
- * with constants at the indicated locations. Since in practice the string
- * has already been parsed, and the locations that the caller provides will
- * have originated from within the authoritative parser, this should not be
- * a problem.
- *
- * Duplicate constant pointers are possible, and will have their lengths
- * marked as '-1', so that they are later ignored.
- *
- * If query_loc > 0, then "query" has been advanced by that much compared to
- * the original string start, so we need to translate the provided locations
- * to compensate. (This lets us avoid re-scanning statements before the one
- * of interest, so it's worth doing.)
- *
- * N.B. There is an assumption that a '-' character at a Const location begins
- * a negative numeric constant. This precludes there ever being another
- * reason for a constant to start with a '-'.
- */
-static void
-fill_in_constant_lengths(JumbleState *jstate, const char *query,
- int query_loc)
-{
- LocationLen *locs;
- core_yyscan_t yyscanner;
- core_yy_extra_type yyextra;
- core_YYSTYPE yylval;
- YYLTYPE yylloc;
- int last_loc = -1;
- int i;
-
- /*
- * Sort the records by location so that we can process them in order while
- * scanning the query text.
- */
- if (jstate->clocations_count > 1)
- qsort(jstate->clocations, jstate->clocations_count,
- sizeof(LocationLen), comp_location);
- locs = jstate->clocations;
-
- /* initialize the flex scanner --- should match raw_parser() */
- yyscanner = scanner_init(query,
- &yyextra,
- &ScanKeywords,
- ScanKeywordTokens);
-
- /* we don't want to re-emit any escape string warnings */
- yyextra.escape_string_warning = false;
-
- /* Search for each constant, in sequence */
- for (i = 0; i < jstate->clocations_count; i++)
- {
- int loc = locs[i].location;
- int tok;
- bool squashed = locs[i].squashed;
-
- /* Adjust recorded location if we're dealing with partial string */
- loc -= query_loc;
-
- Assert(loc >= 0);
-
- if (loc <= last_loc)
- {
- locs[i].length = -1;
- continue; /* Duplicate constant, ignore */
- }
-
- /* We have the next valid location, save the current location */
- last_loc = loc;
-
- if (squashed)
- continue; /* squashable list, ignore */
-
- /* Lex tokens until we find the desired constant */
- for (;;)
- {
- tok = core_yylex(&yylval, &yylloc, yyscanner);
-
- /* We should not hit end-of-string, but if we do, behave sanely */
- if (tok == 0)
- break; /* out of inner for-loop */
-
- /*
- * We should find the token position exactly, but if we somehow
- * run past it, work with that.
- */
- if (yylloc >= loc)
- {
- if (query[loc] == '-')
- {
- /*
- * It's a negative value - this is the one and only case
- * where we replace more than a single token.
- *
- * Do not compensate for the core system's special-case
- * adjustment of location to that of the leading '-'
- * operator in the event of a negative constant. It is
- * also useful for our purposes to start from the minus
- * symbol. In this way, queries like "select * from foo
- * where bar = 1" and "select * from foo where bar = -2"
- * will have identical normalized query strings.
- */
- tok = core_yylex(&yylval, &yylloc, yyscanner);
- if (tok == 0)
- break; /* out of inner for-loop */
- }
-
- /*
- * We now rely on the assumption that flex has placed a zero
- * byte after the text of the current token in scanbuf.
- */
- locs[i].length = strlen(yyextra.scanbuf + loc);
- break; /* out of inner for-loop */
- }
- }
-
- /* If we hit end-of-string, give up, leaving remaining lengths -1 */
- if (tok == 0)
- break;
- }
-
- scanner_finish(yyscanner);
-}
-
-/*
- * comp_location: comparator for qsorting LocationLen structs by location
- */
-static int
-comp_location(const void *a, const void *b)
-{
- int l = ((const LocationLen *) a)->location;
- int r = ((const LocationLen *) b)->location;
-
- return pg_cmp_s32(l, r);
-}
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 31f97151977..9244e1468f3 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -40,10 +40,12 @@
#include "access/transam.h"
#include "catalog/pg_proc.h"
#include "common/hashfn.h"
+#include "common/int.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "nodes/queryjumble.h"
#include "utils/lsyscache.h"
+#include "parser/scanner.h"
#include "parser/scansup.h"
#define JUMBLE_SIZE 1024 /* query serialization buffer size */
@@ -77,6 +79,262 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static int comp_location(const void *a, const void *b);
+
+/*
+ * comp_location: comparator for qsorting LocationLen structs by location
+ */
+static int
+comp_location(const void *a, const void *b)
+{
+ int l = ((const LocationLen *) a)->location;
+ int r = ((const LocationLen *) b)->location;
+
+ return pg_cmp_s32(l, r);
+}
+
+ /*
+ * Generate a normalized version of the query string that will be used to
+ * represent all similar queries.
+ *
+ * If query_loc > 0, then "query" has been advanced by that much compared to
+ * the original string start, so we need to translate the provided locations
+ * to compensate. (This lets us avoid re-scanning statements before the one
+ * of interest, so it's worth doing.)
+ *
+ * *query_len_p contains the input string length, and is updated with the
+ * result string length on exit. The resulting string might be longer or
+ * shorter depending on what happens with replacement of constants.
+ *
+ * Returns a palloc'd string.
+ */
+char *
+generate_normalized_query(JumbleState *jstate, const char *query,
+ int query_loc, int *query_len_p)
+{
+ char *norm_query;
+ int query_len = *query_len_p;
+ int norm_query_buflen, /* Space allowed for norm_query */
+ len_to_wrt, /* Length (in bytes) to write */
+ quer_loc = 0, /* Source query byte location */
+ n_quer_loc = 0, /* Normalized query byte location */
+ last_off = 0, /* Offset from start for previous tok */
+ last_tok_len = 0; /* Length (in bytes) of that tok */
+ int num_constants_replaced = 0;
+
+ /*
+ * Get constants' lengths (core system only gives us locations). Note
+ * this also ensures the items are sorted by location.
+ */
+ fill_in_constant_lengths(jstate, query, query_loc);
+
+ /*
+ * Allow for $n symbols to be longer than the constants they replace.
+ * Constants must take at least one byte in text form, while a $n symbol
+ * certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
+ * could refine that limit based on the max value of n for the current
+ * query, but it hardly seems worth any extra effort to do so.
+ */
+ norm_query_buflen = query_len + jstate->clocations_count * 10;
+
+ /* Allocate result buffer */
+ norm_query = palloc(norm_query_buflen + 1);
+
+ for (int i = 0; i < jstate->clocations_count; i++)
+ {
+ int off, /* Offset from start for cur tok */
+ tok_len; /* Length (in bytes) of that tok */
+
+ /*
+ * If we have an external param at this location, but no lists are
+ * being squashed across the query, then we skip here; this will make
+ * us print the characters found in the original query that represent
+ * the parameter in the next iteration (or after the loop is done),
+ * which is a bit odd but seems to work okay in most cases.
+ */
+ if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
+ continue;
+
+ off = jstate->clocations[i].location;
+
+ /* Adjust recorded location if we're dealing with partial string */
+ off -= query_loc;
+
+ tok_len = jstate->clocations[i].length;
+
+ if (tok_len < 0)
+ continue; /* ignore any duplicates */
+
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
+ /*
+ * And insert a param symbol in place of the constant token; and, if
+ * we have a squashable list, insert a placeholder comment starting
+ * from the list's second value.
+ */
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced + 1 + jstate->highest_extern_param_id,
+ jstate->clocations[i].squashed ? " /*, ... */" : "");
+ num_constants_replaced++;
+
+ /* move forward */
+ quer_loc = off + tok_len;
+ last_off = off;
+ last_tok_len = tok_len;
+ }
+
+ /*
+ * We've copied up until the last ignorable constant. Copy over the
+ * remaining bytes of the original query string.
+ */
+ len_to_wrt = query_len - quer_loc;
+
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
+ Assert(n_quer_loc <= norm_query_buflen);
+ norm_query[n_quer_loc] = '\0';
+
+ *query_len_p = n_quer_loc;
+ return norm_query;
+}
+
+/*
+ * Given a valid SQL string and an array of constant-location records,
+ * fill in the textual lengths of those constants.
+ *
+ * The constants may use any allowed constant syntax, such as float literals,
+ * bit-strings, single-quoted strings and dollar-quoted strings. This is
+ * accomplished by using the public API for the core scanner.
+ *
+ * It is the caller's job to ensure that the string is a valid SQL statement
+ * with constants at the indicated locations. Since in practice the string
+ * has already been parsed, and the locations that the caller provides will
+ * have originated from within the authoritative parser, this should not be
+ * a problem.
+ *
+ * Duplicate constant pointers are possible, and will have their lengths
+ * marked as '-1', so that they are later ignored.
+ *
+ * If query_loc > 0, then "query" has been advanced by that much compared to
+ * the original string start, so we need to translate the provided locations
+ * to compensate. (This lets us avoid re-scanning statements before the one
+ * of interest, so it's worth doing.)
+ *
+ * N.B. There is an assumption that a '-' character at a Const location begins
+ * a negative numeric constant. This precludes there ever being another
+ * reason for a constant to start with a '-'.
+ */
+void
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
+ int query_loc)
+{
+ LocationLen *locs;
+ core_yyscan_t yyscanner;
+ core_yy_extra_type yyextra;
+ core_YYSTYPE yylval;
+ YYLTYPE yylloc;
+ int last_loc = -1;
+ int i;
+
+ /*
+ * Sort the records by location so that we can process them in order while
+ * scanning the query text.
+ */
+ if (jstate->clocations_count > 1)
+ qsort(jstate->clocations, jstate->clocations_count,
+ sizeof(LocationLen), comp_location);
+ locs = jstate->clocations;
+
+ /* initialize the flex scanner --- should match raw_parser() */
+ yyscanner = scanner_init(query,
+ &yyextra,
+ &ScanKeywords,
+ ScanKeywordTokens);
+
+ /* we don't want to re-emit any escape string warnings */
+ yyextra.escape_string_warning = false;
+
+ /* Search for each constant, in sequence */
+ for (i = 0; i < jstate->clocations_count; i++)
+ {
+ int loc = locs[i].location;
+ int tok;
+ bool squashed = locs[i].squashed;
+
+ /* Adjust recorded location if we're dealing with partial string */
+ loc -= query_loc;
+
+ Assert(loc >= 0);
+
+ if (loc <= last_loc)
+ {
+ locs[i].length = -1;
+ continue; /* Duplicate constant, ignore */
+ }
+
+ /* We have the next valid location, save the current location */
+ last_loc = loc;
+
+ if (squashed)
+ continue; /* squashable list, ignore */
+
+ /* Lex tokens until we find the desired constant */
+ for (;;)
+ {
+ tok = core_yylex(&yylval, &yylloc, yyscanner);
+
+ /* We should not hit end-of-string, but if we do, behave sanely */
+ if (tok == 0)
+ break; /* out of inner for-loop */
+
+ /*
+ * We should find the token position exactly, but if we somehow
+ * run past it, work with that.
+ */
+ if (yylloc >= loc)
+ {
+ if (query[loc] == '-')
+ {
+ /*
+ * It's a negative value - this is the one and only case
+ * where we replace more than a single token.
+ *
+ * Do not compensate for the core system's special-case
+ * adjustment of location to that of the leading '-'
+ * operator in the event of a negative constant. It is
+ * also useful for our purposes to start from the minus
+ * symbol. In this way, queries like "select * from foo
+ * where bar = 1" and "select * from foo where bar = -2"
+ * will have identical normalized query strings.
+ */
+ tok = core_yylex(&yylval, &yylloc, yyscanner);
+ if (tok == 0)
+ break; /* out of inner for-loop */
+ }
+
+ /*
+ * We now rely on the assumption that flex has placed a zero
+ * byte after the text of the current token in scanbuf.
+ */
+ locs[i].length = strlen(yyextra.scanbuf + loc);
+ break; /* out of inner for-loop */
+ }
+ }
+
+ /* If we hit end-of-string, give up, leaving remaining lengths -1 */
+ if (tok == 0)
+ break;
+ }
+
+ scanner_finish(yyscanner);
+}
/*
* Given a possibly multi-statement source string, confine our attention to the
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index dcb36dcb44f..8024a0883f7 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -91,6 +91,10 @@ extern PGDLLIMPORT int compute_query_id;
extern const char *CleanQuerytext(const char *query, int *location, int *len);
+extern char *generate_normalized_query(JumbleState *jstate, const char *query,
+ int query_loc, int *query_len_p);
+extern void fill_in_constant_lengths(JumbleState *jstate, const char *query,
+ int query_loc);
extern JumbleState *JumbleQuery(Query *query);
extern void EnableQueryId(void);
--
2.43.0
v4 corrects some code comments.
Also, I created a CF entry [0]https://commitfest.postgresql.org/patch/6167/.
Attachments:
v4-0001-pg_stat_statements-Fix-duplicate-constant-locatio.patchapplication/octet-stream; name=v4-0001-pg_stat_statements-Fix-duplicate-constant-locatio.patchDownload
From dc4be5e3b9836a758bf0d2ef12ec7d88f0c92bb1 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-46-230.ec2.internal>
Date: Fri, 24 Oct 2025 19:12:01 +0000
Subject: [PATCH v4 1/2] pg_stat_statements: Fix duplicate constant locations
caused by composite row expansion
Commit 0f65f3eec introduced squashing of constant lists, but failed to
handle row expansion of composite values correctly. This could result
in the same location being recorded multiple times, which led to
out-of-bounds memory access in generate_normalized_query().
This patch updates fill_in_constant_lengths() to check for squashed
constants only after skipping duplicates, and to correctly update
last_loc. Each location is now processed once while squashable
constants are still ignored.
An oversight in 0f65f3eec also meant duplicate locations were not set
to -1. Before 0f65f3eec, RecordConstLocation always set the length to
-1 and delegated fill_in_constant_lengths() to populate the lengths.
Commit 0f65f3eec changed this, since a squashed list length is set
during RecordConstLocation.
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
---
.../pg_stat_statements/expected/squashing.out | 80 +++++++++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 18 +++--
contrib/pg_stat_statements/sql/squashing.sql | 26 ++++++
3 files changed, 117 insertions(+), 7 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f952f47ef7b..d5bb67c7222 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -809,6 +809,84 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
select where $1 IN ($2 /*, ... */) | 2
(2 rows)
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+ x | y | ?column? | ?column? | ?column? | x | y | ?column? | ?column?
+---+---+----------+----------+----------+---+---+----------+----------
+(0 rows)
+
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+ x | y | ?column?
+---+---+----------
+(0 rows)
+
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+ f1
+-------
+ {1,2}
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+ f1 | f2
+-------+---------
+ {1,2} | {1,2,3}
+(1 row)
+
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+ ?column? | ?column? | f1 | f2 | ?column? | ?column?
+----------+----------+-------+---------+----------+----------
+ 1 | 2 | {1,2} | {1,2,3} | 3 | 4
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+ f1 | f2 | ?column?
+-------+---------+----------
+ {1,2} | {1,1,3} | 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------------------------------------------+-------
+ SELECT $1, $2, (ROW(ARRAY[$3 /*, ... */], ARRAY[$4 /*, ... */])).*, $5, $6 | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).* FROM test_composite | 2
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2 FROM test_composite | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2, $3, $4, ((composite_f(array[$5 /*, ... */]))).*, $6, $7+| 1
+ FROM test_composite +|
+ WHERE x IN ($8 /*, ... */) |
+ SELECT (ROW(ARRAY[$1 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).*, $3 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(8 rows)
+
--
-- cleanup
--
@@ -818,3 +896,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f2187167c5c..1d22dc07da9 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2955,8 +2955,7 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* a problem.
*
* Duplicate constant pointers are possible, and will have their lengths
- * marked as '-1', so that they are later ignored. (Actually, we assume the
- * lengths were initialized as -1 to start with, and don't change them here.)
+ * marked as '-1', so that they are later ignored.
*
* If query_loc > 0, then "query" has been advanced by that much compared to
* the original string start, so we need to translate the provided locations
@@ -3002,17 +3001,24 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
{
int loc = locs[i].location;
int tok;
+ bool squashed = locs[i].squashed;
/* Adjust recorded location if we're dealing with partial string */
loc -= query_loc;
Assert(loc >= 0);
- if (locs[i].squashed)
- continue; /* squashable list, ignore */
-
if (loc <= last_loc)
+ {
+ locs[i].length = -1;
continue; /* Duplicate constant, ignore */
+ }
+
+ /* We have a valid location, so let's save it */
+ last_loc = loc;
+
+ if (squashed)
+ continue; /* squashable list, ignore */
/* Lex tokens until we find the desired constant */
for (;;)
@@ -3060,8 +3066,6 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
/* If we hit end-of-string, give up, leaving remaining lengths -1 */
if (tok == 0)
break;
-
- last_loc = loc;
}
scanner_finish(yyscanner);
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 53138d125a9..03b0515f872 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -291,6 +291,30 @@ select where '1' IN ('1'::int::text, '2'::int::text);
select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- cleanup
--
@@ -300,3 +324,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
--
2.43.0
v4-0002-pg_stat_statements-move-out-jumble-specific-routi.patchapplication/octet-stream; name=v4-0002-pg_stat_statements-move-out-jumble-specific-routi.patchDownload
From 956c0a456a149f0e687d118b8c6ddad8e42df828 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@ip-172-31-46-230.ec2.internal>
Date: Fri, 24 Oct 2025 20:42:30 +0000
Subject: [PATCH v4 2/2] pg_stat_statements: move out jumble-specific routines
During discussion of commit 8eddb06, it was noted that the
fill_in_constant_lengths and generate_normalized_query routines
depend on intricate jumbling knowledge. Implementing this code
within pg_stat_statements does not make sense; it belongs in the
core jumbling module, queryjumblefuncs.c. pg_stat_statements can
then consume these routines without needing to know the details.
This separation became especially apparent with the squashing of
constant lists, but it has been a general issue all along.
In addition to reducing the pg_stat_statements codebase, this
change makes these routines available to other extensions
performing similar operations.
Discussion: https://www.postgresql.org/message-id/CAA5RZ0t6rynOzvABTbHZPF9ydKoay5LN%2B_iM3hHk0ni_Qu_9tg%40mail.gmail.com
---
.../pg_stat_statements/pg_stat_statements.c | 274 +-----------------
src/backend/nodes/queryjumblefuncs.c | 258 +++++++++++++++++
src/include/nodes/queryjumble.h | 4 +
3 files changed, 269 insertions(+), 267 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1d22dc07da9..95293d3b594 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -50,7 +50,6 @@
#include "access/htup_details.h"
#include "access/parallel.h"
#include "catalog/pg_authid.h"
-#include "common/int.h"
#include "executor/instrument.h"
#include "funcapi.h"
#include "jit/jit.h"
@@ -59,7 +58,6 @@
#include "nodes/queryjumble.h"
#include "optimizer/planner.h"
#include "parser/analyze.h"
-#include "parser/scanner.h"
#include "pgstat.h"
#include "storage/fd.h"
#include "storage/ipc.h"
@@ -377,12 +375,6 @@ static char *qtext_fetch(Size query_offset, int query_len,
static bool need_gc_qtexts(void);
static void gc_qtexts(void);
static TimestampTz entry_reset(Oid userid, Oid dbid, int64 queryid, bool minmax_only);
-static char *generate_normalized_query(JumbleState *jstate, const char *query,
- int query_loc, int *query_len_p);
-static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
- int query_loc);
-static int comp_location(const void *a, const void *b);
-
/*
* Module load callback
@@ -1359,6 +1351,13 @@ pgss_store(const char *query, int64 queryId,
if (jstate)
{
LWLockRelease(pgss->lock);
+
+ /*
+ * generate the normalized query. Note that the normalized
+ * representation may well vary depending on just which
+ * "equivalent" query is used to create the hashtable entry. We
+ * assume this is OK.
+ */
norm_query = generate_normalized_query(jstate, query,
query_location,
&query_len);
@@ -2823,262 +2822,3 @@ release_lock:
return stats_reset;
}
-
-/*
- * Generate a normalized version of the query string that will be used to
- * represent all similar queries.
- *
- * Note that the normalized representation may well vary depending on
- * just which "equivalent" query is used to create the hashtable entry.
- * We assume this is OK.
- *
- * If query_loc > 0, then "query" has been advanced by that much compared to
- * the original string start, so we need to translate the provided locations
- * to compensate. (This lets us avoid re-scanning statements before the one
- * of interest, so it's worth doing.)
- *
- * *query_len_p contains the input string length, and is updated with
- * the result string length on exit. The resulting string might be longer
- * or shorter depending on what happens with replacement of constants.
- *
- * Returns a palloc'd string.
- */
-static char *
-generate_normalized_query(JumbleState *jstate, const char *query,
- int query_loc, int *query_len_p)
-{
- char *norm_query;
- int query_len = *query_len_p;
- int norm_query_buflen, /* Space allowed for norm_query */
- len_to_wrt, /* Length (in bytes) to write */
- quer_loc = 0, /* Source query byte location */
- n_quer_loc = 0, /* Normalized query byte location */
- last_off = 0, /* Offset from start for previous tok */
- last_tok_len = 0; /* Length (in bytes) of that tok */
- int num_constants_replaced = 0;
-
- /*
- * Get constants' lengths (core system only gives us locations). Note
- * this also ensures the items are sorted by location.
- */
- fill_in_constant_lengths(jstate, query, query_loc);
-
- /*
- * Allow for $n symbols to be longer than the constants they replace.
- * Constants must take at least one byte in text form, while a $n symbol
- * certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
- * could refine that limit based on the max value of n for the current
- * query, but it hardly seems worth any extra effort to do so.
- */
- norm_query_buflen = query_len + jstate->clocations_count * 10;
-
- /* Allocate result buffer */
- norm_query = palloc(norm_query_buflen + 1);
-
- for (int i = 0; i < jstate->clocations_count; i++)
- {
- int off, /* Offset from start for cur tok */
- tok_len; /* Length (in bytes) of that tok */
-
- /*
- * If we have an external param at this location, but no lists are
- * being squashed across the query, then we skip here; this will make
- * us print the characters found in the original query that represent
- * the parameter in the next iteration (or after the loop is done),
- * which is a bit odd but seems to work okay in most cases.
- */
- if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
- continue;
-
- off = jstate->clocations[i].location;
-
- /* Adjust recorded location if we're dealing with partial string */
- off -= query_loc;
-
- tok_len = jstate->clocations[i].length;
-
- if (tok_len < 0)
- continue; /* ignore any duplicates */
-
- /* Copy next chunk (what precedes the next constant) */
- len_to_wrt = off - last_off;
- len_to_wrt -= last_tok_len;
- Assert(len_to_wrt >= 0);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- /*
- * And insert a param symbol in place of the constant token; and, if
- * we have a squashable list, insert a placeholder comment starting
- * from the list's second value.
- */
- n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
- num_constants_replaced + 1 + jstate->highest_extern_param_id,
- jstate->clocations[i].squashed ? " /*, ... */" : "");
- num_constants_replaced++;
-
- /* move forward */
- quer_loc = off + tok_len;
- last_off = off;
- last_tok_len = tok_len;
- }
-
- /*
- * We've copied up until the last ignorable constant. Copy over the
- * remaining bytes of the original query string.
- */
- len_to_wrt = query_len - quer_loc;
-
- Assert(len_to_wrt >= 0);
- memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
- n_quer_loc += len_to_wrt;
-
- Assert(n_quer_loc <= norm_query_buflen);
- norm_query[n_quer_loc] = '\0';
-
- *query_len_p = n_quer_loc;
- return norm_query;
-}
-
-/*
- * Given a valid SQL string and an array of constant-location records,
- * fill in the textual lengths of those constants.
- *
- * The constants may use any allowed constant syntax, such as float literals,
- * bit-strings, single-quoted strings and dollar-quoted strings. This is
- * accomplished by using the public API for the core scanner.
- *
- * It is the caller's job to ensure that the string is a valid SQL statement
- * with constants at the indicated locations. Since in practice the string
- * has already been parsed, and the locations that the caller provides will
- * have originated from within the authoritative parser, this should not be
- * a problem.
- *
- * Duplicate constant pointers are possible, and will have their lengths
- * marked as '-1', so that they are later ignored.
- *
- * If query_loc > 0, then "query" has been advanced by that much compared to
- * the original string start, so we need to translate the provided locations
- * to compensate. (This lets us avoid re-scanning statements before the one
- * of interest, so it's worth doing.)
- *
- * N.B. There is an assumption that a '-' character at a Const location begins
- * a negative numeric constant. This precludes there ever being another
- * reason for a constant to start with a '-'.
- */
-static void
-fill_in_constant_lengths(JumbleState *jstate, const char *query,
- int query_loc)
-{
- LocationLen *locs;
- core_yyscan_t yyscanner;
- core_yy_extra_type yyextra;
- core_YYSTYPE yylval;
- YYLTYPE yylloc;
- int last_loc = -1;
- int i;
-
- /*
- * Sort the records by location so that we can process them in order while
- * scanning the query text.
- */
- if (jstate->clocations_count > 1)
- qsort(jstate->clocations, jstate->clocations_count,
- sizeof(LocationLen), comp_location);
- locs = jstate->clocations;
-
- /* initialize the flex scanner --- should match raw_parser() */
- yyscanner = scanner_init(query,
- &yyextra,
- &ScanKeywords,
- ScanKeywordTokens);
-
- /* we don't want to re-emit any escape string warnings */
- yyextra.escape_string_warning = false;
-
- /* Search for each constant, in sequence */
- for (i = 0; i < jstate->clocations_count; i++)
- {
- int loc = locs[i].location;
- int tok;
- bool squashed = locs[i].squashed;
-
- /* Adjust recorded location if we're dealing with partial string */
- loc -= query_loc;
-
- Assert(loc >= 0);
-
- if (loc <= last_loc)
- {
- locs[i].length = -1;
- continue; /* Duplicate constant, ignore */
- }
-
- /* We have a valid location, so let's save it */
- last_loc = loc;
-
- if (squashed)
- continue; /* squashable list, ignore */
-
- /* Lex tokens until we find the desired constant */
- for (;;)
- {
- tok = core_yylex(&yylval, &yylloc, yyscanner);
-
- /* We should not hit end-of-string, but if we do, behave sanely */
- if (tok == 0)
- break; /* out of inner for-loop */
-
- /*
- * We should find the token position exactly, but if we somehow
- * run past it, work with that.
- */
- if (yylloc >= loc)
- {
- if (query[loc] == '-')
- {
- /*
- * It's a negative value - this is the one and only case
- * where we replace more than a single token.
- *
- * Do not compensate for the core system's special-case
- * adjustment of location to that of the leading '-'
- * operator in the event of a negative constant. It is
- * also useful for our purposes to start from the minus
- * symbol. In this way, queries like "select * from foo
- * where bar = 1" and "select * from foo where bar = -2"
- * will have identical normalized query strings.
- */
- tok = core_yylex(&yylval, &yylloc, yyscanner);
- if (tok == 0)
- break; /* out of inner for-loop */
- }
-
- /*
- * We now rely on the assumption that flex has placed a zero
- * byte after the text of the current token in scanbuf.
- */
- locs[i].length = strlen(yyextra.scanbuf + loc);
- break; /* out of inner for-loop */
- }
- }
-
- /* If we hit end-of-string, give up, leaving remaining lengths -1 */
- if (tok == 0)
- break;
- }
-
- scanner_finish(yyscanner);
-}
-
-/*
- * comp_location: comparator for qsorting LocationLen structs by location
- */
-static int
-comp_location(const void *a, const void *b)
-{
- int l = ((const LocationLen *) a)->location;
- int r = ((const LocationLen *) b)->location;
-
- return pg_cmp_s32(l, r);
-}
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 31f97151977..c27f1c12ac7 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -40,10 +40,12 @@
#include "access/transam.h"
#include "catalog/pg_proc.h"
#include "common/hashfn.h"
+#include "common/int.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "nodes/queryjumble.h"
#include "utils/lsyscache.h"
+#include "parser/scanner.h"
#include "parser/scansup.h"
#define JUMBLE_SIZE 1024 /* query serialization buffer size */
@@ -77,6 +79,262 @@ static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
RangeTblEntry *rte,
Alias *expr);
+static int comp_location(const void *a, const void *b);
+
+/*
+ * comp_location: comparator for qsorting LocationLen structs by location
+ */
+static int
+comp_location(const void *a, const void *b)
+{
+ int l = ((const LocationLen *) a)->location;
+ int r = ((const LocationLen *) b)->location;
+
+ return pg_cmp_s32(l, r);
+}
+
+ /*
+ * Generate a normalized version of the query string that will be used to
+ * represent all similar queries.
+ *
+ * If query_loc > 0, then "query" has been advanced by that much compared to
+ * the original string start, so we need to translate the provided locations
+ * to compensate. (This lets us avoid re-scanning statements before the one
+ * of interest, so it's worth doing.)
+ *
+ * *query_len_p contains the input string length, and is updated with the
+ * result string length on exit. The resulting string might be longer or
+ * shorter depending on what happens with replacement of constants.
+ *
+ * Returns a palloc'd string.
+ */
+char *
+generate_normalized_query(JumbleState *jstate, const char *query,
+ int query_loc, int *query_len_p)
+{
+ char *norm_query;
+ int query_len = *query_len_p;
+ int norm_query_buflen, /* Space allowed for norm_query */
+ len_to_wrt, /* Length (in bytes) to write */
+ quer_loc = 0, /* Source query byte location */
+ n_quer_loc = 0, /* Normalized query byte location */
+ last_off = 0, /* Offset from start for previous tok */
+ last_tok_len = 0; /* Length (in bytes) of that tok */
+ int num_constants_replaced = 0;
+
+ /*
+ * Get constants' lengths (core system only gives us locations). Note
+ * this also ensures the items are sorted by location.
+ */
+ fill_in_constant_lengths(jstate, query, query_loc);
+
+ /*
+ * Allow for $n symbols to be longer than the constants they replace.
+ * Constants must take at least one byte in text form, while a $n symbol
+ * certainly isn't more than 11 bytes, even if n reaches INT_MAX. We
+ * could refine that limit based on the max value of n for the current
+ * query, but it hardly seems worth any extra effort to do so.
+ */
+ norm_query_buflen = query_len + jstate->clocations_count * 10;
+
+ /* Allocate result buffer */
+ norm_query = palloc(norm_query_buflen + 1);
+
+ for (int i = 0; i < jstate->clocations_count; i++)
+ {
+ int off, /* Offset from start for cur tok */
+ tok_len; /* Length (in bytes) of that tok */
+
+ /*
+ * If we have an external param at this location, but no lists are
+ * being squashed across the query, then we skip here; this will make
+ * us print the characters found in the original query that represent
+ * the parameter in the next iteration (or after the loop is done),
+ * which is a bit odd but seems to work okay in most cases.
+ */
+ if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
+ continue;
+
+ off = jstate->clocations[i].location;
+
+ /* Adjust recorded location if we're dealing with partial string */
+ off -= query_loc;
+
+ tok_len = jstate->clocations[i].length;
+
+ if (tok_len < 0)
+ continue; /* ignore any duplicates */
+
+ /* Copy next chunk (what precedes the next constant) */
+ len_to_wrt = off - last_off;
+ len_to_wrt -= last_tok_len;
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
+ /*
+ * And insert a param symbol in place of the constant token; and, if
+ * we have a squashable list, insert a placeholder comment starting
+ * from the list's second value.
+ */
+ n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
+ num_constants_replaced + 1 + jstate->highest_extern_param_id,
+ jstate->clocations[i].squashed ? " /*, ... */" : "");
+ num_constants_replaced++;
+
+ /* move forward */
+ quer_loc = off + tok_len;
+ last_off = off;
+ last_tok_len = tok_len;
+ }
+
+ /*
+ * We've copied up until the last ignorable constant. Copy over the
+ * remaining bytes of the original query string.
+ */
+ len_to_wrt = query_len - quer_loc;
+
+ Assert(len_to_wrt >= 0);
+ memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+ n_quer_loc += len_to_wrt;
+
+ Assert(n_quer_loc <= norm_query_buflen);
+ norm_query[n_quer_loc] = '\0';
+
+ *query_len_p = n_quer_loc;
+ return norm_query;
+}
+
+/*
+ * Given a valid SQL string and an array of constant-location records,
+ * fill in the textual lengths of those constants.
+ *
+ * The constants may use any allowed constant syntax, such as float literals,
+ * bit-strings, single-quoted strings and dollar-quoted strings. This is
+ * accomplished by using the public API for the core scanner.
+ *
+ * It is the caller's job to ensure that the string is a valid SQL statement
+ * with constants at the indicated locations. Since in practice the string
+ * has already been parsed, and the locations that the caller provides will
+ * have originated from within the authoritative parser, this should not be
+ * a problem.
+ *
+ * Duplicate constant pointers are possible, and will have their lengths
+ * marked as '-1', so that they are later ignored.
+ *
+ * If query_loc > 0, then "query" has been advanced by that much compared to
+ * the original string start, so we need to translate the provided locations
+ * to compensate. (This lets us avoid re-scanning statements before the one
+ * of interest, so it's worth doing.)
+ *
+ * N.B. There is an assumption that a '-' character at a Const location begins
+ * a negative numeric constant. This precludes there ever being another
+ * reason for a constant to start with a '-'.
+ */
+void
+fill_in_constant_lengths(JumbleState *jstate, const char *query,
+ int query_loc)
+{
+ LocationLen *locs;
+ core_yyscan_t yyscanner;
+ core_yy_extra_type yyextra;
+ core_YYSTYPE yylval;
+ YYLTYPE yylloc;
+ int last_loc = -1;
+ int i;
+
+ /*
+ * Sort the records by location so that we can process them in order while
+ * scanning the query text.
+ */
+ if (jstate->clocations_count > 1)
+ qsort(jstate->clocations, jstate->clocations_count,
+ sizeof(LocationLen), comp_location);
+ locs = jstate->clocations;
+
+ /* initialize the flex scanner --- should match raw_parser() */
+ yyscanner = scanner_init(query,
+ &yyextra,
+ &ScanKeywords,
+ ScanKeywordTokens);
+
+ /* we don't want to re-emit any escape string warnings */
+ yyextra.escape_string_warning = false;
+
+ /* Search for each constant, in sequence */
+ for (i = 0; i < jstate->clocations_count; i++)
+ {
+ int loc = locs[i].location;
+ int tok;
+ bool squashed = locs[i].squashed;
+
+ /* Adjust recorded location if we're dealing with partial string */
+ loc -= query_loc;
+
+ Assert(loc >= 0);
+
+ if (loc <= last_loc)
+ {
+ locs[i].length = -1;
+ continue; /* Duplicate constant, ignore */
+ }
+
+ /* We have a valid location, so let's save it */
+ last_loc = loc;
+
+ if (squashed)
+ continue; /* squashable list, ignore */
+
+ /* Lex tokens until we find the desired constant */
+ for (;;)
+ {
+ tok = core_yylex(&yylval, &yylloc, yyscanner);
+
+ /* We should not hit end-of-string, but if we do, behave sanely */
+ if (tok == 0)
+ break; /* out of inner for-loop */
+
+ /*
+ * We should find the token position exactly, but if we somehow
+ * run past it, work with that.
+ */
+ if (yylloc >= loc)
+ {
+ if (query[loc] == '-')
+ {
+ /*
+ * It's a negative value - this is the one and only case
+ * where we replace more than a single token.
+ *
+ * Do not compensate for the core system's special-case
+ * adjustment of location to that of the leading '-'
+ * operator in the event of a negative constant. It is
+ * also useful for our purposes to start from the minus
+ * symbol. In this way, queries like "select * from foo
+ * where bar = 1" and "select * from foo where bar = -2"
+ * will have identical normalized query strings.
+ */
+ tok = core_yylex(&yylval, &yylloc, yyscanner);
+ if (tok == 0)
+ break; /* out of inner for-loop */
+ }
+
+ /*
+ * We now rely on the assumption that flex has placed a zero
+ * byte after the text of the current token in scanbuf.
+ */
+ locs[i].length = strlen(yyextra.scanbuf + loc);
+ break; /* out of inner for-loop */
+ }
+ }
+
+ /* If we hit end-of-string, give up, leaving remaining lengths -1 */
+ if (tok == 0)
+ break;
+ }
+
+ scanner_finish(yyscanner);
+}
/*
* Given a possibly multi-statement source string, confine our attention to the
diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index dcb36dcb44f..8024a0883f7 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -91,6 +91,10 @@ extern PGDLLIMPORT int compute_query_id;
extern const char *CleanQuerytext(const char *query, int *location, int *len);
+extern char *generate_normalized_query(JumbleState *jstate, const char *query,
+ int query_loc, int *query_len_p);
+extern void fill_in_constant_lengths(JumbleState *jstate, const char *query,
+ int query_loc);
extern JumbleState *JumbleQuery(Query *query);
extern void EnableQueryId(void);
--
2.43.0
On Fri, Oct 24, 2025 at 07:04:59PM -0500, Sami Imseih wrote:
v4 corrects some code comments.
The fix in the first patch looks good, thanks. Not so sure about the
refactoring in the second patch -- generate_normalized_query is being
used only in one place, in pg_stat_statements, moving it out somewhere
else without having other clients seems to be questionable to me.
P.S. Adding �lvaro as a commiter of the affected feature, maybe he will
help us to apply the fix.
Thanks for reviewing!
Not so sure about the
refactoring in the second patch -- generate_normalized_query is being
used only in one place, in pg_stat_statements, moving it out somewhere
else without having other clients seems to be questionable to me.
The excellent question raised earlier is whether the fix should be
applied in pg_stat_statements.c or queryjumblefuncs.c. To me, this
suggests that pg_stat_statements, as an extension, is dealing with code
it should not be responsible for.
generate_normalized_query could be used by other extensions that have
normalization needs; it’s generic and needs to handle squashing, which
is a jumble specific detail.
Therefore, I think both fill_in_constant_lengths and
generate_normalized_query should be moved. We can certainly start a new
thread (0002) if more discussion is needed.
--
Sami Imseih
Amazon Web Services (AWS)
On Mon, Oct 27, 2025 at 02:34:50PM -0500, Sami Imseih wrote:
The excellent question raised earlier is whether the fix should be
applied in pg_stat_statements.c or queryjumblefuncs.c. To me, this
suggests that pg_stat_statements, as an extension, is dealing with code
it should not be responsible for.generate_normalized_query could be used by other extensions that have
normalization needs; it’s generic and needs to handle squashing, which
is a jumble specific detail.
FWIW, the line defining the separation between pg_stat_statements.c
and queryjumblefuncs.c should rely on a pretty simple concept:
JumbleState can be written only by queryjumblefuncs.c and
src/backend/nodes/, and should remain in an untouched constant state
when looked at by any external module loaded, included PGSS, because
it could be shared across more than one paths.
That's just to say that I can get behind the argument you are
presenting in patch 0002, to move the updates of
JumbleState.clocations[N].length to happen inside the query jumbling
code. This relies on a query string anyway, which would be the same
for all the modules interested in interacting in a planner, analyze or
execution hook.
Now, I don't think that 0002 goes far enough: could it be possible to
do things so as we enforce a policy where modules cannot touch a
JumbleState at all? That would mean painting more consts to prevent
manipulations of the Jumble state.
Therefore, I think both fill_in_constant_lengths and
generate_normalized_query should be moved. We can certainly start a new
thread (0002) if more discussion is needed.
Yeah, that should be a separate discussion. Let's focus on the bug at
hand, first, then consider improvements that could be done moving
forward.
--
Michael
On 2025-Oct-28, Michael Paquier wrote:
FWIW, the line defining the separation between pg_stat_statements.c
and queryjumblefuncs.c should rely on a pretty simple concept:
JumbleState can be written only by queryjumblefuncs.c and
src/backend/nodes/, and should remain in an untouched constant state
when looked at by any external module loaded, included PGSS, because
it could be shared across more than one paths.
I can get behind this as a principle.
That's just to say that I can get behind the argument you are
presenting in patch 0002, to move the updates of
JumbleState.clocations[N].length to happen inside the query jumbling
code. This relies on a query string anyway, which would be the same
for all the modules interested in interacting in a planner, analyze or
execution hook.
Hmm, yeah, now that you mention this, it seems rather odd that
pgss_store() is messing with (modifying) the jstate it is given. If we
had multiple modules using a planner_hook to interact with the query
representation, what each module sees would be different depending on
which hook is called first, which sounds wrong. Maybe, as you say, we
do need to consider this a design flaw that should be fixed in a more
principled fashion ... and it's pretty hard to see that we could
backpatch any such fixes. It's a pretty old issue though (you probably
notice I hesitate to call it a bug.) So I agree with you: we should fix
the specific bug first across branches, and the reworking of the
jumbling framework should be done afterwards in master only.
I'm going to get the first patch pushed, after looking at it more
carefully.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion.
The wonders of open-source !"
https://twitter.com/gunnarmorling/status/1596080409259003906
On 2025-Oct-26, Dmitry Dolgov wrote:
On Fri, Oct 24, 2025 at 07:04:59PM -0500, Sami Imseih wrote:
v4 corrects some code comments.The fix in the first patch looks good, thanks.
Yeah, I think this general idea is sensible. However, I think we should
take it one step further and just remove last_loc entirely. I think
this makes the code a bit clearer. How about the attached?
Regarding 0002, as I said on my reply to Michael I think this is a good
idea on principle, but I suggest to discuss that in a separate thread.
I would seek routine names that match the current ones in queryjumble.h
a little better though.
P.S. Adding Álvaro as a commiter of the affected feature, maybe he will
help us to apply the fix.
Thanks for doing that, I'd have not noticed the thread otherwise.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Pero la cosa no es muy grave ..." (le petit Nicolas -- René Goscinny)
Attachments:
v5-0001-pg_stat_statements-Fix-handling-of-duplicate-cons.patchtext/x-diff; charset=utf-8Download
From 72ce1762eb215ce5e047d6c470cfc24e0752e2c2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Tue, 28 Oct 2025 11:48:12 +0100
Subject: [PATCH v5] pg_stat_statements: Fix handling of duplicate constant
locations
Two or more constants can have the same location. We handled this
correctly for non squashed constants, but failed to do it if squashed
(resulting in out-of-bounds memory access), because the code structure
became broken by commit 0f65f3eec478: we failed to update 'last_loc'
correctly when skipping these squashed constants.
The simplest fix seems to be to get rid of 'last_loc' altogether -- in
hindsight, it's quite pointless. Also, when ignoring a constant because
of this, make sure to fulfill fill_in_constant_lengths's duty of setting
its length to -1.
Lastly, we can use == instead of <= because the locations have been
sorted beforehand, so the < case cannot arise.
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
---
.../pg_stat_statements/expected/squashing.out | 80 +++++++++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 31 +++----
contrib/pg_stat_statements/sql/squashing.sql | 26 ++++++
3 files changed, 122 insertions(+), 15 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out
index f952f47ef7b..d5bb67c7222 100644
--- a/contrib/pg_stat_statements/expected/squashing.out
+++ b/contrib/pg_stat_statements/expected/squashing.out
@@ -809,6 +809,84 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
select where $1 IN ($2 /*, ... */) | 2
(2 rows)
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+ x | y
+---+---
+(0 rows)
+
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+ x | y | ?column? | ?column? | ?column? | x | y | ?column? | ?column?
+---+---+----------+----------+----------+---+---+----------+----------
+(0 rows)
+
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+ x | y | ?column?
+---+---+----------
+(0 rows)
+
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+ f1
+-------
+ {1,2}
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+ f1 | f2
+-------+---------
+ {1,2} | {1,2,3}
+(1 row)
+
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+ ?column? | ?column? | f1 | f2 | ?column? | ?column?
+----------+----------+-------+---------+----------+----------
+ 1 | 2 | {1,2} | {1,2,3} | 3 | 4
+(1 row)
+
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+ f1 | f2 | ?column?
+-------+---------+----------
+ {1,2} | {1,1,3} | 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------------------------------------------------------------------------------------------------------------+-------
+ SELECT $1, $2, (ROW(ARRAY[$3 /*, ... */], ARRAY[$4 /*, ... */])).*, $5, $6 | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).* FROM test_composite | 2
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2 FROM test_composite | 1
+ SELECT ((composite_f(array[$1 /*, ... */]))).*, $2, $3, $4, ((composite_f(array[$5 /*, ... */]))).*, $6, $7+| 1
+ FROM test_composite +|
+ WHERE x IN ($8 /*, ... */) |
+ SELECT (ROW(ARRAY[$1 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).* | 1
+ SELECT (ROW(ARRAY[$1 /*, ... */], ARRAY[$2 /*, ... */])).*, $3 | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
+(8 rows)
+
--
-- cleanup
--
@@ -818,3 +896,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f2187167c5c..06a1de97c61 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2954,9 +2954,8 @@ generate_normalized_query(JumbleState *jstate, const char *query,
* have originated from within the authoritative parser, this should not be
* a problem.
*
- * Duplicate constant pointers are possible, and will have their lengths
- * marked as '-1', so that they are later ignored. (Actually, we assume the
- * lengths were initialized as -1 to start with, and don't change them here.)
+ * Multiple constants can have the same location. We reset lengths of those
+ * past the first to -1 so that they can later be ignored.
*
* If query_loc > 0, then "query" has been advanced by that much compared to
* the original string start, so we need to translate the provided locations
@@ -2976,8 +2975,6 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
core_yy_extra_type yyextra;
core_YYSTYPE yylval;
YYLTYPE yylloc;
- int last_loc = -1;
- int i;
/*
* Sort the records by location so that we can process them in order while
@@ -2998,23 +2995,29 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
yyextra.escape_string_warning = false;
/* Search for each constant, in sequence */
- for (i = 0; i < jstate->clocations_count; i++)
+ for (int i = 0; i < jstate->clocations_count; i++)
{
int loc = locs[i].location;
int tok;
- /* Adjust recorded location if we're dealing with partial string */
- loc -= query_loc;
-
- Assert(loc >= 0);
+ /* Ignore constants after the first one in the same location */
+ if (i > 0 && loc == locs[i - 1].location)
+ {
+ locs[i].length = -1;
+ continue;
+ }
if (locs[i].squashed)
continue; /* squashable list, ignore */
- if (loc <= last_loc)
- continue; /* Duplicate constant, ignore */
+ /* Adjust recorded location if we're dealing with partial string */
+ loc -= query_loc;
+ Assert(loc >= 0);
- /* Lex tokens until we find the desired constant */
+ /*
+ * We have a valid location for a constant that's not a dupe, let's
+ * save it. Lex tokens until we find the desired constant.
+ */
for (;;)
{
tok = core_yylex(&yylval, &yylloc, yyscanner);
@@ -3060,8 +3063,6 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
/* If we hit end-of-string, give up, leaving remaining lengths -1 */
if (tok == 0)
break;
-
- last_loc = loc;
}
scanner_finish(yyscanner);
diff --git a/contrib/pg_stat_statements/sql/squashing.sql b/contrib/pg_stat_statements/sql/squashing.sql
index 53138d125a9..03b0515f872 100644
--- a/contrib/pg_stat_statements/sql/squashing.sql
+++ b/contrib/pg_stat_statements/sql/squashing.sql
@@ -291,6 +291,30 @@ select where '1' IN ('1'::int::text, '2'::int::text);
select where '1' = ANY (array['1'::int::text, '2'::int::text]);
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- composite function with row expansion
+create table test_composite(x integer);
+CREATE FUNCTION composite_f(a integer[], out x integer, out y integer) returns
+record as $$ begin
+ x = a[1];
+ y = a[2];
+ end;
+$$ language plpgsql;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT ((composite_f(array[1, 2]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).* FROM test_composite;
+SELECT ((composite_f(array[1, 2, 3]))).*, 1, 2, 3, ((composite_f(array[1, 2, 3]))).*, 1, 2
+FROM test_composite
+WHERE x IN (1, 2, 3);
+SELECT ((composite_f(array[1, $1, 3]))).*, 1 FROM test_composite \bind 1
+;
+-- ROW() expression with row expansion
+SELECT (ROW(ARRAY[1,2])).*;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*;
+SELECT 1, 2, (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*, 3, 4;
+SELECT (ROW(ARRAY[1, 2], ARRAY[1, $1, 3])).*, 1 \bind 1
+;
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- cleanup
--
@@ -300,3 +324,5 @@ DROP TABLE test_squash_numeric;
DROP TABLE test_squash_bigint;
DROP TABLE test_squash_cast CASCADE;
DROP TABLE test_squash_jsonb;
+DROP TABLE test_composite;
+DROP FUNCTION composite_f;
--
2.47.3
On Fri, Oct 24, 2025 at 07:04:59PM -0500, Sami Imseih wrote:
v4 corrects some code comments.The fix in the first patch looks good, thanks.
Yeah, I think this general idea is sensible. However, I think we should
take it one step further and just remove last_loc entirely. I think
this makes the code a bit clearer. How about the attached?
getting rid of last_loc makes sense because the list is sorted makes
sense. I like this, definitely cleaner.
One minor comment is to change is to remove the "let's save it" but
in the comments, as we are no longer saving a last_loc.
/*
- * We have a valid location for a constant that's not
a dupe, let's
- * save it. Lex tokens until we find the desired constant.
+ * We have a valid location for a constant that's not
a dupe. We Lex
+ * tokens until we find the desired constant.
*/
--
Sami
On Tue, Oct 28, 2025 at 12:31:23PM +0200, Alvaro Herrera wrote:
Hmm, yeah, now that you mention this, it seems rather odd that
pgss_store() is messing with (modifying) the jstate it is given. If we
had multiple modules using a planner_hook to interact with the query
representation, what each module sees would be different depending on
which hook is called first, which sounds wrong. Maybe, as you say, we
do need to consider this a design flaw that should be fixed in a more
principled fashion ... and it's pretty hard to see that we could
backpatch any such fixes. It's a pretty old issue though (you probably
notice I hesitate to call it a bug.)
It's hard to call that a bug when nothing in core is broken directly
because of it, we are just assuming that it impacts other extensions.
Now we have always been kind of bad in terms of defining what loaded
extensions should or not should be able to do, also depending on the
order their hooks are loaded.
So I agree with you: we should fix
the specific bug first across branches, and the reworking of the
jumbling framework should be done afterwards in master only.
Agreedn.
I'm going to get the first patch pushed, after looking at it more
carefully.
Cool, thanks!
--
Michael
On 2025-Oct-28, Sami Imseih wrote:
getting rid of last_loc makes sense because the list is sorted. I like
this, definitely cleaner.One minor comment is to change is to remove the "let's save it" but
in the comments, as we are no longer saving a last_loc.
Ah, yeah, that's a good change. I have pushed it now.
Thanks!
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Entristecido, Wutra (canción de Las Barreras)
echa a Freyr a rodar
y a nosotros al mar"
One minor comment is to change is to remove the "let's save it" but
in the comments, as we are no longer saving a last_loc.Ah, yeah, that's a good change. I have pushed it now.
Thanks for pushing!
I will take 0002 on in a separate thread as there seems to
be agreement on doing something about it.
--
Sami Imseih
Amazon Web Services (AWS)