Fix pg_stat_statements display of normalized FETCH counts

Started by Chao Li3 months ago3 messageshackers

li.evan.chao@gmail.com

3 months ago

Hi,

While testing the new feature "Show sizes of FETCH queries as constants in pg_stat_statements”, I found a problem where query string shown depends on order of FETCH statements.

This is a simple repro:

Setup:
```
select pg_stat_statements_reset();
begin;
declare c cursor for select g from generate_series(1, 10) g;
```

If FETCH without a count is executed first:
```
evantest=*# fetch c;
g
---
1
(1 row)

evantest=*# fetch 2 c;
g
---
2
3
(2 rows)

evantest=*# commit;
COMMIT
evantest=#
evantest=# select calls, query from pg_stat_statements where query like ‘fetch%c%';
calls | query
-------+-----------
2 | fetch c
(1 row)
```

The query text is shown as the unnormalized "fetch c”.

But if FETCH with a count is executed first:
```
evantest=*# fetch 2 c;
g
---
1
2
(2 rows)

evantest=*# fetch c;
g
---
3
(1 row)

evantest=*# commit;
COMMIT
evantest=#
evantest=# select calls, query from pg_stat_statements where query like 'fetch%c%';
calls | query
-------+------------
2 | fetch $1 c
(1 row)
```

Then the query text is shown as the normalized “fetch $1 c”. This seems incorrect to me, because the representative query text should not depend on the execution order of FETCH statements.

The attached patch tries to fix this by adding a query_normalized flag to pgssEntry, which records whether the stored representative query text is already normalized. With this flag, if FETCH c is executed first and stores an unnormalized query string, a later FETCH 2 c can replace it with the normalized query string.

One part of the implementation that I am not fully satisfied with is that I added a new parameter to pgss_store() to opt-in to the replacement logic only for FETCH statements. Without that restriction, SET SESSION AUTHORIZATION is broken: SET SESSION AUTHORIZATION 'r1' and SET SESSION AUTHORIZATION 'r2' are not combined into one pg_stat_statements entry, so blindly applying this replacement logic more broadly would be wrong.

I am not sure whether it is better to opt-in only for FETCH, or to apply the logic more broadly and explicitly opt-out cases such as SET SESSION AUTHORIZATION. Comments and suggestions are welcome.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Michael Paquier

michael@paquier.xyz

2 months ago

In reply to: Chao Li (#1)

Re: Fix pg_stat_statements display of normalized FETCH counts

On Mon, May 11, 2026 at 02:13:27PM +0800, Chao Li wrote:

Then the query text is shown as the normalized “fetch $1 c”. This
seems incorrect to me, because the representative query text should
not depend on the execution order of FETCH statements.

This is an incorrect expectation. These query patterns are grouped
together because they are represented the same way at their Node level
after deparsing, and PGSS uses the first query string it finds when
inserting the data of a query into a new slot. Specifying only a
cursor name means a forward fetch with FETCH_KEYWORD_NONE, for one
tuple. Specifying an integer (with or without from_in) and a cursor
name means a forward fetch with FETCH_KEYWORD_NONE, for a specified
number of tuples. The whole point is to normalize around the number
of tuples all these queries. Both queries mean the same thing, once
FetchStmt.howMany is moved into the "ignore" area of query jumbling.

The attached patch tries to fix this by adding a query_normalized
flag to pgssEntry, which records whether the stored representative
query text is already normalized. With this flag, if FETCH c is
executed first and stores an unnormalized query string, a later
FETCH 2 c can replace it with the normalized query string.

Nope, I don't think that this is something we need to act on. Note
that adding an extra generate_normalized_query() is not an acceptable
thing to do: this has a performance impact and we don't want to make
PGSS heavier than it is today. So it is inefficient, for one.

The correct thing to do if we'd want to make the difference between
the two cases would be to add a new value to FetchDirectionKeywords,
and assign that to the "cursor_name" and "from_in cursor_name" case
(say a new FETCH_KEYWORD_SINGLE?) in gram.y. I don't think that we
need to do something here as this does not really represent a gain in
terms of monitoring (aka more normalization is better to me here), but
the new value would be the correct thing to do if it happens that
folks want this difference to show up.
--
Michael

Chao Li

li.evan.chao@gmail.com

2 months ago

In reply to: Michael Paquier (#2)

Re: Fix pg_stat_statements display of normalized FETCH counts

On May 12, 2026, at 13:42, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, May 11, 2026 at 02:13:27PM +0800, Chao Li wrote:

Then the query text is shown as the normalized “fetch $1 c”. This
seems incorrect to me, because the representative query text should
not depend on the execution order of FETCH statements.

This is an incorrect expectation. These query patterns are grouped
together because they are represented the same way at their Node level
after deparsing, and PGSS uses the first query string it finds when
inserting the data of a query into a new slot. Specifying only a
cursor name means a forward fetch with FETCH_KEYWORD_NONE, for one
tuple. Specifying an integer (with or without from_in) and a cursor
name means a forward fetch with FETCH_KEYWORD_NONE, for a specified
number of tuples. The whole point is to normalize around the number
of tuples all these queries. Both queries mean the same thing, once
FetchStmt.howMany is moved into the "ignore" area of query jumbling.

The attached patch tries to fix this by adding a query_normalized
flag to pgssEntry, which records whether the stored representative
query text is already normalized. With this flag, if FETCH c is
executed first and stores an unnormalized query string, a later
FETCH 2 c can replace it with the normalized query string.

Nope, I don't think that this is something we need to act on. Note
that adding an extra generate_normalized_query() is not an acceptable
thing to do: this has a performance impact and we don't want to make
PGSS heavier than it is today. So it is inefficient, for one.

The correct thing to do if we'd want to make the difference between
the two cases would be to add a new value to FetchDirectionKeywords,
and assign that to the "cursor_name" and "from_in cursor_name" case
(say a new FETCH_KEYWORD_SINGLE?) in gram.y. I don't think that we
need to do something here as this does not really represent a gain in
terms of monitoring (aka more normalization is better to me here), but
the new value would be the correct thing to do if it happens that
folks want this difference to show up.
--
Michael

Actually, I’m still studying how to improve this patch, so your input is very timely.

If the “first query wins” behavior for the representative query text is intentional here, then I’m fine with withdrawing this patch. In any case, debugging this taught me a lot about how pg_stat_statements works.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Fix pg_stat_statements display of normalized FETCH counts

Attachments: