pg_stat_statements: Add gc_count and query_file_size to pgss_info

Started by Lukas Fittl3 months ago3 messageshackers

lukas@fittl.com

3 months ago

Hi,

Over the last weeks we've been fighting again with pg_stat_statements
issues, specifically with the issue of having too many unique entries,
and the corresponding problem of large query text files and LWLock
wait events related to pg_stat_statements.

I think we can improve debugging for such situations by adding two
more columns to pg_stat_statements_info that expose information
already tracked:

1) "gc_count", showing when the pg_stat_statements query garbage
collection cycles occur (which can correlate with
LWLock:pg_stat_statements)

2) "query_file_size" which tells us the extent of the query text file,
so we can fine-tune when we query the texts from pg_stat_statements in
monitoring scripts (i.e. query it less frequently if the query text
file is very large).

I've had a patch to improve this prepared for a previous cycle, but
wasn't sure it was still needed because of the discussion re: keeping
query texts in shared memory. But since it looks like that won't
change for 19 (though I'm hoping to contribute more to improving that
in the PG 20 cycle), see attached for consideration.

Thanks,
Lukas

--
Lukas Fittl

Sami Imseih

samimseih@gmail.com

3 months ago

In reply to: Lukas Fittl (#1)

Re: pg_stat_statements: Add gc_count and query_file_size to pgss_info

Hi,

Over the last weeks we've been fighting again with pg_stat_statements
issues, specifically with the issue of having too many unique entries,
and the corresponding problem of large query text files and LWLock
wait events related to pg_stat_statements.

Yeah, LWLock:pg_stat_statements related to garbage collection can be
really bad when large files must be recreated with live query texts,
or if this GC must occur often (in the case of high query entry churn).
A high dealloc count is already a good indicator of the need for GC,
but most people, in my experience, do not realize there is a GC
process, since it is not documented or exposed.

A high dealloc is already a good indicator of the need for gc, but most
people, in my experience, don't realize there is a gc process, since it's
not documented and not exposed.

I think we can improve debugging for such situations by adding two
more columns to pg_stat_statements_info that expose information
already tracked:

1) "gc_count", showing when the pg_stat_statements query garbage
collection cycles occur (which can correlate with
LWLock:pg_stat_statements)

2) "query_file_size" which tells us the extent of the query text file,
so we can fine-tune when we query the texts from pg_stat_statements in
monitoring scripts (i.e. query it less frequently if the query text
file is very large).

I do agree that having such additional information, with proper
documentation, is a good idea. However, I do wonder if we should hold
off on adding any of this info in 19 because of the point you make
below, which could completely change the information we need to
expose. Adding this information in 19 and then removing it for 20 may
not be worthwhile.

I've had a patch to improve this prepared for a previous cycle, but
wasn't sure it was still needed because of the discussion re: keeping
query texts in shared memory. But since it looks like that won't
change for 19 (though I'm hoping to contribute more to improving that
in the PG 20 cycle), see attached for consideration.

19 has 4ba012a8ed, which allows us to serialize and deserialize query
texts stored in, for example, DSA, with a dsa_pointer tracked by the
entry of a custom stats kind. I was also planning on continuing this
work for 20, and getting 4ba012a8ed was an important prerequisite for
this.

What do you think?

[1]: [/messages/by-id/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com%5D

--
Sami Imseih
Amazon Web Services (AWS)

Lukas Fittl

lukas@fittl.com

2 months ago

In reply to: Sami Imseih (#2)

Re: pg_stat_statements: Add gc_count and query_file_size to pgss_info

Hi Sami,

On Sun, Mar 1, 2026 at 7:44 AM Sami Imseih <samimseih@gmail.com> wrote:

I do agree that having such additional information, with proper
documentation, is a good idea. However, I do wonder if we should hold
off on adding any of this info in 19 because of the point you make
below, which could completely change the information we need to
expose. Adding this information in 19 and then removing it for 20 may
not be worthwhile.

Yeah, I suspect you're right - I'll mark this as returned with feedback for now.

I've had a patch to improve this prepared for a previous cycle, but
wasn't sure it was still needed because of the discussion re: keeping
query texts in shared memory. But since it looks like that won't
change for 19 (though I'm hoping to contribute more to improving that
in the PG 20 cycle), see attached for consideration.

19 has 4ba012a8ed, which allows us to serialize and deserialize query
texts stored in, for example, DSA, with a dsa_pointer tracked by the
entry of a custom stats kind. I was also planning on continuing this
work for 20, and getting 4ba012a8ed was an important prerequisite for
this.

Yup, makes sense - I think 4ba012a8ed is a foundational piece to make
progress on this in 20.

I've also been wondering if we should prototype a new
pg_stat_statements module separately (e.g. in a GitHub repo,
"pg_stat_statements_next" or something), just to allow quicker
iteration and allow easier testing of a pure in-memory approach,
before bringing it to the discussion table for -hackers later in the
PG20 cycle.

Thanks,
Lukas

--
Lukas Fittl

pg_stat_statements: Add gc_count and query_file_size to pgss_info

Attachments: