Limit length of queries in pg_stat_statements extension
Hi all,
Recently, I have noticed a potential problem in the pg_stat_statements extension.
When the garbage collection is triggered within this extension, if there is a significant
amount of data to be written to the pgss_query_texts.stat file (for example, when
tracking very long SQL queries), the corresponding backend process holds the pgss->lock
for an extended period. This causes all other backend processes to wait for the
lock in the extension's hooks during every SQL execution, and those processes do
not respond to interrupts while they wait.
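For context, the relevant code paths look roughly like this (my simplified sketch of pg_stat_statements.c, not the verbatim source; the function names are real, the bodies are abbreviated):
```
/* Every backend, for every statement, in pgss_store(): */
LWLockAcquire(pgss->lock, LW_SHARED);   /* sleeps here while gc runs,
                                         * without servicing interrupts */

/* The one backend whose qtext_store() call trips the gc threshold: */
LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
gc_qtexts();                /* rewrites all of pgss_query_texts.stat,
                             * one fwrite() per tracked query text */
LWLockRelease(pgss->lock);
```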
To temporarily address this issue, I have written a patch that introduces a parameter
to control the maximum length of tracked SQL queries.
It seems like this is not an ideal solution; at the very least, I think it is
necessary to log messages at LOG level before and after the garbage collection
process. This would help us diagnose similar issues in the future. I spent a
considerable amount of time investigating this issue due to the lack of relevant
logs, and I believe adding these logs would be beneficial for troubleshooting.
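To make the suggestion concrete, something along these lines is what I have in mind (a hypothetical sketch, not part of the attached patch; it only uses the standard ereport() and instr_time facilities):
```
#include "portability/instr_time.h"

/* hypothetical logging around the rewrite loop in gc_qtexts() */
instr_time  start,
            elapsed;

INSTR_TIME_SET_CURRENT(start);
ereport(LOG,
        (errmsg("pg_stat_statements: garbage collection of query texts started")));

/* ... existing loop that copies each query text into the new file ... */

INSTR_TIME_SET_CURRENT(elapsed);
INSTR_TIME_SUBTRACT(elapsed, start);
ereport(LOG,
        (errmsg("pg_stat_statements: garbage collection of query texts finished in %.0f ms",
                INSTR_TIME_GET_MILLISEC(elapsed))));
```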
Thanks for your attention to this issue.
Best regards,
Zhao
Attachments:
0001-limit-length-of-queries.patch (application/octet-stream)
From cd4c0bf03310c77d1069fe05d185bb74fa5c0295 Mon Sep 17 00:00:00 2001
From: "zhaotinghai.zth" <zhaotinghai.zth@alibaba-inc.com>
Date: Thu, 16 Jan 2025 10:15:53 +0800
Subject: [PATCH] limit length of queries
---
contrib/pg_stat_statements/Makefile | 2 +-
.../expected/max_length.out | 57 +++++++++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 19 +++++++
contrib/pg_stat_statements/sql/max_length.sql | 29 ++++++++++
4 files changed, 106 insertions(+), 1 deletion(-)
create mode 100644 contrib/pg_stat_statements/expected/max_length.out
create mode 100644 contrib/pg_stat_statements/sql/max_length.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 31e4fdeeb9..a40267a9ac 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -18,7 +18,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
REGRESS = select dml cursors utility level_tracking planning \
- user_activity wal extended cleanup oldextversions
+ user_activity wal extended cleanup oldextversions max_length
# Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
# which typical installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/max_length.out b/contrib/pg_stat_statements/expected/max_length.out
new file mode 100644
index 0000000000..d1007ee004
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/max_length.out
@@ -0,0 +1,57 @@
+CREATE EXTENSION pg_stat_statements;
+CREATE TABLE pg_stat_statement_tmp (query text);
+INSERT INTO pg_stat_statement_tmp
+(SELECT substring(query, 1, 64) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%');
+SET pg_stat_statements.max_query_length=64;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- check correctness
+SELECT (
+ SELECT (
+ SELECT GROUPING(a,b) FROM (VALUES (1)) v2(c)
+ ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+ grouping
+----------
+ 0
+ 0
+ 0
+(3 rows)
+
+SELECT (
+ SELECT (
+ SELECT GROUPING(e,f) FROM (VALUES (1)) v2(c)
+ ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+ grouping
+----------
+ 3
+ 0
+ 1
+(3 rows)
+
+SELECT COUNT(a.query) FROM pg_stat_statement_tmp a
+INNER JOIN pg_stat_statements b on a.query=b.query;
+ count
+-------
+ 0
+(1 row)
+
+SELECT a.query FROM pg_stat_statement_tmp a
+INNER JOIN pg_stat_statements b on a.query=b.query;
+ query
+-------
+(0 rows)
+
+-- check length
+SELECT EXISTS (SELECT * FROM pg_stat_statements WHERE length(query) > 64);
+ exists
+--------
+ f
+(1 row)
+
+DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 55b957d251..bbb6cf7b87 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -289,6 +289,7 @@ static bool pgss_track_utility = true; /* whether to track utility commands */
static bool pgss_track_planning = false; /* whether to track planning
* duration */
static bool pgss_save = true; /* whether to save stats across shutdown */
+static int pgss_max_query_length = 0;
#define pgss_enabled(level) \
@@ -450,6 +451,19 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomIntVariable("pg_stat_statements.max_query_length",
+ "Max length of query which store in pgss_query_texts.stat",
+ NULL,
+ &pgss_max_query_length,
+ 0,
+ 0,
+ PG_INT32_MAX,
+ PGC_USERSET,
+ GUC_UNIT_BYTE,
+ NULL,
+ NULL,
+ NULL);
+
MarkGUCPrefixReserved("pg_stat_statements");
/*
@@ -1294,6 +1308,11 @@ pgss_store(const char *query, uint64 queryId,
LWLockAcquire(pgss->lock, LW_SHARED);
}
+ if (pgss_max_query_length > 0)
+ {
+ query_len = Min(query_len, pgss_max_query_length);
+ }
+
/* Append new query text to file with only shared lock held */
stored = qtext_store(norm_query ? norm_query : query, query_len,
&query_offset, &gc_count);
diff --git a/contrib/pg_stat_statements/sql/max_length.sql b/contrib/pg_stat_statements/sql/max_length.sql
new file mode 100644
index 0000000000..24dff60dde
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/max_length.sql
@@ -0,0 +1,29 @@
+CREATE EXTENSION pg_stat_statements;
+CREATE TABLE pg_stat_statement_tmp (query text);
+INSERT INTO pg_stat_statement_tmp
+(SELECT substring(query, 1, 64) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%');
+
+SET pg_stat_statements.max_query_length=64;
+SELECT pg_stat_statements_reset();
+
+-- check correctness
+SELECT (
+ SELECT (
+ SELECT GROUPING(a,b) FROM (VALUES (1)) v2(c)
+ ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+SELECT (
+ SELECT (
+ SELECT GROUPING(e,f) FROM (VALUES (1)) v2(c)
+ ) FROM (VALUES (1,2)) v1(a,b) GROUP BY (a,b)
+) FROM (VALUES(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
+
+SELECT COUNT(a.query) FROM pg_stat_statement_tmp a
+INNER JOIN pg_stat_statements b on a.query=b.query;
+
+SELECT a.query FROM pg_stat_statement_tmp a
+INNER JOIN pg_stat_statements b on a.query=b.query;
+
+-- check length
+SELECT EXISTS (SELECT * FROM pg_stat_statements WHERE length(query) > 64);
+DROP EXTENSION pg_stat_statements;
\ No newline at end of file
--
2.39.3
Hi,
On Thu, Jan 16, 2025 at 10:19:49AM +0800, 赵庭海(庭章) wrote:
Hi all,
Recently, I have noticed a potential problem in the pg_stat_statements
extension. When the garbage collection is triggered within this extension,
if there is a significant amount of data to be written to the
pgss_query_texts.stat file (for example, when tracking very long SQL
queries), the corresponding backend process holds the pgss->lock for an
extended period. This causes all other backend processes to wait for this
lock in the extension's hook during all SQL executions, and these processes
do not respond to interrupts during this time.
To temporarily address this issue, I have written a patch that introduces a
parameter to control the maximum length of tracked SQL queries.
I don't think that it would move the needle much. Deallocating entries is very
expensive, even when the query text file isn't being cleaned up, as it needs to
sort all entries by usage to remove the least recently used, all with an
exclusive pgss->lock held. The real solution is probably to rely on the new
pluggable statistics architecture rather than on the hash table / query text
file.
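To give an idea of the cost, entry_dealloc() does roughly the following (simplified sketch, not the verbatim source):
```
/* Caller already holds pgss->lock in exclusive mode. */
pgssEntry **entries;
pgssEntry  *entry;
HASH_SEQ_STATUS hash_seq;
int         i = 0;
int         nvictims;

/* collect a pointer to every entry in the shared hash table */
entries = palloc(hash_get_num_entries(pgss_hash) * sizeof(pgssEntry *));
hash_seq_init(&hash_seq, pgss_hash);
while ((entry = hash_seq_search(&hash_seq)) != NULL)
    entries[i++] = entry;

/* sort all entries by usage: O(n log n), lock still held */
qsort(entries, i, sizeof(pgssEntry *), entry_cmp);

/* evict the least-used ~5% of entries */
nvictims = Max(10, i * USAGE_DEALLOC_PERCENT / 100);
nvictims = Min(nvictims, i);
for (i = 0; i < nvictims; i++)
    hash_search(pgss_hash, &entries[i]->key, HASH_REMOVE, NULL);

pfree(entries);
```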
It seems like this is not an ideal solution, at least, I think it is
necessary to log messages at the log level before and after the garbage
collection process. This would help us diagnose similar issues in the
future. I spent a considerable amount of time investigating this issue due to
the lack of relevant logs.
I believe adding these logs would be beneficial for troubleshooting.
Thanks for your attention to this issue.
Isn't the pg_stat_statements_info.dealloc counter enough to figure out the root
issue?
On Fri, Jan 17, 2025 at 03:24:34PM +0800, Julien Rouhaud wrote:
The real solution is probably to rely on the new
pluggable statistic architecture rather than using the hash table / query text
file.
My own plan was to propose exactly that in the v19 cycle, leaving one
release with this infrastructure in place so that any comments on the
basics it introduces can be addressed during the beta cycle.
--
Michael
Hi,
Thanks for your answer.
I don't think that it would move the needle much. Deallocating entries is very
expensive, even when the query text file isn't being cleaned up, as it needs to
sort all entries by usage to remove the least recently used, all with an
exclusive pgss->lock held. The real solution is probably to rely on the new
pluggable statistics architecture rather than on the hash table / query text
file.
I'm sorry, I left out some details earlier. I found that the backend process performing garbage collection stayed in the fwrite() loop of gc_qtexts() for a long time. The main backtrace is below.
```
#0 0x00007fc528d6aba0 in __write_nocancel () from /lib64/libc.so.6
#1 0x00007fc528cf52f3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00007fc528cf5b90 in __GI__IO_file_xsputn () from /lib64/libc.so.6
#3 0x00007fc528cea7e2 in fwrite () from /lib64/libc.so.6
#4 0x00007fc529199dd5 in gc_qtexts () at pg_stat_statements.c:2380
#5 pgss_store
#6 0x00007fc52919a2b8 in pgss_post_parse_analyze (query=0x1e9aed8, pstate=0x178a220) at pg_stat_statements.c:900
```
So I think the main reason for the long lock hold is that the I/O takes a long time because of these very long queries.
In my production environment, pg_stat_statements.max is set to 1000. I found that when this problem occurred, gc took more than 20 seconds. If I limit the length of a single SQL statement to 8 kB, it takes only 1.79 seconds.
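For a rough sense of scale (my arithmetic, not a measurement): with pg_stat_statements.max = 1000, an 8 kB cap bounds the rewritten pgss_query_texts.stat at roughly 1000 × 8 kB ≈ 8 MB, while without a cap every entry can contribute an arbitrarily long query text to the rewrite.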
Isn't the pg_stat_statements_info.dealloc counter enough to figure out the root
issue?
In my opinion, pg_stat_statements_info.dealloc doesn't reflect how long garbage collection takes. Earlier, when I was checking the logs for the abnormal period, there was only some slow-parse logging like the line below.
duration: 20834 ms parse S0_1: …...
Best regards,
Tinghai Zhao
On Sun, Jan 19, 2025 at 03:32:19AM +0800, 赵庭海(庭章) wrote:
I'm sorry, I left out some details earlier. I found that the backend process performing garbage collection stayed in the fwrite() loop of gc_qtexts() for a long time. The main backtrace is below.
```
#0 0x00007fc528d6aba0 in __write_nocancel () from /lib64/libc.so.6
#1 0x00007fc528cf52f3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00007fc528cf5b90 in __GI__IO_file_xsputn () from /lib64/libc.so.6
#3 0x00007fc528cea7e2 in fwrite () from /lib64/libc.so.6
#4 0x00007fc529199dd5 in gc_qtexts () at pg_stat_statements.c:2380
#5 pgss_store
#6 0x00007fc52919a2b8 in pgss_post_parse_analyze (query=0x1e9aed8, pstate=0x178a220) at pg_stat_statements.c:900
```
So I think the main reason for the long lock hold is that the I/O takes a long time because of these very long queries.
In my production environment, pg_stat_statements.max is set to 1000. I found that when this problem occurred, gc took more than 20 seconds. If I limit the length of a single SQL statement to 8 kB, it takes only 1.79 seconds.
As I mentioned earlier, entry eviction that doesn't lead to a query text gc is
also expensive, and it is more frequent. Truncating the query text makes
pg_stat_statements almost useless for many use cases (e.g. if you need to run
the query again to test a hypothetical index or similar).
In your case, almost 2 s of total freeze still sounds like something that
wouldn't be acceptable. Either you have some OLTP system, and that's many orders
of magnitude more than a query should take, or it's some kind of OLAP system,
and then the difference between 2 s and 10 s is a very narrow window for it to
be a game changer.
Why exactly do you have pg_stat_statements.max set to a value that low? Have
you tried to tune it and/or estimate how many different entries you would need
to store to avoid overly frequent eviction? In general, if your workload leads
to a set of distinct queries that will never fit in a bounded number of entries
(use of temporary tables is a perfect example), the overhead is so high, gc or
not, that your only option is to get rid of pg_stat_statements.
Isn't the pg_stat_statements_info.dealloc counter enough to figure out the root
issue?
In my opinion, pg_stat_statements_info.dealloc doesn't reflect how long garbage
collection takes. Earlier, when I was checking the logs for the abnormal period,
there was only some slow-parse logging.
Seeing this counter continuously increase tells you that you will almost
certainly have lock contention. You should also see pg_stat_statements in the
wait events. And again, the gc just makes things worse; eviction itself will
kill your performance almost as much.