Cleaning up PREPARE query strings?
Hi,
Currently prepared statements store the whole query string that was submitted
by the client at the time of the PREPARE as-is. This is usually fine, but if
that query was a multi-statement query string it can lead to a waste of memory.
There are some pattern that are more likely to have such overhead, mine being
an application with a fixed set of prepared statements that are sent at the
connection start using a single query to avoid extra round trips.
One naive example of the outcome is as follow:
#= PREPARE s1 AS SELECT 1\; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE
relname = $1\; PREPARE s3(int, int) AS SELECT $1 + $2;
PREPARE
PREPARE
PREPARE
=# SELECT name, statement FROM pg_prepared_statements ;
name | statement
------+----------------------------------------------------------------------------
s1 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
s2 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
s3 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
(3 rows)
The more prepared statements you have the bigger the waste. This is also not
particularly readable for people who want to rely on the pg_prepared_statements
views, as you need to parse the query again yourself to figure out what exactly
is the associated query.
I assume that some other patterns could lead to other kind of problems. For
instance if the query string includes a prepared statement and some DML, it
could lead some automated program to replay both the PREPARE and DML when only
the PREPARE was intended.
I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the
passed query text the same way as pg_stat_statements. Since it relies on the
location saved during parsing the overhead should be minimal, and only present
when some space can actually be saved. Note that I first tried to have the
cleanup done in CreateCachedPlan so that it's done everywhere including things
like the extended protocol but this lead to too many issues so I ended up doing
it for an explicit PREPARE statement only.
With this patch applied, the above scenario gives this new output:
=# SELECT name, statement FROM pg_prepared_statements ;
name | statement
------+----------------------------------------------------
s1 | PREPARE s1 AS SELECT 1
s2 | PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1
s3 | PREPARE s3(int, int) AS SELECT $1 + $2
(3 rows)
One possible issue is that any comment present at the beginning of the query
text would be discarded. I'm not sure if that's something used by e.g.
pg_hint_plan, but if yes it's always possible to put the statement in front of
the SELECT (or other actual first keyword) rather than the PREPARE itself to
preserve it.
Attachments:
v1-0001-Cleanup-explicit-PREPARE-query-strings.patchtext/plain; charset=us-asciiDownload
From bab4d858e18cdd50713ae430e28499ca2acab441 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 24 Dec 2025 22:31:52 +0800
Subject: [PATCH v1] Cleanup explicit PREPARE query strings
When a multi statements query string contains one PREPARE statement (or
multiple), the whole query string was saved in the cached plan. This is
wasteful as that string can be artitrarily big, but it can also confusing as
some other parts like pg_prepared_statements will output the saved query string
as-is.
This commit changes this behavior and only stores the part of the query string
that correspond to any given PREPARED statement, similarly to how it's already
done in pg_stat_statements.
---
contrib/auto_explain/t/001_auto_explain.pl | 8 ++--
src/backend/commands/prepare.c | 43 +++++++++++++++++++--
src/test/regress/expected/prepare.out | 44 +++++++++++++---------
src/test/regress/sql/prepare.sql | 2 +-
4 files changed, 72 insertions(+), 25 deletions(-)
diff --git a/contrib/auto_explain/t/001_auto_explain.pl b/contrib/auto_explain/t/001_auto_explain.pl
index 6af5ac1da18..535c4770095 100644
--- a/contrib/auto_explain/t/001_auto_explain.pl
+++ b/contrib/auto_explain/t/001_auto_explain.pl
@@ -60,7 +60,7 @@ $log_contents = query_log($node,
like(
$log_contents,
- qr/Query Text: PREPARE get_proc\(name\) AS SELECT \* FROM pg_proc WHERE proname = \$1;/,
+ qr/Query Text: PREPARE get_proc\(name\) AS SELECT \* FROM pg_proc WHERE proname = \$1/,
"prepared query text logged, text mode");
like(
@@ -82,7 +82,7 @@ $log_contents = query_log(
like(
$log_contents,
- qr/Query Text: PREPARE get_type\(name\) AS SELECT \* FROM pg_type WHERE typname = \$1;/,
+ qr/Query Text: PREPARE get_type\(name\) AS SELECT \* FROM pg_type WHERE typname = \$1/,
"prepared query text logged, text mode");
like(
@@ -98,7 +98,7 @@ $log_contents = query_log(
like(
$log_contents,
- qr/Query Text: PREPARE get_type\(name\) AS SELECT \* FROM pg_type WHERE typname = \$1;/,
+ qr/Query Text: PREPARE get_type\(name\) AS SELECT \* FROM pg_type WHERE typname = \$1/,
"prepared query text logged, text mode");
unlike(
@@ -164,7 +164,7 @@ $log_contents = query_log(
like(
$log_contents,
- qr/"Query Text": "PREPARE get_class\(name\) AS SELECT \* FROM pg_class WHERE relname = \$1;"/,
+ qr/"Query Text": "PREPARE get_class\(name\) AS SELECT \* FROM pg_class WHERE relname = \$1"/,
"prepared query text logged, json mode");
like(
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..11b330ab73d 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -27,6 +27,7 @@
#include "commands/prepare.h"
#include "funcapi.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/queryjumble.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
@@ -64,6 +65,7 @@ PrepareQuery(ParseState *pstate, PrepareStmt *stmt,
Oid *argtypes = NULL;
int nargs;
List *query_list;
+ const char *new_query;
/*
* Disallow empty-string statement name (conflicts with protocol-level
@@ -80,14 +82,49 @@ PrepareQuery(ParseState *pstate, PrepareStmt *stmt,
*/
rawstmt = makeNode(RawStmt);
rawstmt->stmt = stmt->query;
- rawstmt->stmt_location = stmt_location;
- rawstmt->stmt_len = stmt_len;
+
+ /*
+ * Extract the query text if possible.
+ *
+ * If we have a statement location, we can extract the relevant part of the
+ * possibly multi-statement query string. If not just use what we were
+ * given.
+ */
+ if (stmt_location < 0)
+ {
+ rawstmt->stmt_location = stmt_location;
+ rawstmt->stmt_len = stmt_len;
+ new_query = pstate->p_sourcetext;
+ }
+ else
+ {
+ const char *cleaned;
+ char *tmp;
+
+ rawstmt->stmt_len = stmt_len;
+ cleaned = CleanQuerytext(pstate->p_sourcetext, &stmt_location,
+ &rawstmt->stmt_len);
+
+ if (rawstmt->stmt_len == 0)
+ rawstmt->stmt_len = strlen(cleaned);
+
+ /*
+ * CleanQuerytext() removes any leading whitespace and returns a
+ * pointer to the first actual character, so the cleaned query string
+ * is guaranteed to start at offset 0.
+ */
+ rawstmt->stmt_location = 0;
+ tmp = palloc(rawstmt->stmt_len + 1);
+ strlcpy(tmp, cleaned, rawstmt->stmt_len + 1);
+
+ new_query = tmp;
+ }
/*
* Create the CachedPlanSource before we do parse analysis, since it needs
* to see the unmodified raw parse tree.
*/
- plansource = CreateCachedPlan(rawstmt, pstate->p_sourcetext,
+ plansource = CreateCachedPlan(rawstmt, new_query,
CreateCommandTag(stmt->query));
/* Transform list of TypeNames to array of type OIDs */
diff --git a/src/test/regress/expected/prepare.out b/src/test/regress/expected/prepare.out
index 5815e17b39c..c645a4e5d0e 100644
--- a/src/test/regress/expected/prepare.out
+++ b/src/test/regress/expected/prepare.out
@@ -6,7 +6,17 @@ SELECT name, statement, parameter_types, result_types FROM pg_prepared_statement
------+-----------+-----------------+--------------
(0 rows)
-PREPARE q1 AS SELECT 1 AS a;
+SELECT 'bingo'\; PREPARE q1 AS SELECT 1 AS a \; SELECT 42;
+ ?column?
+----------
+ bingo
+(1 row)
+
+ ?column?
+----------
+ 42
+(1 row)
+
EXECUTE q1;
a
---
@@ -14,9 +24,9 @@ EXECUTE q1;
(1 row)
SELECT name, statement, parameter_types, result_types FROM pg_prepared_statements;
- name | statement | parameter_types | result_types
-------+------------------------------+-----------------+--------------
- q1 | PREPARE q1 AS SELECT 1 AS a; | {} | {integer}
+ name | statement | parameter_types | result_types
+------+-----------------------------+-----------------+--------------
+ q1 | PREPARE q1 AS SELECT 1 AS a | {} | {integer}
(1 row)
-- should fail
@@ -33,18 +43,18 @@ EXECUTE q1;
PREPARE q2 AS SELECT 2 AS b;
SELECT name, statement, parameter_types, result_types FROM pg_prepared_statements;
- name | statement | parameter_types | result_types
-------+------------------------------+-----------------+--------------
- q1 | PREPARE q1 AS SELECT 2; | {} | {integer}
- q2 | PREPARE q2 AS SELECT 2 AS b; | {} | {integer}
+ name | statement | parameter_types | result_types
+------+-----------------------------+-----------------+--------------
+ q1 | PREPARE q1 AS SELECT 2 | {} | {integer}
+ q2 | PREPARE q2 AS SELECT 2 AS b | {} | {integer}
(2 rows)
-- sql92 syntax
DEALLOCATE PREPARE q1;
SELECT name, statement, parameter_types, result_types FROM pg_prepared_statements;
- name | statement | parameter_types | result_types
-------+------------------------------+-----------------+--------------
- q2 | PREPARE q2 AS SELECT 2 AS b; | {} | {integer}
+ name | statement | parameter_types | result_types
+------+-----------------------------+-----------------+--------------
+ q2 | PREPARE q2 AS SELECT 2 AS b | {} | {integer}
(1 row)
DEALLOCATE PREPARE q2;
@@ -168,20 +178,20 @@ SELECT name, statement, parameter_types, result_types FROM pg_prepared_statement
------+------------------------------------------------------------------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------
q2 | PREPARE q2(text) AS +| {text} | {name,boolean,boolean}
| SELECT datname, datistemplate, datallowconn +| |
- | FROM pg_database WHERE datname = $1; | |
+ | FROM pg_database WHERE datname = $1 | |
q3 | PREPARE q3(text, int, float, boolean, smallint) AS +| {text,integer,"double precision",boolean,smallint} | {integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,name,name,name}
| SELECT * FROM tenk1 WHERE string4 = $1 AND (four = $2 OR+| |
| ten = $3::bigint OR true = $4 OR odd = $5::int) +| |
- | ORDER BY unique1; | |
+ | ORDER BY unique1 | |
q5 | PREPARE q5(int, text) AS +| {integer,text} | {integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,name,name,name}
| SELECT * FROM tenk1 WHERE unique1 = $1 OR stringu1 = $2 +| |
- | ORDER BY unique1; | |
+ | ORDER BY unique1 | |
q6 | PREPARE q6 AS +| {integer,name} | {integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,integer,name,name,name}
- | SELECT * FROM tenk1 WHERE unique1 = $1 AND stringu1 = $2; | |
+ | SELECT * FROM tenk1 WHERE unique1 = $1 AND stringu1 = $2 | |
q7 | PREPARE q7(unknown) AS +| {path} | {text,path}
- | SELECT * FROM road WHERE thepath = $1; | |
+ | SELECT * FROM road WHERE thepath = $1 | |
q8 | PREPARE q8 AS +| {integer,name} |
- | UPDATE tenk1 SET stringu1 = $2 WHERE unique1 = $1; | |
+ | UPDATE tenk1 SET stringu1 = $2 WHERE unique1 = $1 | |
(6 rows)
-- test DEALLOCATE ALL;
diff --git a/src/test/regress/sql/prepare.sql b/src/test/regress/sql/prepare.sql
index c6098dc95ce..0e7fe44725e 100644
--- a/src/test/regress/sql/prepare.sql
+++ b/src/test/regress/sql/prepare.sql
@@ -4,7 +4,7 @@
SELECT name, statement, parameter_types, result_types FROM pg_prepared_statements;
-PREPARE q1 AS SELECT 1 AS a;
+SELECT 'bingo'\; PREPARE q1 AS SELECT 1 AS a \; SELECT 42;
EXECUTE q1;
SELECT name, statement, parameter_types, result_types FROM pg_prepared_statements;
--
2.52.0
Julien Rouhaud <rjuju123@gmail.com> writes:
I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the
passed query text the same way as pg_stat_statements.
This patch invalidates all the location fields in the parsed query:
they could not be used to generate sane error cursors referencing the
truncated string. I'm not sure how many places try to generate such
errors post-parsing, but it's more than zero, and I've long had
ambitions of trying to extend that substantially (e.g, allowing
execution-time errors from functions to point at the relevant function
call).
Certainly the patch could be extended to update all those fields,
but that increases its complexity very significantly. I doubt
that it's worth it. My reaction to your example is more like
"if that bothers you, don't do it that way".
regards, tom lane
Hi,
I think this is a good idea.
Julien Rouhaud <rjuju123@gmail.com> writes:
I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the
passed query text the same way as pg_stat_statements.This patch invalidates all the location fields in the parsed query:
they could not be used to generate sane error cursors referencing the
truncated string.
I am not sure why we need to update the location of the rawstmt at all.
CreateCachedPlan does not seem to care about location and length
of query strings, AFAICT.
```rawstmt->stmt_location = 0;```
This is somewhat tangential, but I am now also wondering if
CleanQuerytext itself
should not mess with supplied statement location and length, but instead
send back the location and length of the isolated query text within the
multi-line query separately.
From
```
extern const char *CleanQuerytext(const char *query, int *location, int *len);
```
To
```
extern const char *CleanQuerytext(const char *query, int location, int
len, int *new_location, int *new_len);
```
It looks wrong that CleanQueryText is modifying any of the original
locations and lengths.
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On Wed, Dec 24, 2025 at 11:21:00AM -0500, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the
passed query text the same way as pg_stat_statements.This patch invalidates all the location fields in the parsed query:
they could not be used to generate sane error cursors referencing the
truncated string. I'm not sure how many places try to generate such
errors post-parsing, but it's more than zero, and I've long had
ambitions of trying to extend that substantially (e.g, allowing
execution-time errors from functions to point at the relevant function
call).
Ah I wasn't aware of that. After a quick look I'm assuming it's all the
callers of executor_errposition()?
There aren't a lot of them, and I've unfortunately been unable to hit any of
them. Every scenario I tried was always caught during planning time. Do you
have any example on how to exercise them?
Certainly the patch could be extended to update all those fields,
but that increases its complexity very significantly. I doubt
that it's worth it. My reaction to your example is more like
"if that bothers you, don't do it that way".
I agree that updating the location fields in the whole parsetree is a non
starter, butt I don't think that it's the only option.
If that's indeed the cases going through executor_errposition(), they rely on
having the executor state at hand to retrieve the query string. We could
simply have an extra "query string offset" field stored there that would always
be 0 except when PREPARE changes the input query string and then do the
location to character number calculation after applying that offset. And that
offset should already be returned by CleanQuerytext() so it wouldn't have extra
complication.