query_id, pg_stat_activity, extended query protocol
I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity. Erik Wienhold figured out that reason
can be in extended query protocol (
/messages/by-id/1391613709.939460.1684777418070@office.mailbox.org
)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id in pg_stat_activity for running query.
br
Kaido
On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:
I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity. Erik Wienhold figured out that reason
can be in extended query protocol (
/messages/by-id/1391613709.939460.1684777418070@office.mailbox.org
)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id in pg_stat_activity for running query.
Well, you could say a bit of both, I guess. The query ID is compiled
and stored in backend entries only after parse analysis, which is not
something that would happen when using the execution phase of the
extended query protocol, though it should be possible to access to the
Query nodes in the cached plans and their assigned query IDs.
FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.
--
Michael
Thnx.
br
Kaido
On Tue, 13 Jun 2023 at 03:16, Michael Paquier <michael@paquier.xyz> wrote:
Show quoted text
On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:
I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity. Erik Wienhold figured out thatreason
can be in extended query protocol (
/messages/by-id/1391613709.939460.1684777418070@office.mailbox.org
)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id in pg_stat_activity for running query.Well, you could say a bit of both, I guess. The query ID is compiled
and stored in backend entries only after parse analysis, which is not
something that would happen when using the execution phase of the
extended query protocol, though it should be possible to access to the
Query nodes in the cached plans and their assigned query IDs.FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.
--
Michael
FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.
This just cropped up as a pgjdbc github issue. Seems like something that
should be addressed.
Dave
FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.
I was dealing with this today and found this thread. I spent some time
looking at possible solutions.
In the flow of extended query protocol, the exec_parse_message
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.
/*
* If a new query is started, we reset the query identifier as it'll only
* be known after parse analysis, to avoid reporting last query's
* identifier.
*/
if (state == STATE_RUNNING)
beentry->st_query_id = UINT64CONST(0);
So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pg_report_activity.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
debug_query_string = psrc->query_string;
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);
set_ps_display("BIND");
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;
pgstat_report_activity(STATE_RUNNING, sourceText);
+ pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);
thoughts?
Regards,
Sami Imseih
Amazon Web Services (AWS)
On 23/4/2024 11:16, Imseih (AWS), Sami wrote:
FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.I was dealing with this today and found this thread. I spent some time
looking at possible solutions.In the flow of extended query protocol, the exec_parse_message
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below./*
* If a new query is started, we reset the query identifier as it'll only
* be known after parse analysis, to avoid reporting last query's
* identifier.
*/
if (state == STATE_RUNNING)
beentry->st_query_id = UINT64CONST(0);So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pg_report_activity.diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 76f48b13d2..7ec2df91d5 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message) debug_query_string = psrc->query_string;pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);set_ps_display("BIND");
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;pgstat_report_activity(STATE_RUNNING, sourceText);
+ pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);
thoughts?
In exec_bind_message, how can you be sure that queryId exists in
query_list before the call of GetCachedPlan(), which will validate and
lock the plan? What if some OIDs were altered in the middle?
--
regards, Andrei Lepikhov
On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:
On 23/4/2024 11:16, Imseih (AWS), Sami wrote:
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true); set_ps_display("BIND"); @@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows) debug_query_string = sourceText; pgstat_report_activity(STATE_RUNNING, sourceText); + pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true); cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);In exec_bind_message, how can you be sure that queryId exists in query_list
before the call of GetCachedPlan(), which will validate and lock the plan?
What if some OIDs were altered in the middle?
I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.
Did you consider using \bind to show how this behaves in a regression
test?
--
Michael
On 4/23/24 12:49, Michael Paquier wrote:
On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:
On 23/4/2024 11:16, Imseih (AWS), Sami wrote:
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true); set_ps_display("BIND"); @@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows) debug_query_string = sourceText; pgstat_report_activity(STATE_RUNNING, sourceText); + pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true); cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);In exec_bind_message, how can you be sure that queryId exists in query_list
before the call of GetCachedPlan(), which will validate and lock the plan?
What if some OIDs were altered in the middle?I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.Did you consider using \bind to show how this behaves in a regression
test?
I'm not sure how to invent a test based on the \bind command - we need
some pause in the middle.
But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:
CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
/*
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on public.test (actual time=0.002..0.004 rows=0 loops=1)
Query Identifier: 6750745711909650694
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on public.test (actual time=0.004..0.004 rows=0 loops=1)
Query Identifier: -2597546769858730762
*/
We have different objects which can be changed - I just have invented
the most trivial example to discuss.
--
regards, Andrei Lepikhov
I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.
IIUC, the query trees returned from QueryRewrite
will all have the same queryId, so it appears valid to
use the queryId from the first tree in the list. Right?
Here is an example I was working with that includes user-defined rules
that has a list with more than 1 tree.
postgres=# explain (verbose, generic_plan) insert into mytab values ($1) RETURNING pg_sleep($1), id ;
QUERY PLAN
-----------------------------------------------------------
Insert on public.mytab (cost=0.00..0.01 rows=1 width=4)
Output: pg_sleep(($1)::double precision), mytab.id
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425
Insert on public.mytab2 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425
Insert on public.mytab3 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425
Insert on public.mytab4 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425
(20 rows)
Did you consider using \bind to show how this behaves in a regression
test?
Yes, this is precisely how I tested. Without the patch, I could not
see a queryId after 9 seconds of a pg_sleep, but with the patch it
appears. See the test below.
## test query
select pg_sleep($1) \bind 30
## unpatched
postgres=# select
query_id,
query,
now()-query_start query_duration,
state
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
query_id | query | query_duration | state
----------+----------------------+-----------------+--------
| select pg_sleep($1) +| 00:00:08.604845 | active
| ; | |
(1 row)
## patched
postgres=# truncate table large;^C
postgres=# select
query_id,
query,
now()-query_start query_duration,
state
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
query_id | query | query_duration | state
---------------------+----------------------+----------------+--------
2433215470630378210 | select pg_sleep($1) +| 00:00:09.6881 | active
| ; | |
(1 row)
For exec_execute_message, I realized that to report queryId for
Utility and non-utility statements, we need to report the queryId
inside the portal routines where PlannedStmt contains the queryId.
Attached is the first real attempt at the fix.
Regards,
Sami
Attachments:
0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patchapplication/octet-stream; name=0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patchDownload
From b0804f3bc9268d7929488ce9f259154ad4874b38 Mon Sep 17 00:00:00 2001
From: Sami Imseih <samimseih@gmail.com>
Date: Wed, 24 Apr 2024 01:24:56 +0000
Subject: [PATCH 1/1] Fix Extended QUery Protocol handling of queryId
Unlike simple query protocol, extended query protocol
requires 3 routines to execute a command: parse/bind/execute.
In each one of these routines, the queryId is reset due to
pgstat_report_activity(STATE_RUNNING) being called at the start
of each routine. This fix adds pgstat_report_query_id during
the bind and execution calls, after the reset of the queryId.
---
src/backend/tcop/postgres.c | 6 ++++++
src/backend/tcop/pquery.c | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..3c3298d758 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1679,6 +1679,12 @@ exec_bind_message(StringInfo input_message)
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ /*
+ * Report the query_id as it was reset by pgstat_report_activity.
+ */
+ if (psrc->query_list)
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, false);
+
set_ps_display("BIND");
if (save_log_statement_stats)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 0c45fcf318..729fc44c3b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
#include "pg_trace.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/backend_status.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -891,6 +892,9 @@ PortalRunSelect(Portal portal,
if (queryDesc)
queryDesc->dest = dest;
+ if (queryDesc)
+ pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+
/*
* Determine which direction to go in, and check to see if we're already
* at the end of the available tuples in that direction. If so, set the
@@ -1218,6 +1222,8 @@ PortalRunMulti(Portal portal,
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ pgstat_report_query_id(pstmt->queryId, false);
+
/*
* If we got a cancel signal in prior command, quit
*/
--
2.40.1
But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:
CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?
Regards,
Sami
On Sat, Apr 27, 2024 at 6:55 AM Imseih (AWS), Sami <simseih@amazon.com>
wrote:
Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?
We choose a arguably more user-friendly option:
https://www.postgresql.org/docs/current/sql-prepare.html
"""
Although the main point of a prepared statement is to avoid repeated parse
analysis and planning of the statement, PostgreSQL will force re-analysis
and re-planning of the statement before using it whenever database objects
used in the statement have undergone definitional (DDL) changes or their
planner statistics have been updated since the previous use of the prepared
statement.
"""
David J.
We choose a arguably more user-friendly option:
Thanks for pointing this out!
Regards,
Sami
But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:
CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?
I tested v1 thoroughly.
Using the attached JDBC script for testing, I added some logging of the queryId
being reported by the patch and added a breakpoint after sync [1]https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L4877 which at that
point the locks are released on the table. I then proceeded to drop and recreate the table
and observed that the first bind after recreating the table still reports the
old queryId but the execute reports the correct queryId. This is because
the bind still has not had a chance to re-parse and re-plan after the
cache invalidation.
2024-04-27 13:51:15.757 CDT [43483] LOG: duration: 21322.475 ms execute S_1: select pg_sleep(10)
2024-04-27 13:51:21.591 CDT [43483] LOG: duration: 0.834 ms parse S_2: select from tab1 where id = $1
2024-04-27 13:51:21.591 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG: duration: 0.729 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:21.592 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG: duration: 0.032 ms execute S_2: select from tab1 where id = $1
2024-04-27 13:51:32.501 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG: duration: 0.342 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:32.502 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG: duration: 0.067 ms execute S_2: select from tab1 where id = $1
2024-04-27 13:51:42.613 CDT [43526] LOG: query_id = -4766379021163149612
-- recreate the tables
2024-04-27 13:51:42.621 CDT [43526] LOG: duration: 8.488 ms statement: drop table if exists tab1;
2024-04-27 13:51:42.621 CDT [43526] LOG: query_id = 7875284141628316369
2024-04-27 13:51:42.625 CDT [43526] LOG: duration: 3.364 ms statement: create table tab1 ( id int );
2024-04-27 13:51:42.625 CDT [43526] LOG: query_id = 2967282624086800441
2024-04-27 13:51:42.626 CDT [43526] LOG: duration: 0.936 ms statement: insert into tab1 values (1);
-- this reports the old query_id
2024-04-27 13:51:45.058 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:45.059 CDT [43483] LOG: duration: 0.913 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:45.059 CDT [43483] LOG: query_id = 3010297048333693297
2024-04-27 13:51:45.059 CDT [43483] LOG: duration: 0.096 ms execute S_2: select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG: query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG: duration: 0.108 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG: query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG: duration: 0.024 ms execute S_2: select from tab1 where id = $1
The easy answer is to not report queryId during the bind message, but I will look
at what else can be done here as it's good to have a queryId reported in this scenario
for cases there are long planning times and we rather not have those missed in
pg_stat_activity sampling.
[1]: https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L4877
Regards,
Sami
Attachments:
On 27/4/2024 20:54, Imseih (AWS), Sami wrote:
But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?
I don't think so. It may be any object, even stored procedure, that can
be changed. IMO, the right option here is to report zero (like the
undefined value of queryId) until the end of the parsing stage.
--
regards, Andrei Lepikhov
Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1]/messages/by-id/724348C9-8023-41BC-895E-80634E79A538@amazon.com in which the queryId may change
due to a cached query invalidation.
[1]: /messages/by-id/724348C9-8023-41BC-895E-80634E79A538@amazon.com
Regards,
Sami
Attachments:
0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patchapplication/octet-stream; name=0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patchDownload
From d3b0ae12b612a049db871c80acc245055c3bde2c Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Tue, 30 Apr 2024 21:38:59 -0500
Subject: [PATCH 1/1] Fix queryId reportig in Extended Query Protocol
---
src/backend/tcop/pquery.c | 12 ++++++++++++
src/backend/utils/cache/plancache.c | 8 ++++++++
2 files changed, 20 insertions(+)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 0c45fcf318..275a5d3b8d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
#include "pg_trace.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/backend_status.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -882,6 +883,14 @@ PortalRunSelect(Portal portal,
/* Caller messed up if we have neither a ready query nor held data. */
Assert(queryDesc || portal->holdStore);
+ /*
+ * Report the queryId which may have been reset by pgstat_report_activity
+ * in the execute message of extended query protocol. We do the same
+ * for PortalRunMulti.
+ */
+ if (queryDesc && queryDesc->plannedstmt)
+ pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+
/*
* Force the queryDesc destination to the right thing. This supports
* MOVE, for example, which will pass in dest = DestNone. This is okay to
@@ -1210,6 +1219,9 @@ PortalRunMulti(Portal portal,
if (altdest->mydest == DestRemoteExecute)
altdest = None_Receiver;
+ if (portal->stmts)
+ pgstat_report_query_id(linitial_node(PlannedStmt, portal->stmts)->queryId, false);
+
/*
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..762ff0a8ad 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -66,6 +66,7 @@
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/backend_status.h"
#include "utils/inval.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
@@ -1182,6 +1183,13 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
/* Make sure the querytree list is valid and we have parse-time locks */
qlist = RevalidateCachedQuery(plansource, queryEnv);
+ /*
+ * In case the query tree underwent re-analysis, we may have a new
+ * queryId to report.
+ */
+ if (qlist)
+ pgstat_report_query_id(linitial_node(Query, qlist)->queryId, false);
+
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
--
2.39.3 (Apple Git-146)
On 5/1/24 10:07, Imseih (AWS), Sami wrote:
Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.[1] /messages/by-id/724348C9-8023-41BC-895E-80634E79A538@amazon.com
I discovered the current state of queryId reporting and found that it
may be unlogical: Postgres resets queryId right before query execution
in simple protocol and doesn't reset it at all in extended protocol and
other ways to execute queries.
I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.
However, the process of setting up the queryId is more complex. Should
we set it at the beginning of query execution? This seems logical, but
what about the planning process? If an extension plans a query without
the intention to execute it for speculative reasons, should we still
show the queryId? Perhaps we should reset the state right after planning
to accurately reflect the current queryId.
See in the attachment some sketch for that - it needs to add queryId
reset on abortion.
--
regards, Andrei Lepikhov
Attachments:
queryid_report.difftext/x-patch; charset=UTF-8; name=queryid_report.diffDownload
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d63c..a4d38a157f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -470,6 +470,12 @@ ExecutorEnd(QueryDesc *queryDesc)
(*ExecutorEnd_hook) (queryDesc);
else
standard_ExecutorEnd(queryDesc);
+
+ /*
+ * Report at the end of query execution. Don't change it for nested
+ * queries.
+ */
+ pgstat_report_query_id(0, false);
}
void
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 032818423f..ba29dc5bc7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -58,6 +58,7 @@
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
#include "partitioning/partdesc.h"
+#include "pgstat.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/selfuncs.h"
@@ -280,6 +281,13 @@ planner(Query *parse, const char *query_string, int cursorOptions,
result = (*planner_hook) (parse, query_string, cursorOptions, boundParams);
else
result = standard_planner(parse, query_string, cursorOptions, boundParams);
+
+ /*
+ * Reset queryId at the end of planning job. Executor code will set it up
+ * again on the ExecutorStart call.
+ */
+ pgstat_report_query_id(0, false);
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2dff28afce..10e2529cf6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1108,8 +1108,6 @@ exec_simple_query(const char *query_string)
const char *cmdtagname;
size_t cmdtaglen;
- pgstat_report_query_id(0, true);
-
/*
* Get the command name for use in status display (it also becomes the
* default completion tag, down inside PortalRun). Set ps_status and
I discovered the current state of queryId reporting and found that it
may be unlogical: Postgres resets queryId right before query execution
in simple protocol and doesn't reset it at all in extended protocol and
other ways to execute queries.
In exec_parse_message, exec_bind_message and exec_execute_message,
the queryId is reset via pgstat_report_activity
I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.
When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.
If we reset queryId at the end of execution, this behavior breaks. Right?
This seems logical, but
what about the planning process? If an extension plans a query without
the intention to execute it for speculative reasons, should we still
show the queryId? Perhaps we should reset the state right after planning
to accurately reflect the current queryId.
I think you are suggesting that during planning, the queryId
of the current statement being planned should not be reported.
If my understanding is correct, I don't think that is a good idea. Tools that
snasphot pg_stat_activity will not be able to account for the queryId during
planning. This could mean that certain load on the database cannot be tied
back to a specific queryId.
Regards,
Sami
On Wed, May 15, 2024 at 03:24:05AM +0000, Imseih (AWS), Sami wrote:
I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.If we reset queryId at the end of execution, this behavior breaks. Right?
Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly. Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.
This seems logical, but
what about the planning process? If an extension plans a query without
the intention to execute it for speculative reasons, should we still
show the queryId? Perhaps we should reset the state right after planning
to accurately reflect the current queryId.I think you are suggesting that during planning, the queryId
of the current statement being planned should not be reported.If my understanding is correct, I don't think that is a good idea. Tools that
snasphot pg_stat_activity will not be able to account for the queryId during
planning. This could mean that certain load on the database cannot be tied
back to a specific queryId.
I'm -1 with the point of resetting the query ID based on what the
patch does, even if it remains available in the hooks.
pg_stat_activity is one thing, but you would also reduce the coverage
of log_line_prefix with %Q. And that can provide really useful
debugging information in the code paths where the query ID would be
reset as an effect of the proposed patch.
The patch to report the query ID of a planned query when running a
query through a PortalRunSelect() feels more intuitive in the
information it reports.
--
Michael
On 15/5/2024 12:09, Michael Paquier wrote:
On Wed, May 15, 2024 at 03:24:05AM +0000, Imseih (AWS), Sami wrote:
I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.If we reset queryId at the end of execution, this behavior breaks. Right?
Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly. Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.
Okay, that's what I precisely wanted to understand: queryId doesn't have
semantics to show the job that consumes resources right now—it is mostly
about convenience to know that the backend processes nothing except
(probably) this query.
--
regards, Andrei Lepikhov
Okay, that's what I precisely wanted to understand: queryId doesn't have
semantics to show the job that consumes resources right now—it is mostly
about convenience to know that the backend processes nothing except
(probably) this query.
It may be a good idea to expose in pg_stat_activity or a
supplemental activity view information about the current state of the
query processing. i.e. Is it parsing, planning or executing a query or
is it processing a nested query.
I can see this being useful and perhaps could be taken up in a
separate thread.
Regards,
Sami
On Wed, May 15, 2024 at 06:36:23PM +0000, Imseih (AWS), Sami wrote:
Okay, that's what I precisely wanted to understand: queryId doesn't have
semantics to show the job that consumes resources right now—it is mostly
about convenience to know that the backend processes nothing except
(probably) this query.It may be a good idea to expose in pg_stat_activity or a
supplemental activity view information about the current state of the
query processing. i.e. Is it parsing, planning or executing a query or
is it processing a nested query.
pg_stat_activity is already quite bloated with attributes, and I'd
suspect that there are more properties in a query that would be
interesting to track down at a thinner level as long as it mirrors a
dynamic activity of the query. Perhaps a separate catalog like a
pg_stat_query would make sense, moving query_start there as well?
Catalog breakages are never fun, still always happen because the
reasons behind a backward-incompatible change make the picture better
in the long-term for users.
--
Michael
On 15.05.2024 10:24, Imseih (AWS), Sami wrote:
I discovered the current state of queryId reporting and found that it
may be unlogical: Postgres resets queryId right before query execution
in simple protocol and doesn't reset it at all in extended protocol and
other ways to execute queries.In exec_parse_message, exec_bind_message and exec_execute_message,
the queryId is reset via pgstat_report_activityI think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.
I discovered this case a bit.
As I can see, the origin of the problem is that the exec_execute_message
report STATE_RUNNING, although ExecutorStart was called in the
exec_bind_message routine beforehand.
I'm unsure if it needs to call ExecutorStart in the bind code. But if we
don't change the current logic, would it make more sense to move
pgstat_report_query_id to the ExecutorRun routine?
--
regards, Andrei Lepikhov
I'm unsure if it needs to call ExecutorStart in the bind code. But if we
don't change the current logic, would it make more sense to move
pgstat_report_query_id to the ExecutorRun routine?
I initially thought about that, but for utility statements (CTAS, etc.) being
executed with extended query protocol, we will still not advertise the queryId
as we should. This is why I chose to set the queryId in PortalRunSelect and
PortalRunMulti in v2 of the patch [1]/messages/by-id/FAB6AEA1-AB5E-4DFF-9A2E-BB320E6C3DF1@amazon.com.
We can advertise the queryId inside ExecutorRun instead of
PortalRunSelect as the patch does, but we will still need to advertise
the queryId inside PortalRunMulti.
[1]: /messages/by-id/FAB6AEA1-AB5E-4DFF-9A2E-BB320E6C3DF1@amazon.com
Regards,
Sami
Hi,
Wouldn't it be enough to call pgstat_report_query_id in ExecutorRun
and ProcessUtility? With those changes [1]https://github.com/bonnefoa/postgres/commit/bf4b332d7b481549c6d9cfa70db51e39a305b9b2, both normal statements and
utility statements called through extended protocol will correctly
report the query_id.
-- Test utility statement with extended protocol
show all \bind \g
-- Check reported query_id
select query, query_id from pg_stat_activity where
application_name ='psql' and pid!=pg_backend_pid();
query | query_id
-----------+---------------------
show all | -866221123969716490
[1]: https://github.com/bonnefoa/postgres/commit/bf4b332d7b481549c6d9cfa70db51e39a305b9b2
Regards,
Anthonin
On Wed, Jul 17, 2024 at 11:32:49AM +0200, Anthonin Bonnefoy wrote:
Wouldn't it be enough to call pgstat_report_query_id in ExecutorRun
and ProcessUtility? With those changes [1], both normal statements and
utility statements called through extended protocol will correctly
report the query_id.
Interesting, and this position is really tempting. By doing so you
would force the query ID to be set to the one from the CTAS and
EXPLAIN, because these would be executed before the inner queries, and
pgstat_report_query_id() with its non-force option does not overwrite
what would be already set (aka what should be the top-level query ID).
Using ExecutorRun() feels consistent with the closest thing I've
touched in this area lately in 1d477a907e63, because that's the only
code path that we are sure to take depending on the portal execution
(two execution scenarios depending on how rows are retrieved, as far
as I recall). The comment should be perhaps more consistent with the
executor start counterpart. So I would be OK with that.. The
location before the hook of ProcessUtility is tempting, as it would
take care of the case of PortalRunMulti(). However.. Joining with a
point from Sami upthread..
This is still not enough in the case of where we have a holdStore, no?
This is the case where we would do *one* ExecutorRun(), followed up by
a scan of the tuplestore in more than one execute message. The v2
proposed upthread, by positioning a query ID to be set in
PortalRunSelect(), is still positioning that in two places.
Hmm... How about being much more aggressive and just do the whole
business in exec_execute_message(), just before we do the PortalRun()?
I mean, that's the source of all our problems, and we know the
statements that the portal will work on so we could go through the
list, grab the first planned query and set the query ID based on that,
without caring about the portal patterns we would need to think about.
[1] https://github.com/bonnefoa/postgres/commit/bf4b332d7b481549c6d9cfa70db51e39a305b9b2
Or use the following to download the patch, that I am attaching here:
https://github.com/bonnefoa/postgres/commit/bf4b332d7b481549c6d9cfa70db51e39a305b9b2.patch
Please attach things to your emails, if your repository disappears for
a reason or another we would lose knowledge in the archives of the
community lists.
--
Michael
Attachments:
queryid.patchtext/x-diff; charset=us-asciiDownload
From bf4b332d7b481549c6d9cfa70db51e39a305b9b2 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 17 Jul 2024 10:27:35 +0200
Subject: [PATCH] Report query_id in ExecutorRun for extended protocol queries
When using extended protocol, exec_execute_message will reset
MyBEEntry's query_id when calling pgstat_report_activity. The query_id
will never be set after that since it is only set in parse analysis and
ExecutorStart. Because of that, all queries executed through extended
protocol will report an empty query_id in pg_stat_activity and the
query_id won't be propagated to parallel workers.
The patch fixes the issue by setting query_id in ExecutorRun, similarly
to what is already done in ExecutorStart.
---
src/backend/executor/execMain.c | 8 ++++++++
src/backend/tcop/utility.c | 7 +++++++
2 files changed, 15 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d63c198..69cca12723923 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -298,6 +298,14 @@ ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count,
bool execute_once)
{
+ /*
+ * When using extended protocol, query_id will be reset during
+ * pgstat_report_activity and never set since we don't go through parse
+ * analysis or ExecutorStart. Similar to what's done in ExecutorStart,
+ * report the queryId we know from the queryDesc.
+ */
+ pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+
if (ExecutorRun_hook)
(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
else
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index fa66b8017edea..8cbd95701d93f 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -63,6 +63,7 @@
#include "storage/fd.h"
#include "tcop/utility.h"
#include "utils/acl.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
@@ -510,6 +511,12 @@ ProcessUtility(PlannedStmt *pstmt,
Assert(queryString != NULL); /* required as of 8.4 */
Assert(qc == NULL || qc->commandTag == CMDTAG_UNKNOWN);
+ /*
+ * Utility statement executed through extended protocol won't have
+ * query_id set, report it here
+ */
+ pgstat_report_query_id(pstmt->queryId, false);
+
/*
* We provide a function hook variable that lets loadable plugins get
* control when ProcessUtility is called. Such a plugin would normally
On Wed, Jul 17, 2024 at 11:32:49AM +0200, Anthonin Bonnefoy wrote:
Wouldn't it be enough to call pgstat_report_query_id in ExecutorRun
and ProcessUtility? With those changes [1], both normal statements and
utility statements called through extended protocol will correctly
report the query_id.Interesting, and this position is really tempting. By doing so you
would force the query ID to be set to the one from the CTAS and
EXPLAIN, because these would be executed before the inner queries, and
pgstat_report_query_id() with its non-force option does not overwrite
what would be already set (aka what should be the top-level query ID).Using ExecutorRun() feels consistent with the closest thing I've
touched in this area lately in 1d477a907e63, because that's the only
code path that we are sure to take depending on the portal execution
(two execution scenarios depending on how rows are retrieved, as far
as I recall). The comment should be perhaps more consistent with the
executor start counterpart. So I would be OK with that.. The
location before the hook of ProcessUtility is tempting, as it would
take care of the case of PortalRunMulti(). However.. Joining with a
point from Sami upthread..This is still not enough in the case of where we have a holdStore, no?
This is the case where we would do *one* ExecutorRun(), followed up by
a scan of the tuplestore in more than one execute message. The v2
proposed upthread, by positioning a query ID to be set in
PortalRunSelect(), is still positioning that in two places.
Correct, I also don’t think ExecutorRun is enough. Another reason is we should also
be setting the queryId during bind, right before planning starts.
Planning could have significant impact on the server and I think we better
track the responsible queryId.
I have not tested the holdStore case. IIUC the holdStore deals with fetching a
WITH HOLD CURSOR. Why would this matter for this conversation?
Hmm... How about being much more aggressive and just do the whole
business in exec_execute_message(), just before we do the PortalRun()?
I mean, that's the source of all our problems, and we know the
statements that the portal will work on so we could go through the
list, grab the first planned query and set the query ID based on that,
without caring about the portal patterns we would need to think about.
Doing the work in exec_execute_message makes sense, although maybe
setting the queryId after pgstat_report_activity is better because it occurs earlier.
Also, we should do the same for exec_bind_message and set the queryId
right after pgstat_report_activity in this function as well.
We do have to account for the queryId changing after cache revalidation, so
we should still set the queryId inside GetCachedPlan in the case the query
underwent re-analysis. This means there is a chance that a queryId set at
the start of the exec_bind_message may be different by the time we complete
the function, in the case the query revalidation results in a different queryId.
See the attached v3.
Regards,
Sami
Attachments:
v3-0001-Fix-queryId-reporting-in-Extended-Query-Protocol.patchapplication/octet-stream; name=v3-0001-Fix-queryId-reporting-in-Extended-Query-Protocol.patch; x-unix-mode=0644Download
From 5e2f354f495731b97b41c44f1391f34a2fe7a839 Mon Sep 17 00:00:00 2001
From: Sami Imseih <samimseih@gmail.com>
Date: Tue, 23 Jul 2024 17:52:14 +0000
Subject: [PATCH 1/1] Fix queryId reporting in Extended Query Protocol
---
src/backend/tcop/postgres.c | 6 ++++++
src/backend/utils/cache/plancache.c | 8 ++++++++
2 files changed, 14 insertions(+)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ea517f4b9b..021d48e9fa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1679,6 +1679,9 @@ exec_bind_message(StringInfo input_message)
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ if (list_length(psrc->query_list) > 0)
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, false);
+
set_ps_display("BIND");
if (save_log_statement_stats)
@@ -2147,6 +2150,9 @@ exec_execute_message(const char *portal_name, long max_rows)
pgstat_report_activity(STATE_RUNNING, sourceText);
+ if (list_length(portal->stmts) > 0)
+ pgstat_report_query_id(linitial_node(PlannedStmt, portal->stmts)->queryId, false);
+
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);
set_ps_display_with_len(cmdtagname, cmdtaglen);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..762ff0a8ad 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -66,6 +66,7 @@
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/backend_status.h"
#include "utils/inval.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
@@ -1182,6 +1183,13 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
/* Make sure the querytree list is valid and we have parse-time locks */
qlist = RevalidateCachedQuery(plansource, queryEnv);
+ /*
+ * In case the query tree underwent re-analysis, we may have a new
+ * queryId to report.
+ */
+ if (qlist)
+ pgstat_report_query_id(linitial_node(Query, qlist)->queryId, false);
+
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
--
2.43.0
On Tue, Jul 23, 2024 at 04:00:25PM -0500, Sami Imseih wrote:
Correct, I also don´t think ExecutorRun is enough. Another reason is we should also
be setting the queryId during bind, right before planning starts.
Planning could have significant impact on the server and I think we better
track the responsible queryId.I have not tested the holdStore case. IIUC the holdStore deals with fetching a
WITH HOLD CURSOR. Why would this matter for this conversation?
Not only, see portal.h. This matters for holdable cursors,
PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT.
Doing the work in exec_execute_message makes sense, although maybe
setting the queryId after pgstat_report_activity is better because it occurs earlier.
Also, we should do the same for exec_bind_message and set the queryId
right after pgstat_report_activity in this function as well.
Sounds fine by me (still need to check all three patterns).
+ if (list_length(psrc->query_list) > 0)
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, false);
Something that slightly worries me is to assume that the first Query
in the query_list is fetched. Using a foreach() for all three paths
may be better, jumping out at the loop when finding a valid query ID.
I have not looked at that entirely in details, and I'd need to check
if it is possible to use what's here for more predictible tests:
/messages/by-id/ZqCMCS4HUshUYjGc@paquier.xyz
We do have to account for the queryId changing after cache revalidation, so
we should still set the queryId inside GetCachedPlan in the case the query
underwent re-analysis. This means there is a chance that a queryId set at
the start of the exec_bind_message may be different by the time we complete
the function, in the case the query revalidation results in a different queryId.
Makes sense to me. I'd rather make that a separate patch, with, if
possible, its own tests (the case of Andrei with a DROP/CREATE TABLE) .
--
Michael
On Thu, Jul 18, 2024 at 10:56 AM Michael Paquier <michael@paquier.xyz> wrote:
Please attach things to your emails, if your repository disappears for
a reason or another we would lose knowledge in the archives of the
community lists.
Noted and thanks for the reminder, I'm still learning about mailing
list etiquette.
I have not looked at that entirely in details, and I'd need to check
if it is possible to use what's here for more predictible tests:
/messages/by-id/ZqCMCS4HUshUYjGc@paquier.xyz
For the tests, there are limited possibilities to check whether a
query_id has been set correctly.
- Checking pg_stat_activity is not possible in the regress tests as
you need a second session to check the reported query_id.
- pg_stat_statements can be used indirectly but you're limited to how
pgss uses query_id. For example, it doesn't rely on queryId in
ExecutorRun.
A possible solution I've been thinking of is to use a test module. The
module will assert on whether the queryId is set or not in parse, plan
and executor hooks. It will also check if the queryId reported in
pgstat matches the queryId at the root level.
This allows us to check that the queryId is correctly set with the
extended protocol. I've also found some queries which will trigger a
failure (ctas and cursor usage) though this is probably a different
issue from the extended protocol issue.
Regards,
Anthonin
Attachments:
0001-Create-module-for-query_id-test.patchapplication/octet-stream; name=0001-Create-module-for-query_id-test.patchDownload
From 1bbeae4290d97997aebed45f2a8b36b32ef7dadd Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Fri, 26 Jul 2024 10:18:02 +0200
Subject: Create module for query_id test
There are limited possibilities to check whether a query_id has been set
correctly.
- Checking pg_stat_activity is not possible in the regress tests as you
need a second session to check the reported query_id.
- pg_stat_statements can be used indirectly but you're limited to how
pgss uses query_id. For example, it doesn't rely on queryId in
ExecutorRun.
To fix this, this patch introduces a new test module to check queryId.
This module install a check to all parse, plan and executor hooks and
assert that queryId is set. On top of that, when at the root level, it
checks that queryId reported by pgstat is the same as the processed
query.
---
src/test/modules/Makefile | 1 +
src/test/modules/meson.build | 1 +
src/test/modules/test_query_id/Makefile | 20 ++
.../test_query_id/expected/test_query_id.out | 51 +++++
src/test/modules/test_query_id/meson.build | 28 +++
.../test_query_id/sql/test_query_id.sql | 46 +++++
.../modules/test_query_id/test_query_id.c | 185 ++++++++++++++++++
7 files changed, 332 insertions(+)
create mode 100644 src/test/modules/test_query_id/Makefile
create mode 100644 src/test/modules/test_query_id/expected/test_query_id.out
create mode 100644 src/test/modules/test_query_id/meson.build
create mode 100644 src/test/modules/test_query_id/sql/test_query_id.sql
create mode 100644 src/test/modules/test_query_id/test_query_id.c
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 256799f520a..a2391c685a5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
test_parser \
test_pg_dump \
test_predtest \
+ test_query_id \
test_radixtree \
test_rbtree \
test_regex \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index d8fe059d236..cece5ff85fb 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -28,6 +28,7 @@ subdir('test_oat_hooks')
subdir('test_parser')
subdir('test_pg_dump')
subdir('test_predtest')
+subdir('test_query_id')
subdir('test_radixtree')
subdir('test_rbtree')
subdir('test_regex')
diff --git a/src/test/modules/test_query_id/Makefile b/src/test/modules/test_query_id/Makefile
new file mode 100644
index 00000000000..ffd5dfb0094
--- /dev/null
+++ b/src/test/modules/test_query_id/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/test_query_id/Makefile
+
+MODULE_big = test_query_id
+OBJS = \
+ $(WIN32RES) \
+ test_query_id.o
+PGFILEDESC = "test_query_id - test module for query_id"
+
+REGRESS = test_query_id
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_query_id
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_query_id/expected/test_query_id.out b/src/test/modules/test_query_id/expected/test_query_id.out
new file mode 100644
index 00000000000..949ed50e04e
--- /dev/null
+++ b/src/test/modules/test_query_id/expected/test_query_id.out
@@ -0,0 +1,51 @@
+-- Load test_query_id's hooks
+LOAD 'test_query_id';
+-- Test query with simple protocol
+SELECT 1;
+ ?column?
+----------
+ 1
+(1 row)
+
+-- Test a query with extended protocol
+SELECT 1 \bind \g
+ ?column?
+----------
+ 1
+(1 row)
+
+-- Test a ctas (fail)
+-- CREATE TABLE m AS SELECT i AS k, (i || ' v')::text v FROM generate_series(1, 16, 3) i;
+-- Test CREATE table
+CREATE TABLE n(a int);
+INSERT INTO n SELECT * FROM generate_series(1, 100);
+-- Test alter table
+ALTER TABLE n ADD UNIQUE (a);
+-- Test cursor use (fail)
+-- BEGIN;
+-- declare foocur CURSOR FOR SELECT * from n;
+-- fetch forward 10 from foocur;
+-- fetch forward 1 from foocur;
+-- CLOSE foocur;
+-- COMMIT;
+-- Test procedure declaration
+CREATE PROCEDURE proc_with_utility_stmt()
+LANGUAGE SQL
+AS $$
+ SHOW cpu_operator_cost;
+ SELECT 1;
+$$;
+-- Test procedure call
+CALL proc_with_utility_stmt();
+CREATE FUNCTION simple_function() returns int
+LANGUAGE SQL
+AS $$
+ SELECT 1;
+$$;
+-- Test inline function
+select * from (select * from simple_function());
+ simple_function
+-----------------
+ 1
+(1 row)
+
diff --git a/src/test/modules/test_query_id/meson.build b/src/test/modules/test_query_id/meson.build
new file mode 100644
index 00000000000..f4401e2591c
--- /dev/null
+++ b/src/test/modules/test_query_id/meson.build
@@ -0,0 +1,28 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_query_id_sources = files(
+ 'test_query_id.c',
+)
+
+if host_system == 'windows'
+ test_query_id_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_query_id',
+ '--FILEDESC', 'test_query_id - test module for query_ids',])
+endif
+
+test_query_id = shared_module('test_query_id',
+ test_query_id_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_query_id
+
+tests += {
+ 'name': 'test_query_id',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'test_query_id',
+ ],
+ },
+}
diff --git a/src/test/modules/test_query_id/sql/test_query_id.sql b/src/test/modules/test_query_id/sql/test_query_id.sql
new file mode 100644
index 00000000000..7176b5412a8
--- /dev/null
+++ b/src/test/modules/test_query_id/sql/test_query_id.sql
@@ -0,0 +1,46 @@
+-- Load test_query_id's hooks
+LOAD 'test_query_id';
+
+-- Test query with simple protocol
+SELECT 1;
+
+-- Test a query with extended protocol
+SELECT 1 \bind \g
+
+-- Test a ctas (fail)
+-- CREATE TABLE m AS SELECT i AS k, (i || ' v')::text v FROM generate_series(1, 16, 3) i;
+
+-- Test CREATE table
+CREATE TABLE n(a int);
+INSERT INTO n SELECT * FROM generate_series(1, 100);
+
+-- Test alter table
+ALTER TABLE n ADD UNIQUE (a);
+
+-- Test cursor use (fail)
+-- BEGIN;
+-- declare foocur CURSOR FOR SELECT * from n;
+-- fetch forward 10 from foocur;
+-- fetch forward 1 from foocur;
+-- CLOSE foocur;
+-- COMMIT;
+
+-- Test procedure declaration
+CREATE PROCEDURE proc_with_utility_stmt()
+LANGUAGE SQL
+AS $$
+ SHOW cpu_operator_cost;
+ SELECT 1;
+$$;
+
+-- Test procedure call
+CALL proc_with_utility_stmt();
+
+CREATE FUNCTION simple_function() returns int
+LANGUAGE SQL
+AS $$
+ SELECT 1;
+$$;
+
+-- Test inline function
+select * from (select * from simple_function());
diff --git a/src/test/modules/test_query_id/test_query_id.c b/src/test/modules/test_query_id/test_query_id.c
new file mode 100644
index 00000000000..074dbecbb83
--- /dev/null
+++ b/src/test/modules/test_query_id/test_query_id.c
@@ -0,0 +1,185 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_query_id.c
+ * Test if query_id is set and matches reported value in pgstat.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_query_id/test_query_id.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "executor/executor.h"
+#include "optimizer/planner.h"
+#include "parser/analyze.h"
+#include "storage/lwlock.h"
+#include "tcop/utility.h"
+#include "utils/backend_status.h"
+
+PG_MODULE_MAGIC;
+
+static void test_query_id_post_parse_analyze(ParseState *pstate, Query *query,
+ JumbleState *jstate);
+static PlannedStmt *test_query_id_planner(Query *parse,
+ const char *query_string,
+ int cursorOptions,
+ ParamListInfo boundParams);
+static void test_query_id_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void test_query_id_ExecutorRun(QueryDesc *queryDesc,
+ ScanDirection direction,
+ uint64 count, bool execute_once);
+static void test_query_id_ExecutorFinish(QueryDesc *queryDesc);
+static void test_query_id_ExecutorEnd(QueryDesc *queryDesc);
+static void test_query_id_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+ bool readOnlyTree,
+ ProcessUtilityContext context, ParamListInfo params,
+ QueryEnvironment *queryEnv,
+ DestReceiver *dest, QueryCompletion *qc);
+
+/* Current nesting depth of planner/ExecutorRun/ProcessUtility calls */
+static int nesting_level = 0;
+
+static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
+static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart = NULL;
+static ExecutorRun_hook_type prev_ExecutorRun = NULL;
+static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
+static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
+static ProcessUtility_hook_type prev_ProcessUtility = NULL;
+
+static void
+test_query_id_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
+{
+ if (prev_post_parse_analyze_hook)
+ prev_post_parse_analyze_hook(pstate, query, jstate);
+ Assert(query->queryId);
+ /* queryId will be reported to pgstat just after this hook */
+}
+
+static PlannedStmt *
+test_query_id_planner(Query *parse,
+ const char *query_string,
+ int cursorOptions,
+ ParamListInfo boundParams)
+{
+ PlannedStmt *result;
+
+ nesting_level++;
+ if (prev_planner_hook)
+ result = prev_planner_hook(parse, query_string, cursorOptions,
+ boundParams);
+ else
+ result = standard_planner(parse, query_string, cursorOptions,
+ boundParams);
+ nesting_level--;
+ Assert(parse->queryId);
+ if (nesting_level == 0)
+ Assert(parse->queryId == pgstat_get_my_query_id());
+ return result;
+}
+
+static void
+test_query_id_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ if (prev_ExecutorStart)
+ prev_ExecutorStart(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+ Assert(queryDesc->plannedstmt->queryId);
+ if (nesting_level == 0)
+ Assert(queryDesc->plannedstmt->queryId == pgstat_get_my_query_id());
+}
+
+static void
+test_query_id_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count,
+ bool execute_once)
+{
+ nesting_level++;
+ if (prev_ExecutorRun)
+ prev_ExecutorRun(queryDesc, direction, count, execute_once);
+ else
+ standard_ExecutorRun(queryDesc, direction, count, execute_once);
+ nesting_level--;
+ Assert(queryDesc->plannedstmt->queryId);
+ if (nesting_level == 0)
+ Assert(queryDesc->plannedstmt->queryId == pgstat_get_my_query_id());
+}
+
+static void
+test_query_id_ExecutorFinish(QueryDesc *queryDesc)
+{
+ nesting_level++;
+ if (prev_ExecutorFinish)
+ prev_ExecutorFinish(queryDesc);
+ else
+ standard_ExecutorFinish(queryDesc);
+ nesting_level--;
+ Assert(queryDesc->plannedstmt->queryId);
+ if (nesting_level == 0)
+ Assert(queryDesc->plannedstmt->queryId == pgstat_get_my_query_id());
+}
+
+static void
+test_query_id_ExecutorEnd(QueryDesc *queryDesc)
+{
+ if (prev_ExecutorEnd)
+ prev_ExecutorEnd(queryDesc);
+ else
+ standard_ExecutorEnd(queryDesc);
+ Assert(queryDesc->plannedstmt->queryId);
+ if (nesting_level == 0)
+ Assert(queryDesc->plannedstmt->queryId == pgstat_get_my_query_id());
+}
+
+/*
+ * ProcessUtility hook
+ */
+static void
+test_query_id_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+ bool readOnlyTree,
+ ProcessUtilityContext context,
+ ParamListInfo params, QueryEnvironment *queryEnv,
+ DestReceiver *dest, QueryCompletion *qc)
+{
+ nesting_level++;
+ if (prev_ProcessUtility)
+ prev_ProcessUtility(pstmt, queryString, readOnlyTree,
+ context, params, queryEnv,
+ dest, qc);
+ else
+ standard_ProcessUtility(pstmt, queryString, readOnlyTree,
+ context, params, queryEnv,
+ dest, qc);
+ nesting_level--;
+ Assert(pstmt->queryId);
+ if (nesting_level == 0)
+ Assert(pstmt->queryId == pgstat_get_my_query_id());
+}
+
+void
+_PG_init(void)
+{
+ EnableQueryId();
+
+ /*
+ * Install hooks.
+ */
+ prev_post_parse_analyze_hook = post_parse_analyze_hook;
+ post_parse_analyze_hook = test_query_id_post_parse_analyze;
+ prev_planner_hook = planner_hook;
+ planner_hook = test_query_id_planner;
+ prev_ExecutorStart = ExecutorStart_hook;
+ ExecutorStart_hook = test_query_id_ExecutorStart;
+ prev_ExecutorRun = ExecutorRun_hook;
+ ExecutorRun_hook = test_query_id_ExecutorRun;
+ prev_ExecutorFinish = ExecutorFinish_hook;
+ ExecutorFinish_hook = test_query_id_ExecutorFinish;
+ prev_ExecutorEnd = ExecutorEnd_hook;
+ ExecutorEnd_hook = test_query_id_ExecutorEnd;
+ prev_ProcessUtility = ProcessUtility_hook;
+ ProcessUtility_hook = test_query_id_ProcessUtility;
+}
--
2.39.3 (Apple Git-146)
On Fri, Jul 26, 2024 at 02:39:41PM +0200, Anthonin Bonnefoy wrote:
For the tests, there are limited possibilities to check whether a
query_id has been set correctly.
- Checking pg_stat_activity is not possible in the regress tests as
you need a second session to check the reported query_id.
- pg_stat_statements can be used indirectly but you're limited to how
pgss uses query_id. For example, it doesn't rely on queryId in
ExecutorRun.A possible solution I've been thinking of is to use a test module. The
module will assert on whether the queryId is set or not in parse, plan
and executor hooks. It will also check if the queryId reported in
pgstat matches the queryId at the root level.
FWIW, I was more thinking in the lines of a TAP test with
PostgreSQL::Test::BackgroundPsql to hold the sessions around while
doing pg_stat_activity lookups.
Using a test module like what you have is really tempting to rely on
the hooks for the work, that's something I'll try to think about more.
We could perhaps push the query ID into a table saving the state that
gets queried in the SQL test, using only assertions is not enough as
this makes the test moot with assertions disabled. And actually,
there may be a point in just pushing safety assertions to be in the
core backend code, as a HEAD-only improvement.
--
Michael
Sounds fine by me (still need to check all three patterns).
+ if (list_length(psrc->query_list) > 0) + pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, false);Something that slightly worries me is to assume that the first Query
in the query_list is fetched. Using a foreach() for all three paths
may be better, jumping out at the loop when finding a valid query ID.
I cannot see how the inital node would not contain the queryId, but
to be on the safe side, your suggestion makes sense.
Are you thinking something like the below? In the foreach,
check for the first queryId != 0, set the queryId and then
break out of the loop
foreach(lc, psrc->query_list)
{
Query *query = lfirst_node(Query, lc);
if (query->queryId != UINT64CONST(0))
{
pgstat_report_query_id(query->queryId, false);
break;
}
}
We do have to account for the queryId changing after cache revalidation, so
we should still set the queryId inside GetCachedPlan in the case the query
underwent re-analysis. This means there is a chance that a queryId set at
the start of the exec_bind_message may be different by the time we complete
the function, in the case the query revalidation results in a different queryId.Makes sense to me. I'd rather make that a separate patch, with, if
I will create a separate patch for this.
possible, its own tests (the case of Andrei with a DROP/CREATE TABLE) .
In terms of testing, there are several options being discussed [1]/messages/by-id/ZqQk0WHN8EMBEai9@paquier.xyz including
BackgroundPsql and using hooks. I want to add a another idea which
is to rely on compute_plan_id = regress to log if my_query_id is a
non-zero value inside pgstat_report_query_id. Something like below:
@@ -640,6 +641,14 @@ pgstat_report_query_id(uint64 query_id, bool force)
PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
beentry->st_query_id = query_id;
PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+ if (compute_query_id == COMPUTE_QUERY_ID_REGRESS)
+ {
+ int64 queryId = pgstat_get_my_query_id();
+
+ if (queryId != UINT64CONST(0))
+ elog(DEBUG3, "queryId value is not zero");
+ }
The number of logs can be counted and compared with what
is expected. For example, in simple query, I expect the queryId to be
set once. Using the \bind, I expect the queryId to be set 3 times (
parse/bind/execute).
Specifically for the DROP/CREATE TABLE test, the \parse and \bindx
being proposed in [2]/messages/by-id/ZqCMCS4HUshUYjGc@paquier.xyz can be used. The table can be dropped and
recreated after the \parse step. If we count the logs, we would expect
a total of 4 logs to be set (parse/bind/revalidation/execution).
I think the testing discussion should be moved to a different thread.
What do you think?
Regards,
Sami
[1]: /messages/by-id/ZqQk0WHN8EMBEai9@paquier.xyz
[2]: /messages/by-id/ZqCMCS4HUshUYjGc@paquier.xyz
I think the testing discussion should be moved to a different thread.
What do you think?
See v4.
0001 deals with reporting queryId in exec_execute_message and
exec_bind_message.
0002 deals with reporting queryId after a cache invalidation.
There are no tests as this requires more discussion in a separate thread(?)
Regards,
Sami
Attachments:
v4-0001-Add-missing-queryId-reporting-in-extended-query.patchtext/plain; charset=UTF-8; name=v4-0001-Add-missing-queryId-reporting-in-extended-query.patchDownload
From 0727a81033c34350ddc782366ea4f8b4061cdf6e Mon Sep 17 00:00:00 2001
From: Sami Imseih <samimseih@gmail.com>
Date: Wed, 14 Aug 2024 20:50:01 +0000
Subject: [PATCH v4 1/2] Add missing queryId reporting in extended query
---
src/backend/tcop/postgres.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..5ef59deeed 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1654,6 +1654,7 @@ exec_bind_message(StringInfo input_message)
char msec_str[32];
ParamsErrorCbData params_data;
ErrorContextCallback params_errcxt;
+ ListCell *lc;
/* Get the fixed part of the message */
portal_name = pq_getmsgstring(input_message);
@@ -1689,6 +1690,17 @@ exec_bind_message(StringInfo input_message)
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ foreach(lc, psrc->query_list)
+ {
+ Query *query = lfirst_node(Query, lc);
+
+ if (query->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(query->queryId, false);
+ break;
+ }
+ }
+
set_ps_display("BIND");
if (save_log_statement_stats)
@@ -2111,6 +2123,7 @@ exec_execute_message(const char *portal_name, long max_rows)
ErrorContextCallback params_errcxt;
const char *cmdtagname;
size_t cmdtaglen;
+ ListCell *lc;
/* Adjust destination to tell printtup.c what to do */
dest = whereToSendOutput;
@@ -2157,6 +2170,17 @@ exec_execute_message(const char *portal_name, long max_rows)
pgstat_report_activity(STATE_RUNNING, sourceText);
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+
+ if (stmt->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(stmt->queryId, false);
+ break;
+ }
+ }
+
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);
set_ps_display_with_len(cmdtagname, cmdtaglen);
--
2.43.0
v4-0002-Report-new-queryId-after-plancache-re-validation.patchtext/plain; charset=UTF-8; name=v4-0002-Report-new-queryId-after-plancache-re-validation.patchDownload
From f0cf4fb884e3a75940300691ce0c7908f2f39cad Mon Sep 17 00:00:00 2001
From: Sami Imseih <samimseih@gmail.com>
Date: Wed, 14 Aug 2024 20:50:37 +0000
Subject: [PATCH v4 2/2] Report new queryId after plancache re-validation
---
src/backend/utils/cache/plancache.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..aa507846d6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -66,6 +66,7 @@
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/backend_status.h"
#include "utils/inval.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
@@ -590,6 +591,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
TupleDesc resultDesc;
MemoryContext querytree_context;
MemoryContext oldcxt;
+ ListCell *lc;
/*
* For one-shot plans, we do not support revalidation checking; it's
@@ -805,6 +807,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
plansource->is_valid = true;
+ /*
+ * Override the current queryId with the one from the re-validated
+ * query tree.
+ */
+ foreach(lc, tlist)
+ {
+ Query *stmt = lfirst_node(Query, lc);
+
+ if (stmt->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(stmt->queryId, true);
+ break;
+ }
+ }
+
/* Return transient copy of querytrees for possible use in planning */
return tlist;
}
--
2.43.0
On Thu, Aug 15, 2024 at 5:06 AM Imseih (AWS), Sami <samimseih@gmail.com> wrote:
I think the testing discussion should be moved to a different thread.
What do you think?See v4.
0001 deals with reporting queryId in exec_execute_message and
exec_bind_message.
0002 deals with reporting queryId after a cache invalidation.There are no tests as this requires more discussion in a separate thread(?)
hi.
v4-0001 work as expected. i don't know how to test 0002
In 0001 and 0002, all foreach loops, we can use the new macro foreach_node.
see https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
the following are the minimum tests I come up with for 0001
/* test \bind queryid exists */
select query_id is not null as query_id_exist
from pg_stat_activity where pid = pg_backend_pid() \bind \g
/* test that \parse \bind_named queryid exists */
select pg_backend_pid() as current_pid \gset pref01_
select query_id is not null as query_id_exist from pg_stat_activity
where pid = $1 \parse stmt11
\bind_named stmt11 :pref01_current_pid \g
On Sat, Aug 31, 2024 at 09:47:41AM +0800, jian he wrote:
/* test \bind queryid exists */
select query_id is not null as query_id_exist
from pg_stat_activity where pid = pg_backend_pid() \bind \g/* test that \parse \bind_named queryid exists */
select pg_backend_pid() as current_pid \gset pref01_
select query_id is not null as query_id_exist from pg_stat_activity
where pid = $1 \parse stmt11
\bind_named stmt11 :pref01_current_pid \g
I need to spend a bit more time with my head down for this thread, but
couldn't we use these commands with various query patterns in
pg_stat_statements and look at the shmem counters reported through its
view?
--
Michael
On 14/8/2024 23:05, Imseih (AWS), Sami wrote:
I think the testing discussion should be moved to a different thread.
What do you think?See v4.
0001 deals with reporting queryId in exec_execute_message and
exec_bind_message.
0002 deals with reporting queryId after a cache invalidation.There are no tests as this requires more discussion in a separate thread(?)
At first, these patches look good.
But I have a feeling of some mess here:
queryId should be initialised at the top-level query. At the same time,
the RevalidateCachedQuery routine can change this value in the case of
the query tree re-validation.
You can say that this routine can't be called from a non-top-level query
right now, except SPI. Yes, but what about extensions or future usage?
--
regards, Andrei Lepikhov
On 14/8/2024 23:05, Imseih (AWS), Sami wrote:
There are no tests as this requires more discussion in a separate thread(?)
Unfortunately, TAP tests don't allow us to keep a connection and
manually permutate the order of queries sent to different connections.
But isolation tests are designed to do so. Of course, they aren't the
best if you need to compare values produced by various queries but see a
clumsy sketch doing that in the attachment.
Also, while writing the test, I found out that now, JumbleQuery takes
into account constants of the A_Const node, and calls of the same
prepared statement with different parameters generate different
query_id. Is it a reason to introduce JumbleQuery options and allow
different logic of queryid generation?
--
regards, Andrei Lepikhov
Attachments:
0001-queryid-isolation-test.patchtext/plain; charset=UTF-8; name=0001-queryid-isolation-test.patchDownload
From 8019a716714477c9aeeed1f208d76f87722b051a Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 3 Sep 2024 16:38:29 +0200
Subject: [PATCH] queryid isolation test
---
src/test/isolation/expected/queryid.out | 1 +
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/queryid.spec | 48 +++++++++++++++++++++++++
3 files changed, 50 insertions(+)
create mode 100644 src/test/isolation/expected/queryid.out
create mode 100644 src/test/isolation/specs/queryid.spec
diff --git a/src/test/isolation/expected/queryid.out b/src/test/isolation/expected/queryid.out
new file mode 100644
index 0000000000..7ec1dff5ad
--- /dev/null
+++ b/src/test/isolation/expected/queryid.out
@@ -0,0 +1 @@
+Parsed test spec with 2 sessions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 143109aa4d..823720fab5 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -1,3 +1,4 @@
+test: queryid
test: read-only-anomaly
test: read-only-anomaly-2
test: read-only-anomaly-3
diff --git a/src/test/isolation/specs/queryid.spec b/src/test/isolation/specs/queryid.spec
new file mode 100644
index 0000000000..1c806c5eb9
--- /dev/null
+++ b/src/test/isolation/specs/queryid.spec
@@ -0,0 +1,48 @@
+# Test for checking correct update of queryid
+
+setup
+{
+ CREATE TABLE test_queryid(x int);
+ INSERT INTO test_queryid (x) VALUES (2);
+ INSERT INTO test_queryid (x) VALUES (1);
+ ANALYZE test_queryid;
+}
+
+teardown
+{
+ DROP TABLE test_queryid CASCADE;
+ RESET compute_query_id;
+}
+
+session s1
+step s1_begin { SET compute_query_id = 'on'; }
+step s1_trivial { /* queryid test */ SELECT relname FROM pg_class WHERE oid < 1; }
+step s1_prepare { /* queryid test */ PREPARE probe (int) AS SELECT * FROM test_queryid t1 JOIN test_queryid t2 USING (x) WHERE t1.x = $1 AND t2.x = 2; }
+step s1_probe1 { /* queryid test */ EXECUTE probe(1); }
+step s1_probe2 { /* queryid test */ EXECUTE probe(2); }
+
+session s2
+step s2_get_qqid1 { SELECT query_id INTO res1 FROM pg_stat_activity WHERE query LIKE '/* queryid test */%'; }
+step s2_get_qqid2 { SELECT query_id INTO res2 FROM pg_stat_activity WHERE query LIKE '/* queryid test */%'; }
+step s2_check_res1 { SELECT query_id <> 0 FROM res1;}
+step s2_check_res2 { SELECT query_id <> 0 FROM res2;}
+step s2_cmp_r1r2 { SELECT (SELECT query_id FROM res1) <> (SELECT query_id FROM res2);}
+step s2_resdrop { DROP TABLE IF EXISTS res1,res2; }
+
+permutation
+ s1_begin
+ s1_trivial
+ s2_get_qqid1
+ s2_check_res1
+ s1_prepare
+ s2_get_qqid2
+ s2_check_res2
+ s2_cmp_r1r2
+ s2_resdrop
+ s1_probe1
+ s2_get_qqid1
+ s2_check_res1
+ s1_probe2
+ s2_get_qqid2
+ s2_check_res2
+ s2_cmp_r1r2
--
2.46.0
Sorry for the late reply on this thread.
On 14/8/2024 23:05, Imseih (AWS), Sami wrote:
There are no tests as this requires more discussion in a separate thread(?)
Unfortunately, TAP tests don't allow us to keep a connection and
manually permutate the order of queries sent to different connections.
But isolation tests are designed to do so. Of course, they aren't the
best if you need to compare values produced by various queries but see a
clumsy sketch doing that in the attachment.
It would be nice to use isolation tests as you have, those type of tests
don't support psql meta-commands. We need \parse, \bind, \bind_named
to test queryId for queries issued through extended query protocol.
With TAP tests we can use query_until in BackgroundPsql to have one
connection issue a command and another connection track the # of distinct
queryIds expected. See the 007_query_id.pl of an example TAP test that
could be added under test_misc.
An INJECTION_POINT can also be added right before we call pgstat_report_query_id
in plancache.c. This will allow us to test when we expect the queryId to
change after a cache revalidation. Thoughts?
Also, while writing the test, I found out that now, JumbleQuery takes
into account constants of the A_Const node, and calls of the same
prepared statement with different parameters generate different
query_id. Is it a reason to introduce JumbleQuery options and allow
different logic of queryid generation?
Can you start a new thread for this prepared statement scenario?
--
Sami
Attachments:
I think the testing discussion should be moved to a different thread.
What do you think?See v4.
0001 deals with reporting queryId in exec_execute_message and
exec_bind_message.
0002 deals with reporting queryId after a cache invalidation.There are no tests as this requires more discussion in a separate thread(?)
At first, these patches look good.
But I have a feeling of some mess here:
queryId should be initialised at the top-level query. At the same time,
the RevalidateCachedQuery routine can change this value in the case of
the query tree re-validation.
You can say that this routine can't be called from a non-top-level query
right now, except SPI. Yes, but what about extensions or future usage?
This is a valid point. RevalidatePlanCache is forcing a
new queryId to be advertised ( 'true' as the second argument to
pgstat_report_query_id) . This means,
v4-0002-Report-new-queryId-after-plancache-re-validation.patch
will result in a non top-level queryId being advertised.
See the attached test case.
I need to think about this a bit.
--
Sami
Attachments:
I think the testing discussion should be moved to a different thread.
What do you think?See v4.
0001 deals with reporting queryId in exec_execute_message and
exec_bind_message.
0002 deals with reporting queryId after a cache invalidation.There are no tests as this requires more discussion in a separate thread(?)
At first, these patches look good.
But I have a feeling of some mess here:
queryId should be initialised at the top-level query. At the same time,
the RevalidateCachedQuery routine can change this value in the case of
the query tree re-validation.
You can say that this routine can't be called from a non-top-level query
right now, except SPI. Yes, but what about extensions or future usage?
This is a valid point. RevalidatePlanCache is forcing a
new queryId to be advertised ( 'true' as the second argument to
pgstat_report_query_id) . This means,
v4-0002-Report-new-queryId-after-plancache-re-validation.patch
will result in a non top-level queryId being advertised.
An idea would be to add bool field called force_update_qid to
CachedPlanSource, and this field can be set to 'true' after a call
to CreateCachedPlan. RevalidateCachedQuery will only update
the queryId if this value is 'true'.
For now, only exec_parse_message will set this field to 'true',
but any caller can decide to set it to 'true' if there are other
cases in the future.
What do you think?
--
Sami
On Mon, Sep 09, 2024 at 06:20:01PM -0500, Sami Imseih wrote:
On 14/8/2024 23:05, Imseih (AWS), Sami wrote:
Also, while writing the test, I found out that now, JumbleQuery takes
into account constants of the A_Const node, and calls of the same
prepared statement with different parameters generate different
query_id. Is it a reason to introduce JumbleQuery options and allow
different logic of queryid generation?Can you start a new thread for this prepared statement scenario?
Yes, please, this makes the thread rather confusing by adding
different problems into the mix that require different analysis and
actions. Let's only focus on the issue that the query ID reporting
in pg_stat_activity is missing for the extended query protocol here.
--
Michael
On Mon, Sep 02, 2024 at 10:11:43AM +0900, Michael Paquier wrote:
I need to spend a bit more time with my head down for this thread, but
couldn't we use these commands with various query patterns in
pg_stat_statements and look at the shmem counters reported through its
view?
My apologies for the time it took, but here you go with a patch set.
I have looked at this thread overall, and there are two problems at
hand regarding the lack of reporting of the query ID in backend
entries for the extended query protocol:
1) ExecutorRun() misses the reports, which happens when a query
does an ExecutorStart(), then a series of ExecutorRun() through a
portal with bind messages. Robert has mentioned that separately a few
days ago at [1]/messages/by-id/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com -- Michael. But that's not everything.
2) A query executed through a portal with tuples to return in a
tuplestore also miss the query ID report. For example, a DML
RETURNING with the extended protocol would use an execute (with
ExecutorStart and ExecutorRun) followed by a series of execute fetch.
pg_stat_activity would report the query ID for the execute, not for
the fetches, while pg_stat_activity has the query string. That's
confusing.
The patch series attached address these two in 0001 and 0003. 0001
should be backpatched (still need to wordsmith the comments), where
I've come down to the approach of using a report in ExecutorRun()
because it is simpler and it does the job. Perhaps also 0003, but
nobody has complained about that, either.
I have also looked at the tests proposed (isolation, TAP, custom
module); all of them are a bit disappointing because they duplicate
some patterns that are already tested in pg_stat_statements, while
willing to check the contents of pg_stat_statements. I am afraid that
it is not going to age well because we'd need to have the same query
patterns in more than one place. We should have tests, definitely,
but we can do an equivalent of pg_stat_activity lookups by calling
pgstat_get_my_query_id() in strategic places, making sure that all
dedicated paths always have the query ID reported:
- Check pgstat_get_my_query_id() in the run, finish and end executor
hooks.
- In the parse-analyze hook, before the query ID is reported (except
for a PREPARE), check that the ID in a Query is set.
The test proposed by Robert on the other thread was fancy enough that
I've added it. All that is in 0002, and that's enough to cause 0001
to fail, planning only these on HEAD. Tests in 0003 require fetch
messages, and I don't have a trick in my sleeves except if we invent a
new meta-command in psql.
There are other problems mentioned on this thread, with plan caching
for example. Let's deal with that separately, in separate threads.
[1]: /messages/by-id/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com -- Michael
--
Michael
Attachments:
0001-Fix-reporting-of-query-ID-with-extended-query-protoc.patchtext/x-diff; charset=us-asciiDownload
From d4d469da18d6442fed5a6fd5824f827e28cf5bb3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 11 Sep 2024 14:12:24 +0900
Subject: [PATCH 1/3] Fix reporting of query ID with extended query protocol
A query run through the extended query protocol would miss reporting its
query ID to pg_stat_statements, as the value is reset when beginning the
processing of a new execute message.
ExecutorStart() was calling pgstat_report_query_id(), but it missed the
fact that multiple ExecutorRun() calls could be issued for a single
query with an initial ExecutorStart() across multiple execute messages,
hence the query ID would be missing in the second execute.
Backpatch-through: 14
---
src/backend/executor/execMain.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186fa73..94e18f2e36 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,10 +119,11 @@ void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
- * In some cases (e.g. an EXECUTE statement) a query execution will skip
- * parse analysis, which means that the query_id won't be reported. Note
- * that it's harmless to report the query_id multiple times, as the call
- * will be ignored if the top level query_id has already been reported.
+ * In some cases (e.g. an EXECUTE statement or an execute message with
+ * the extended query protocol) the query_id won't be reported, so do it
+ * now. Note that it's harmless to report the query_id multiple times,
+ * as the call will be ignored if the top level query_id has already been
+ * reported.
*/
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
@@ -293,6 +294,17 @@ ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count,
bool execute_once)
{
+ /*
+ * When executing a query with the extended query protocol,
+ * ExecutorStart() may not be called, causing the query ID to not be
+ * reported. Hence, do it again here in case it was missed.
+ *
+ * Reporting multiple times the query ID is harmless.
+ *
+ * See also comments in ExecutorStart().
+ */
+ pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
+
if (ExecutorRun_hook)
(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
else
--
2.45.2
0002-Add-sanity-checks-related-to-query-ID-reporting-in-p.patchtext/x-diff; charset=us-asciiDownload
From 8408f9929af5cf35bf2dc888777782e476160c4b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 11 Sep 2024 14:15:03 +0900
Subject: [PATCH 2/3] Add sanity checks related to query ID reporting in
pg_stat_statements
This includes as well tests with the extended query protocol, checking
that the query ID is correctly set when going through the executor hooks
of the module.
These are new tests, so no backpatch is done.
---
contrib/pg_stat_statements/Makefile | 2 +-
.../pg_stat_statements/expected/extended.out | 62 +++++++++++++++++++
contrib/pg_stat_statements/meson.build | 1 +
.../pg_stat_statements/pg_stat_statements.c | 17 +++++
contrib/pg_stat_statements/sql/extended.sql | 17 +++++
5 files changed, 98 insertions(+), 1 deletion(-)
create mode 100644 contrib/pg_stat_statements/expected/extended.out
create mode 100644 contrib/pg_stat_statements/sql/extended.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index c19ccad77e..1622b43ded 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -19,7 +19,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
REGRESS = select dml cursors utility level_tracking planning \
- user_activity wal entry_timestamp privileges cleanup \
+ user_activity wal entry_timestamp privileges extended cleanup \
oldextversions
# Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
# which typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/pg_stat_statements/expected/extended.out b/contrib/pg_stat_statements/expected/extended.out
new file mode 100644
index 0000000000..09c1f895e5
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/extended.out
@@ -0,0 +1,62 @@
+-- Tests for extended query protocol
+SELECT query_id IS NOT NULL AS query_id_set
+ FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g
+ query_id_set
+--------------
+ t
+(1 row)
+
+SELECT $1 \parse stmt1
+SELECT $1, $2 \parse stmt2
+SELECT $1, $2, $3 \parse stmt3
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+\bind_named stmt1 'stmt1_val1' \g
+ ?column?
+------------
+ stmt1_val1
+(1 row)
+
+\bind_named stmt2 'stmt2_val1' 'stmt2_val2' \g
+ ?column? | ?column?
+------------+------------
+ stmt2_val1 | stmt2_val2
+(1 row)
+
+\bind_named stmt3 'stmt3_val1' 'stmt3_val2' 'stmt3_val3' \g
+ ?column? | ?column? | ?column?
+------------+------------+------------
+ stmt3_val1 | stmt3_val2 | stmt3_val3
+(1 row)
+
+\bind_named stmt3 'stmt3_val4' 'stmt3_val5' 'stmt3_val6' \g
+ ?column? | ?column? | ?column?
+------------+------------+------------
+ stmt3_val4 | stmt3_val5 | stmt3_val6
+(1 row)
+
+\bind_named stmt2 'stmt2_val3' 'stmt2_val4' \g
+ ?column? | ?column?
+------------+------------
+ stmt2_val3 | stmt2_val4
+(1 row)
+
+\bind_named stmt1 'stmt1_val1' \g
+ ?column?
+------------
+ stmt1_val1
+(1 row)
+
+SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ calls | rows | query
+-------+------+----------------------------------------------------
+ 2 | 2 | SELECT $1
+ 2 | 2 | SELECT $1, $2
+ 2 | 2 | SELECT $1, $2, $3
+ 1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+(4 rows)
+
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 5cf926d1f8..e14669ca15 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -51,6 +51,7 @@ tests += {
'wal',
'entry_timestamp',
'privileges',
+ 'extended',
'cleanup',
'oldextversions',
],
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 362d222f63..1ae31b82a8 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -834,6 +834,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
if (!pgss || !pgss_hash || !pgss_enabled(nesting_level))
return;
+ /*
+ * Query ID should be set in the query.
+ *
+ * Note that the query ID may be already reported (like in a PREPARE
+ * statement, hence there is no check based on pgstat_get_my_query_id).
+ */
+ Assert(query->queryId != 0);
+
/*
* If it's EXECUTE, clear the queryId so that stats will accumulate for
* the underlying PREPARE. But don't do this if we're not tracking
@@ -1022,6 +1030,9 @@ static void
pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count,
bool execute_once)
{
+ /* Query ID should be reported. */
+ Assert(pgstat_get_my_query_id() != 0);
+
nesting_level++;
PG_TRY();
{
@@ -1043,6 +1054,9 @@ pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count,
static void
pgss_ExecutorFinish(QueryDesc *queryDesc)
{
+ /* Query ID should be reported. */
+ Assert(pgstat_get_my_query_id() != 0);
+
nesting_level++;
PG_TRY();
{
@@ -1066,6 +1080,9 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
{
uint64 queryId = queryDesc->plannedstmt->queryId;
+ /* Query ID should be reported. */
+ Assert(pgstat_get_my_query_id() != 0);
+
if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
pgss_enabled(nesting_level))
{
diff --git a/contrib/pg_stat_statements/sql/extended.sql b/contrib/pg_stat_statements/sql/extended.sql
new file mode 100644
index 0000000000..c039c5de01
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/extended.sql
@@ -0,0 +1,17 @@
+-- Tests for extended query protocol
+
+SELECT query_id IS NOT NULL AS query_id_set
+ FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g
+
+SELECT $1 \parse stmt1
+SELECT $1, $2 \parse stmt2
+SELECT $1, $2, $3 \parse stmt3
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+\bind_named stmt1 'stmt1_val1' \g
+\bind_named stmt2 'stmt2_val1' 'stmt2_val2' \g
+\bind_named stmt3 'stmt3_val1' 'stmt3_val2' 'stmt3_val3' \g
+\bind_named stmt3 'stmt3_val4' 'stmt3_val5' 'stmt3_val6' \g
+\bind_named stmt2 'stmt2_val3' 'stmt2_val4' \g
+\bind_named stmt1 'stmt1_val1' \g
+
+SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
--
2.45.2
0003-Report-query-ID-for-execute-fetch-in-extended-query-.patchtext/x-diff; charset=us-asciiDownload
From 256ce554246cc459dc4efdab69a00d3503fca45e Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 11 Sep 2024 14:16:24 +0900
Subject: [PATCH 3/3] Report query ID for execute fetch in extended query
protocol
This concerns for example queries that execute a query through a portal,
then do *not* require going through an execution again because all the
tuples to return are stored in a tuplestore part of a portal. For
example, this would happen for a RETURNING query with a fetch_size small
enough to require multiple fetch batch with a message sequence like:
<execute>
<fetch>
<...>
<fetch>
<commit>
The query ID would be set in the initial execute message, but not in any
of the follow up execute fetch messages.
Perhaps backpatch to 14? This would report new information in
pg_stat_activity, and nobody has complained about that, either.
---
src/backend/tcop/postgres.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..0f549468d2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2185,9 +2185,28 @@ exec_execute_message(const char *portal_name, long max_rows)
* then we are only fetching more rows rather than completely re-executing
* the query from the start. atStart is never reset for a v3 portal, so we
* are safe to use this check.
+ *
+ * The query ID is lost in this case as of pgstat_report_activity() done
+ * above, so set it again when only fetching rows.
*/
execute_is_fetch = !portal->atStart;
+ if (execute_is_fetch)
+ {
+ ListCell *lc;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+
+ if (stmt->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(stmt->queryId, false);
+ break;
+ }
+ }
+ }
+
/* Log immediately if dictated by log_statement */
if (check_log_statement(portal->stmts))
{
--
2.45.2
I took a look at your patches and here are my comments
1) ExecutorRun() misses the reports, which happens when a query
does an ExecutorStart(), then a series of ExecutorRun() through a
portal with bind messages. Robert has mentioned that separately a few
days ago at [1]. But that's not everything.
2) A query executed through a portal with tuples to return in a
tuplestore also miss the query ID report. For example, a DML
RETURNING with the extended protocol would use an execute (with
ExecutorStart and ExecutorRun) followed by a series of execute fetch.
pg_stat_activity would report the query ID for the execute, not for
the fetches, while pg_stat_activity has the query string. That's
confusing.
1/
In your 0003-Report-query-ID-for-execute-fetch-in-extended-query-.patch
patch, you are still setting the queryId inside exec_execute_message
if (execute_is_fetch). This condition could be removed and don't need to set
the queryId inside ExecutorRun. This is exactly what v5-0001 does.
V5-0001 also sets the queryId inside the exec_bind_message.
We must do that otherwise we will have a NULL queryId during bind.
also tested it against this for the case that was raised by Robert [1]/messages/by-id/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com.
I also think we need to handle RevalidateCachedQuery. This is the case where we
have a new queryId after a cached query revalidation.
I addressed the comments by Andrei [3]/messages/by-id/F001F959-400F-41C6-9886-C9665A4DE0A3@gmail.com in v5-0002. For RevalidateCachedQuery,
we can simple call pgstat_report_query_id with "force" = "false" so it will take care
of updating a queryId only if it's a top level query.
2/
As far as 0002-Add-sanity-checks-related-to-query-ID-reporting-in-p.patch,
I do like the pg_stat_statements extended tests to perform these tests.
What about adding the Assert(pgstat_get_my_query_id() != 0) inside
exec_parse_message, exec_bind_message and exec_execute_message as well?
I think having the Asserts inside the hooks in pg_stat_statements are good
as well.
I am not sure how we can add tests for RevalidateCachedQuery though using
pg_stat_statements. We could skip testing this scenario, maybe??
Let me know what you think.
[1]: /messages/by-id/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
[2]: /messages/by-id/2beb1a00-3060-453a-90a6-7990d6940d62@gmail.com
[3]: /messages/by-id/F001F959-400F-41C6-9886-C9665A4DE0A3@gmail.com
Regards,
Sami
Attachments:
v5-0002-Report-new-queryId-after-plancache-re-validation.patchapplication/octet-stream; name=v5-0002-Report-new-queryId-after-plancache-re-validation.patch; x-mac-creator=4F50494DDownload
From f33cd22ed65321d8fa554f58ed3b265e4854df6d Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Wed, 11 Sep 2024 14:40:58 -0500
Subject: [PATCH 1/1] Report new queryId after plancache re-validation
---
src/backend/utils/cache/plancache.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..4ed5e1eca1 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -66,6 +66,7 @@
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/backend_status.h"
#include "utils/inval.h"
#include "utils/memutils.h"
#include "utils/resowner.h"
@@ -590,6 +591,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
TupleDesc resultDesc;
MemoryContext querytree_context;
MemoryContext oldcxt;
+ ListCell *lc;
/*
* For one-shot plans, we do not support revalidation checking; it's
@@ -805,6 +807,22 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
plansource->is_valid = true;
+ /*
+ * Update the current queryId with the one from the re-validated
+ * query tree, and only if it's a top-level queryId.
+ */
+
+ foreach(lc, tlist)
+ {
+ Query *stmt = lfirst_node(Query, lc);
+
+ if (stmt->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(stmt->queryId, false);
+ break;
+ }
+ }
+
/* Return transient copy of querytrees for possible use in planning */
return tlist;
}
--
2.39.3 (Apple Git-146)
v5-0001-Add-missing-queryId-reporting-in-extended-query.patchapplication/octet-stream; name=v5-0001-Add-missing-queryId-reporting-in-extended-query.patch; x-mac-creator=4F50494DDownload
From 0727a81033c34350ddc782366ea4f8b4061cdf6e Mon Sep 17 00:00:00 2001
From: Sami Imseih <samimseih@gmail.com>
Date: Wed, 14 Aug 2024 20:50:01 +0000
Subject: [PATCH v4 1/2] Add missing queryId reporting in extended query
---
src/backend/tcop/postgres.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..5ef59deeed 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1654,6 +1654,7 @@ exec_bind_message(StringInfo input_message)
char msec_str[32];
ParamsErrorCbData params_data;
ErrorContextCallback params_errcxt;
+ ListCell *lc;
/* Get the fixed part of the message */
portal_name = pq_getmsgstring(input_message);
@@ -1689,6 +1690,17 @@ exec_bind_message(StringInfo input_message)
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ foreach(lc, psrc->query_list)
+ {
+ Query *query = lfirst_node(Query, lc);
+
+ if (query->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(query->queryId, false);
+ break;
+ }
+ }
+
set_ps_display("BIND");
if (save_log_statement_stats)
@@ -2111,6 +2123,7 @@ exec_execute_message(const char *portal_name, long max_rows)
ErrorContextCallback params_errcxt;
const char *cmdtagname;
size_t cmdtaglen;
+ ListCell *lc;
/* Adjust destination to tell printtup.c what to do */
dest = whereToSendOutput;
@@ -2157,6 +2170,17 @@ exec_execute_message(const char *portal_name, long max_rows)
pgstat_report_activity(STATE_RUNNING, sourceText);
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+
+ if (stmt->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(stmt->queryId, false);
+ break;
+ }
+ }
+
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);
set_ps_display_with_len(cmdtagname, cmdtaglen);
--
2.43.0
On Wed, Sep 11, 2024 at 05:02:07PM -0500, Sami Imseih wrote:
In your 0003-Report-query-ID-for-execute-fetch-in-extended-query-.patch
patch, you are still setting the queryId inside exec_execute_message
if (execute_is_fetch). This condition could be removed and don't need to set
the queryId inside ExecutorRun. This is exactly what v5-0001 does.V5-0001 also sets the queryId inside the exec_bind_message.
We must do that otherwise we will have a NULL queryId during bind.also tested it against this for the case that was raised by Robert [1].
There are a few ways to do things:
- Add an extra report in ExecutorRun(), because we know that it is
going to be what we are going to cross when using a portal with
multiple execution calls. This won't work for the case of multiple
fetch messages where there is only one initial ExecutorRun() call
followed by the tuple fetches, as you say.
- Add these higher in the stack, when processing the messages. In
which case, we can also argue about removing the calls in
ExecutorRun() and ExecutorStart(), entirely, because these are
unnecessary duplicates as long as the query ID is set close to where
it is reset when we are processing the kind and execute messages.
ExecutorStart() as report location is ill-thought from the start.
- Keep all of them, relying on the first one set as the follow-up ones
are harmless. Perhaps also just reduce the number of calls on HEAD.
After sleeping on it, I'd tend to slightly favor the last option in
the back-branches and the second option on HEAD where we reduce the
number of report calls. This way, we are a bit more careful in
released branches by being more aggressive in reporting the query ID.
That's also why I have ordered the previous patch set this way but
that was badly presented, even if it does not take care of the
processing of the execute_is_fetch case for execute messages.
The tests in pg_stat_statements are one part I'm pretty sure is one
good way forward. It is not perfect, but with the psql meta-commands
we have a good deal of coverage on top of the other queries already in
the test suite. That's also the only place in core where we force the
query ID across all these hooks, and this does not impact switching
the way stats are stored if we were to switch to pgstats in shmem with
the custom stats APIs.
I am not sure how we can add tests for RevalidateCachedQuery though using
pg_stat_statements. We could skip testing this scenario, maybe??
Perhaps. I'd need to think through this one. Let's do things in
order and see about the reports for the bind/execute messages, first,
please?
--
Michael
After sleeping on it, I'd tend to slightly favor the last option in
the back-branches and the second option on HEAD where we reduce the
number of report calls. This way, we are a bit more careful in
released branches by being more aggressive in reporting the query ID.
I agree with this because it will safely allow us to backpatch this
fix.
The tests in pg_stat_statements are one part I'm pretty sure is one
good way forward. It is not perfect, but with the psql meta-commands
I played around with BackgrounsPsql. It works and gives us more flexibility
in testing, but I think the pg_stat_statements test are good enough for this
purpose.
My only concern is this approach tests core functionality ( reporting of queryId )
in the tests of a contrib module ( pg_stat_statements ). Is that a valid
concern?
Perhaps. I'd need to think through this one. Let's do things in
order and see about the reports for the bind/execute messages, first,
please?
Sure, that is fine.
Regards,
Sami
On Wed, Sep 11, 2024 at 09:41:58PM -0500, Sami Imseih wrote:
The tests in pg_stat_statements are one part I'm pretty sure is one
good way forward. It is not perfect, but with the psql meta-commandsI played around with BackgrounsPsql. It works and gives us more flexibility
in testing, but I think the pg_stat_statements test are good enough for this
purpose.My only concern is this approach tests core functionality ( reporting of queryId )
in the tests of a contrib module ( pg_stat_statements ). Is that a valid
concern?
Do you think that we'd better replace the calls reporting the query ID
in execMain.c by some assertions on HEAD? This won't work for
ExecutorStart() because PREPARE statements (or actually EXECUTE,
e.g. I bumped on that yesterday but I don't recall which one) would
blow up on that with compute_query_id enabled. We could do something
like that in ExecutorRun() at least as that may be helpful for
extensions? An assertion would be like:
Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
ExecutorFinish() and ExecutorEnd() are not that mandatory, so there's
a risk that this causes the backend to complain because a planner or
post-analyze hook decides to force the hand of the backend entry in an
extension. With such checks, we're telling them to just not do that.
So your point would be to force this rule within the core executor on
HEAD? I would not object to that in case we're missing more spots
with the extended query protocol, actually. That would help us detect
cases where we're still missing the query ID to be set and the
executor should know about that. The execute/fetch has been missing
for years without us being able to detect it automatically.
Note that I'm not much worried about the dependency with
pg_stat_statements. We already rely on it for query jumbling
normalization for some parse node patterns like DISCARD, and query
jumbling requires query IDs to be around. So that's not new.
--
Michael
Do you think that we'd better replace the calls reporting the query ID
in execMain.c by some assertions on HEAD? This won't work for
ExecutorStart() because PREPARE statements (or actually EXECUTE,
e.g. I bumped on that yesterday but I don't recall which one) would
Yes, adding the asserts in execMain.c is better, but there is complications
there due to the issue you mention. I think the issue you are bumping into
is when pg_stat_statements.track_utility = on ( default ), the assert in
ExecutorStart will fail on EXECUTE. I believe it's because ( need to verify )
pg_stat_statements.c sets the queryId = 0 in the ProcessUtility hook [1]https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/pg_stat_statements.c#L1127-L1128.
So your point would be to force this rule within the core executor on
HEAD? I would not object to that in case we're missing more spots
with the extended query protocol, actually. That would help us detect
cases where we're still missing the query ID to be set and the
executor should know about that.
Yes, but looking at how pg_stat_statements works with PREPARE/EXECUTE,
I am now thinking it's better to Just keep the tests in pg_stat_statements.
Having test coverage in pg_stat_statements is better than nothing, and
check-world ( or similar ) will be able to cacth such failures.
Note that I'm not much worried about the dependency with
pg_stat_statements. We already rely on it for query jumbling
normalization for some parse node patterns like DISCARD, and query
jumbling requires query IDs to be around. So that's not new.
Good point.
Regards,
Sami
On Thu, Sep 12, 2024 at 03:58:27PM -0500, Sami Imseih wrote:
Yes, adding the asserts in execMain.c is better, but there is complications
there due to the issue you mention. I think the issue you are bumping into
is when pg_stat_statements.track_utility = on ( default ), the assert in
ExecutorStart will fail on EXECUTE. I believe it's because ( need to verify )
pg_stat_statements.c sets the queryId = 0 in the ProcessUtility hook [1].
Yes.
I am now thinking it's better to Just keep the tests in pg_stat_statements.
Having test coverage in pg_stat_statements is better than nothing, and
check-world ( or similar ) will be able to catch such failures.
I have begun things by applying a patch to add new tests in
pg_stat_statements. It is just something that is useful on its own,
and we had nothing of the kind.
Then, please see attached two lightly-updated patches. 0001 is for a
backpatch down to v14. This is yours to force things in the exec and
bind messages for all portal types, with the test (placed elsewhere in
14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
up the tests of pg_stat_statements if one is not careful with the
query ID reporting.
I'm planning to look at that again at the beginning of next week.
--
Michael
Attachments:
v6-0001-Add-missing-query-ID-reporting-in-extended-query-.patchtext/x-diff; charset=us-asciiDownload
From 6921a7b6e25c2471d64f05ce136af741598381c0 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 13 Sep 2024 14:54:17 +0900
Subject: [PATCH v6 1/2] Add missing query ID reporting in extended query
protocol
This commit adds query ID reports for two code paths for the extended
query protocol:
- When receiving a bind message, setting it to the first Query retrieved
from a cached cache.
- When receiving an execute message, setting it to the first PlannedStmt
stored in a portal.
An advantage of this method is that this is able to cover all the types
of portals handled in the extended query protocol, particularly these
two when the report done in ExecutorStart() is not enough:
- Multiple execute messages, with multiple ExecutorRun().
- Portal with execute/fetch messages, like a query with a RETURNING
clause and a fetch size that stores the tuples in a first execute
message going though ExecutorStart() and ExecuteRun(), followed by one
or more execute messages doing only fetches from the tuplestore created
in the first message.
Note that the query ID reporting done in ExecutorStart() is still
necessary, as an EXECUTE requires it. Query ID reporting is optimistic
and more calls to pgstat_report_query_id() don't matter. The comment in
ExecutorStart() is adjusted to reflect better the reality with the
extended query protocol.
The test added in pg_stat_statements is a courtesy of Robert Haas.
Author: Sami Imseih
Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
Backpatch-through: 14
---
src/backend/executor/execMain.c | 10 ++++----
src/backend/tcop/postgres.c | 24 +++++++++++++++++++
.../pg_stat_statements/expected/extended.out | 8 +++++++
contrib/pg_stat_statements/sql/extended.sql | 4 ++++
4 files changed, 42 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186fa73..7042ca6c60 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,10 +119,12 @@ void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
- * In some cases (e.g. an EXECUTE statement) a query execution will skip
- * parse analysis, which means that the query_id won't be reported. Note
- * that it's harmless to report the query_id multiple times, as the call
- * will be ignored if the top level query_id has already been reported.
+ * In some cases (e.g. an EXECUTE statement or an execute message with the
+ * extended query protocol) the query_id won't be reported, so do it now.
+ *
+ * Note that it's harmless to report the query_id multiple times, as the
+ * call will be ignored if the top level query_id has already been
+ * reported.
*/
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..e394f1419a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1654,6 +1654,7 @@ exec_bind_message(StringInfo input_message)
char msec_str[32];
ParamsErrorCbData params_data;
ErrorContextCallback params_errcxt;
+ ListCell *lc;
/* Get the fixed part of the message */
portal_name = pq_getmsgstring(input_message);
@@ -1689,6 +1690,17 @@ exec_bind_message(StringInfo input_message)
pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ foreach(lc, psrc->query_list)
+ {
+ Query *query = lfirst_node(Query, lc);
+
+ if (query->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(query->queryId, false);
+ break;
+ }
+ }
+
set_ps_display("BIND");
if (save_log_statement_stats)
@@ -2111,6 +2123,7 @@ exec_execute_message(const char *portal_name, long max_rows)
ErrorContextCallback params_errcxt;
const char *cmdtagname;
size_t cmdtaglen;
+ ListCell *lc;
/* Adjust destination to tell printtup.c what to do */
dest = whereToSendOutput;
@@ -2157,6 +2170,17 @@ exec_execute_message(const char *portal_name, long max_rows)
pgstat_report_activity(STATE_RUNNING, sourceText);
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+
+ if (stmt->queryId != UINT64CONST(0))
+ {
+ pgstat_report_query_id(stmt->queryId, false);
+ break;
+ }
+ }
+
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);
set_ps_display_with_len(cmdtagname, cmdtaglen);
diff --git a/contrib/pg_stat_statements/expected/extended.out b/contrib/pg_stat_statements/expected/extended.out
index bc8cb3f141..04a0594337 100644
--- a/contrib/pg_stat_statements/expected/extended.out
+++ b/contrib/pg_stat_statements/expected/extended.out
@@ -1,5 +1,13 @@
-- Tests with extended query protocol
SET pg_stat_statements.track_utility = FALSE;
+-- This test checks that an execute message sets a query ID.
+SELECT query_id IS NOT NULL AS query_id_set
+ FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g
+ query_id_set
+--------------
+ t
+(1 row)
+
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
t
---
diff --git a/contrib/pg_stat_statements/sql/extended.sql b/contrib/pg_stat_statements/sql/extended.sql
index 5ba0678e63..1af0711020 100644
--- a/contrib/pg_stat_statements/sql/extended.sql
+++ b/contrib/pg_stat_statements/sql/extended.sql
@@ -2,6 +2,10 @@
SET pg_stat_statements.track_utility = FALSE;
+-- This test checks that an execute message sets a query ID.
+SELECT query_id IS NOT NULL AS query_id_set
+ FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g
+
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
SELECT $1 \parse stmt1
SELECT $1, $2 \parse stmt2
--
2.45.2
v6-0002-Add-sanity-checks-in-executor-for-query-ID.patchtext/x-diff; charset=us-asciiDownload
From b7f0dd8b23ae6a42ce94ee4c3856f9ff22cc8eb2 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 13 Sep 2024 14:56:18 +0900
Subject: [PATCH v6 2/2] Add sanity checks in executor for query ID
This causes the test added by the previous commit to blow up on sight,
as would a removal of the query ID reporting in ExecuteStart() blow up
when running the tests of pg_stat_statements.
No backpatck done here.
---
src/backend/executor/execMain.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7042ca6c60..77b8d4762d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -50,6 +50,7 @@
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/queryjumble.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
#include "tcop/utility.h"
@@ -295,6 +296,9 @@ ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count,
bool execute_once)
{
+ /* If enabled, the query ID should be set. */
+ Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
+
if (ExecutorRun_hook)
(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
else
@@ -403,6 +407,9 @@ standard_ExecutorRun(QueryDesc *queryDesc,
void
ExecutorFinish(QueryDesc *queryDesc)
{
+ /* If enabled, the query ID should be set. */
+ Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
+
if (ExecutorFinish_hook)
(*ExecutorFinish_hook) (queryDesc);
else
@@ -418,6 +425,9 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
/* sanity checks */
Assert(queryDesc != NULL);
+ /* If enabled, the query ID should be set. */
+ Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
+
estate = queryDesc->estate;
Assert(estate != NULL);
--
2.45.2
Then, please see attached two lightly-updated patches. 0001 is for a
backpatch down to v14. This is yours to force things in the exec and
bind messages for all portal types, with the test (placed elsewhere in
14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
up the tests of pg_stat_statements if one is not careful with the
query ID reporting.
These 2 patches look good to me; except for the slight typo
In the commit message of 0002. "backpatch" instead of "backpatck".
That leaves us with considering v5-0002 [1]/messages/by-id/DB325894-3EE3-4B2E-A18C-4B34E7B2F5EC@gmail.com. I do think this is good
for overall correctness of the queryId being advertised after a cache
revalidation, even if users of pg_stat_activity will hardly notice this.
[1]: /messages/by-id/DB325894-3EE3-4B2E-A18C-4B34E7B2F5EC@gmail.com
Regards,
Sami
On Tue, Sep 17, 2024 at 05:01:18PM -0500, Sami Imseih wrote:
Then, please see attached two lightly-updated patches. 0001 is for a
backpatch down to v14. This is yours to force things in the exec and
bind messages for all portal types, with the test (placed elsewhere in
14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
up the tests of pg_stat_statements if one is not careful with the
query ID reporting.These 2 patches look good to me; except for the slight typo
In the commit message of 0002. "backpatch" instead of "backpatck".
Yes, I've noticed this one last Friday and fixed the typo in the
commit log after sending the previous patch series.
That leaves us with considering v5-0002 [1]. I do think this is good
for overall correctness of the queryId being advertised after a cache
revalidation, even if users of pg_stat_activity will hardly notice this.[1] /messages/by-id/DB325894-3EE3-4B2E-A18C-4B34E7B2F5EC@gmail.com
Yeah. I need more time to evaluate this one.
Also, please find one of the scripts I have used for the execute/fetch
case, that simply does an INSERT RETURNING with a small fetch size to
create a larger window in pg_stat_activity where we don't report the
query ID. One can run it like that, crude still on point:
# Download a JDBC driver
# Create the table to use.
psql -c 'create table aa (a int);' postgres
CLASSPATH=postgresql-42.7.4.jar java TestReturning.java
Then, while running the script, you would notice that pg_stat_activity
reports NULL for the query ID with the query text while the batch
fetches are processing. I've taken and expanded one of the scripts
you have sent for 1d477a907e63.
I'd like to get to the point where we are able to test that in core
reliably. The sanity checks in the executor paths are a good step
forward but they do nothing for the fetch cases. Perhaps Andrew
Dunstan work to expose libpq's APIs with the perl TAP tests would
help at some point to control the extended protocol queries, but we
are going to need more for the fetch case as there are no hooks that
would help to grab a query ID. A second option I have in mind would
be to set up an injection point that produces a NOTICE if a query ID
is set when we end processing an execute message, then check the
number of NOTICE messages produced as these can be predictible
depending on the number of rows and the fetch size.. This won't fly
far unless we can control the fetch size.
--
Michael
Attachments:
would help to grab a query ID. A second option I have in mind would
be to set up an injection point that produces a NOTICE if a query ID
is set when we end processing an execute message, then check the
number of NOTICE messages produced as these can be predictible
depending on the number of rows and the fetch size.. This won't fly
far unless we can control the fetch size.
FWIW, I do like the INJECTION_POINT idea and actually mentioned something
similar up the thread [1]/messages/by-id/465EECA3-D98C-4E46-BBDB-4D057617DD89@gmail.com for the revalidate cache case, but I can see it being applied
to all the other places we expect the queryId to be set.
[1]: /messages/by-id/465EECA3-D98C-4E46-BBDB-4D057617DD89@gmail.com
--
Sami
On Tue, Sep 17, 2024 at 06:39:17PM -0500, Sami Imseih wrote:
FWIW, I do like the INJECTION_POINT idea and actually mentioned something
similar up the thread [1] for the revalidate cache case, but I can see it being applied
to all the other places we expect the queryId to be set.[1] /messages/by-id/465EECA3-D98C-4E46-BBDB-4D057617DD89@gmail.com
FWIW, I was thinking about something like what has been done in
indexcmds.c for 5bbdfa8a18dc as the query ID value is not predictible
across releases, but we could see whether it is set or not.
--
Michael
On Wed, Sep 18, 2024 at 07:50:27AM +0900, Michael Paquier wrote:
On Tue, Sep 17, 2024 at 05:01:18PM -0500, Sami Imseih wrote:
Then, please see attached two lightly-updated patches. 0001 is for a
backpatch down to v14. This is yours to force things in the exec and
bind messages for all portal types, with the test (placed elsewhere in
14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
up the tests of pg_stat_statements if one is not careful with the
query ID reporting.These 2 patches look good to me; except for the slight typo
In the commit message of 0002. "backpatch" instead of "backpatck".Yes, I've noticed this one last Friday and fixed the typo in the
commit log after sending the previous patch series.
So, I have applied 0001 down to 14, followed by 0002 on HEAD.
For the sake of completeness, I have tested all the five
PortalStrategys with the extended query protocol and with the sanity
checks of 0002 in place and compute_query_id=regress to force the
computations, so I'd like to think that we are pretty good now.
0002 is going to be interesting to see moving forward. I am wondering
how existing out-of-core extensions will react on that and if it will
help catching up any issues. So, let's see how the experiment goes
with HEAD on this side. Perhaps we'll have to revert 0002 at the end,
or perhaps not...
--
Michael
On Wed, Sep 18, 2024 at 09:38:32AM +0900, Michael Paquier wrote:
FWIW, I was thinking about something like what has been done in
indexcmds.c for 5bbdfa8a18dc as the query ID value is not predictible
across releases, but we could see whether it is set or not.
By the way, with the main issue fixed as of 933848d16dc9, could it be
possible to deal with the plan cache part in a separate thread? This
is from the start a separate thread to me, and we've done quite a bit
here already.
--
Michael
So, I have applied 0001 down to 14, followed by 0002 on HEAD.
Thank you!
0002 is going to be interesting to see moving forward. I am wondering
how existing out-of-core extensions will react on that and if it will
help catching up any issues. So, let's see how the experiment goes
with HEAD on this side. Perhaps we'll have to revert 0002 at the end,
or perhaps not...
If an extension breaks because of this, then it's doing something wrong __
Let's see what happens.
--
Sami
By the way, with the main issue fixed as of 933848d16dc9, could it be
possible to deal with the plan cache part in a separate thread? This
is from the start a separate thread to me, and we've done quite a bit
here already.
Agree, will do start a new thread.
--
Sami
On Wed, Sep 18, 2024 at 03:14:07PM -0500, Sami Imseih wrote:
Agree, will do start a new thread.
Thanks.
--
Michael
Hello Michael and Sami,
18.09.2024 08:46, Michael Paquier wrote:
So, I have applied 0001 down to 14, followed by 0002 on HEAD.
Please look at the script, which triggers Assert added by 24f520594:
(assuming shared_preload_libraries=pg_stat_statements)
SELECT repeat('x', 100) INTO t FROM generate_series(1, 100000);
CREATE FUNCTION f() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
CREATE INDEX ON t(f());
TRAP: failed Assert("!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0"), File: "execMain.c", Line: 300, PID: 1288609
ExceptionalCondition at assert.c:52:13
ExecutorRun at execMain.c:302:6
postquel_getnext at functions.c:903:24
fmgr_sql at functions.c:1198:15
ExecInterpExpr at execExprInterp.c:746:8
ExecInterpExprStillValid at execExprInterp.c:2034:1
ExecEvalExprSwitchContext at executor.h:367:13
evaluate_expr at clauses.c:4997:14
evaluate_function at clauses.c:4505:1
simplify_function at clauses.c:4092:12
eval_const_expressions_mutator at clauses.c:2591:14
expression_tree_mutator_impl at nodeFuncs.c:3550:12
eval_const_expressions_mutator at clauses.c:3712:1
eval_const_expressions at clauses.c:2267:1
RelationGetIndexExpressions at relcache.c:5079:20
BuildIndexInfo at index.c:2426:7
...
Best regards,
Alexander
On Wed, Sep 25, 2024 at 05:00:00PM +0300, Alexander Lakhin wrote:
Please look at the script, which triggers Assert added by 24f520594:
(assuming shared_preload_libraries=pg_stat_statements)
Or just compute_query_id = on.
SELECT repeat('x', 100) INTO t FROM generate_series(1, 100000);
CREATE FUNCTION f() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
CREATE INDEX ON t(f());TRAP: failed Assert("!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0"), File: "execMain.c", Line: 300, PID: 1288609
ExceptionalCondition at assert.c:52:13
ExecutorRun at execMain.c:302:6
postquel_getnext at functions.c:903:24
fmgr_sql at functions.c:1198:15
ExecInterpExpr at execExprInterp.c:746:8
ExecInterpExprStillValid at execExprInterp.c:2034:1
ExecEvalExprSwitchContext at executor.h:367:13
And this assertion is doing the job I want it to do, because it is
telling us that we are not setting a query ID when doing a parallel
btree build. The query string that we would report at the beginning
of _bt_parallel_build_main() is passed down as a parameter, but not
the query ID. Hence pg_stat_activity would report a NULL query ID
when spawning parallel workers in this cases, even if there is a query
string.
The same can be said for the parallel build for BRIN, that uses a lot
of logic taken from btree for there parallel parameters, and even
vacuum, as it goes through a parse analyze where its query ID would be
set. but that's missed in the workers.
See _bt_parallel_build_main(), _brin_parallel_build_main() and
parallel_vacuum_main() which are the entry point used by the workers
for all of them. For BRIN, note that I can get the same failure with
the following query, based on the table of your previous test that
would spawn a worker:
CREATE INDEX foo ON t using brin(f());
The recovery test 027_stream_regress.pl not catching these failures
means that we don't have tests with an index expression for such
parallel builds, or the assertion would have triggered. It looks like
this is just because we don't do a parallel btree build with an index
expression where we need to go through the executor to build its
IndexInfo.
Note that parallel workers launched by execParallel.c pass down the
query ID in a minimal PlannedStmt where we use pgstat_get_my_query_id,
so let's do the same for all these.
Attached is the patch I am finishing with, with some new tests for
BRIN and btree to force parallel builds with immutable expressions
through functions. These fail the assertions in the recovery TAP
test. It may be a good idea to keep these tests in the long-term
anyway. It took me a few minutes to find out that
min_parallel_table_scan_size and max_parallel_maintenance_workers was
enough to force workers to spawn even if tables have no data, to make
the tests cheaper.
Thoughts or comments?
--
Michael
Attachments:
0001-Set-query-ID-in-parallel-workers-for-vacuum-brin-and.patchtext/x-diff; charset=us-asciiDownload
From 782147f8f8a22ed7acb51484568c8ac69425ee6b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 26 Sep 2024 10:01:36 +0900
Subject: [PATCH] Set query ID in parallel workers for vacuum, brin and btree
All these code paths use their own entry point when starting their
parallel workers, but fail to set their query IDs, even if they set a
text query. Hence, this data would be missed in pg_stat_activity when
spawning their parallel workers.
Some tests are added to show how the failures can happen, which are able
to trigger the assertion failure in the TAP test 027_stream_regress.pl
where pg_stat_statements is loaded.
---
src/backend/access/brin/brin.c | 5 +++++
src/backend/access/nbtree/nbtsort.c | 5 +++++
src/backend/commands/vacuumparallel.c | 11 ++++++++---
src/test/regress/expected/brin.out | 10 ++++++++++
src/test/regress/expected/btree_index.out | 10 ++++++++++
src/test/regress/sql/brin.sql | 11 +++++++++++
src/test/regress/sql/btree_index.sql | 11 +++++++++++
7 files changed, 60 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 60853a0f6a..cb2f497da1 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -66,6 +66,7 @@ typedef struct BrinShared
bool isconcurrent;
BlockNumber pagesPerRange;
int scantuplesortstates;
+ uint64 queryid;
/*
* workersdonecv is used to monitor the progress of workers. All parallel
@@ -2448,6 +2449,7 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
brinshared->isconcurrent = isconcurrent;
brinshared->scantuplesortstates = scantuplesortstates;
brinshared->pagesPerRange = buildstate->bs_pagesPerRange;
+ brinshared->queryid = pgstat_get_my_query_id();
ConditionVariableInit(&brinshared->workersdonecv);
SpinLockInit(&brinshared->mutex);
@@ -2891,6 +2893,9 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
indexLockmode = RowExclusiveLock;
}
+ /* Track query ID */
+ pgstat_report_query_id(brinshared->queryid, false);
+
/* Open relations within worker */
heapRel = table_open(brinshared->heaprelid, heapLockmode);
indexRel = index_open(brinshared->indexrelid, indexLockmode);
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index f5d7b3b0c3..4f1a54cffe 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -104,6 +104,7 @@ typedef struct BTShared
bool nulls_not_distinct;
bool isconcurrent;
int scantuplesortstates;
+ uint64 queryid;
/*
* workersdonecv is used to monitor the progress of workers. All parallel
@@ -1505,6 +1506,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btshared->nulls_not_distinct = btspool->nulls_not_distinct;
btshared->isconcurrent = isconcurrent;
btshared->scantuplesortstates = scantuplesortstates;
+ btshared->queryid = pgstat_get_my_query_id();
ConditionVariableInit(&btshared->workersdonecv);
SpinLockInit(&btshared->mutex);
/* Initialize mutable state */
@@ -1787,6 +1789,9 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
indexLockmode = RowExclusiveLock;
}
+ /* Track query ID */
+ pgstat_report_query_id(btshared->queryid, false);
+
/* Open relations within worker */
heapRel = table_open(btshared->heaprelid, heapLockmode);
indexRel = index_open(btshared->indexrelid, indexLockmode);
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 22c057fe61..4fd6574e12 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -57,12 +57,13 @@
typedef struct PVShared
{
/*
- * Target table relid and log level (for messages about parallel workers
- * launched during VACUUM VERBOSE). These fields are not modified during
- * the parallel vacuum.
+ * Target table relid, log level (for messages about parallel workers
+ * launched during VACUUM VERBOSE) and query ID. These fields are not
+ * modified during the parallel vacuum.
*/
Oid relid;
int elevel;
+ uint64 queryid;
/*
* Fields for both index vacuum and cleanup.
@@ -369,6 +370,7 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
MemSet(shared, 0, est_shared_len);
shared->relid = RelationGetRelid(rel);
shared->elevel = elevel;
+ shared->queryid = pgstat_get_my_query_id();
shared->maintenance_work_mem_worker =
(nindexes_mwm > 0) ?
maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
@@ -1014,6 +1016,9 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
debug_query_string = sharedquery;
pgstat_report_activity(STATE_RUNNING, debug_query_string);
+ /* Track query ID */
+ pgstat_report_query_id(shared->queryid, false);
+
/*
* Open table. The lock mode is the same as the leader process. It's
* okay because the lock mode does not conflict among the parallel
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index d6779d8c7d..e1db2280cf 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -567,6 +567,16 @@ SELECT * FROM brintest_3 WHERE b < '0';
DROP TABLE brintest_3;
RESET enable_seqscan;
+-- test parallel build with immutable function.
+CREATE TABLE brintest_expr (n int);
+CREATE FUNCTION brintest_func() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
+BEGIN;
+SET LOCAL min_parallel_table_scan_size = 0;
+SET LOCAL max_parallel_maintenance_workers = 4;
+CREATE INDEX brintest_expr_idx ON brintest_expr USING brin (brintest_func());
+COMMIT;
+DROP TABLE brintest_expr;
+DROP FUNCTION brintest_func();
-- test an unlogged table, mostly to get coverage of brinbuildempty
CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
diff --git a/src/test/regress/expected/btree_index.out b/src/test/regress/expected/btree_index.out
index b350efe128..d3f4c7e08c 100644
--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -476,6 +476,16 @@ INSERT INTO delete_test_table SELECT i, 1, 2, 3 FROM generate_series(1,1000) i;
-- Test unsupported btree opclass parameters
create index on btree_tall_tbl (id int4_ops(foo=1));
ERROR: operator class int4_ops has no options
+-- test parallel build with immutable function.
+CREATE TABLE btree_test_expr (n int);
+CREATE FUNCTION btree_test_func() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
+BEGIN;
+SET LOCAL min_parallel_table_scan_size = 0;
+SET LOCAL max_parallel_maintenance_workers = 4;
+CREATE INDEX btree_test_expr_idx ON btree_test_expr USING btree (btree_test_func());
+COMMIT;
+DROP TABLE btree_test_expr;
+DROP FUNCTION btree_test_func();
-- Test case of ALTER INDEX with abuse of column names for indexes.
-- This grammar is not officially supported, but the parser allows it.
CREATE INDEX btree_tall_idx2 ON btree_tall_tbl (id);
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index 695cfad4be..7ea97f47c8 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -510,6 +510,17 @@ SELECT * FROM brintest_3 WHERE b < '0';
DROP TABLE brintest_3;
RESET enable_seqscan;
+-- test parallel build with immutable function.
+CREATE TABLE brintest_expr (n int);
+CREATE FUNCTION brintest_func() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
+BEGIN;
+SET LOCAL min_parallel_table_scan_size = 0;
+SET LOCAL max_parallel_maintenance_workers = 4;
+CREATE INDEX brintest_expr_idx ON brintest_expr USING brin (brintest_func());
+COMMIT;
+DROP TABLE brintest_expr;
+DROP FUNCTION brintest_func();
+
-- test an unlogged table, mostly to get coverage of brinbuildempty
CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
diff --git a/src/test/regress/sql/btree_index.sql b/src/test/regress/sql/btree_index.sql
index 0d2a33f370..2c3b135292 100644
--- a/src/test/regress/sql/btree_index.sql
+++ b/src/test/regress/sql/btree_index.sql
@@ -272,6 +272,17 @@ INSERT INTO delete_test_table SELECT i, 1, 2, 3 FROM generate_series(1,1000) i;
-- Test unsupported btree opclass parameters
create index on btree_tall_tbl (id int4_ops(foo=1));
+-- test parallel build with immutable function.
+CREATE TABLE btree_test_expr (n int);
+CREATE FUNCTION btree_test_func() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
+BEGIN;
+SET LOCAL min_parallel_table_scan_size = 0;
+SET LOCAL max_parallel_maintenance_workers = 4;
+CREATE INDEX btree_test_expr_idx ON btree_test_expr USING btree (btree_test_func());
+COMMIT;
+DROP TABLE btree_test_expr;
+DROP FUNCTION btree_test_func();
+
-- Test case of ALTER INDEX with abuse of column names for indexes.
-- This grammar is not officially supported, but the parser allows it.
CREATE INDEX btree_tall_idx2 ON btree_tall_tbl (id);
--
2.45.2
Attached is the patch I am finishing with, with some new tests for
BRIN and btree to force parallel builds with immutable expressions
through functions.
glad to see the asserts are working as expected ad finding these issues.
I took a look at the patch and tested it. It looks good. My only concern
is the stability of using min_parallel_table_scan_size = 0. Will it always
guarantee parallel workers? Can we print some debugging that proves
a parallel worker was spun up?
Something like this I get with DEBUG1
postgres=*# CREATE INDEX btree_test_expr_idx ON btree_test_expr USING btree
(btree_test_func());
DEBUG: building index "btree_test_expr_idx" on table "btree_test_expr"
with request for 1 parallel workers
Also, we can just set the max_parallel_maintenance_workers to 1.
What do you think?
Regards,
Sami
DEBUG: building index "btree_test_expr_idx" on table "btree_test_expr"
with request for 1 parallel workers
On Wed, Sep 25, 2024 at 8:08 PM Michael Paquier <michael@paquier.xyz> wrote:
Show quoted text
On Wed, Sep 25, 2024 at 05:00:00PM +0300, Alexander Lakhin wrote:
Please look at the script, which triggers Assert added by 24f520594:
(assuming shared_preload_libraries=pg_stat_statements)Or just compute_query_id = on.
SELECT repeat('x', 100) INTO t FROM generate_series(1, 100000);
CREATE FUNCTION f() RETURNS int LANGUAGE sql IMMUTABLE RETURN 0;
CREATE INDEX ON t(f());TRAP: failed Assert("!IsQueryIdEnabled() || pgstat_get_my_query_id() !=
0"), File: "execMain.c", Line: 300, PID: 1288609
ExceptionalCondition at assert.c:52:13
ExecutorRun at execMain.c:302:6
postquel_getnext at functions.c:903:24
fmgr_sql at functions.c:1198:15
ExecInterpExpr at execExprInterp.c:746:8
ExecInterpExprStillValid at execExprInterp.c:2034:1
ExecEvalExprSwitchContext at executor.h:367:13And this assertion is doing the job I want it to do, because it is
telling us that we are not setting a query ID when doing a parallel
btree build. The query string that we would report at the beginning
of _bt_parallel_build_main() is passed down as a parameter, but not
the query ID. Hence pg_stat_activity would report a NULL query ID
when spawning parallel workers in this cases, even if there is a query
string.The same can be said for the parallel build for BRIN, that uses a lot
of logic taken from btree for there parallel parameters, and even
vacuum, as it goes through a parse analyze where its query ID would be
set. but that's missed in the workers.See _bt_parallel_build_main(), _brin_parallel_build_main() and
parallel_vacuum_main() which are the entry point used by the workers
for all of them. For BRIN, note that I can get the same failure with
the following query, based on the table of your previous test that
would spawn a worker:
CREATE INDEX foo ON t using brin(f());The recovery test 027_stream_regress.pl not catching these failures
means that we don't have tests with an index expression for such
parallel builds, or the assertion would have triggered. It looks like
this is just because we don't do a parallel btree build with an index
expression where we need to go through the executor to build its
IndexInfo.Note that parallel workers launched by execParallel.c pass down the
query ID in a minimal PlannedStmt where we use pgstat_get_my_query_id,
so let's do the same for all these.Attached is the patch I am finishing with, with some new tests for
BRIN and btree to force parallel builds with immutable expressions
through functions. These fail the assertions in the recovery TAP
test. It may be a good idea to keep these tests in the long-term
anyway. It took me a few minutes to find out that
min_parallel_table_scan_size and max_parallel_maintenance_workers was
enough to force workers to spawn even if tables have no data, to make
the tests cheaper.Thoughts or comments?
--
Michael
Attached is the patch I am finishing with, with some new tests for
BRIN and btree to force parallel builds with immutable expressions
through functions.
Sorry about my last reply. Not sure what happened with my email client.
Here it is again.
glad to see the asserts are working as expected ad finding these issues.
I took a look at the patch and tested it. It looks good. My only concern
is the stability of using min_parallel_table_scan_size = 0. Will it always
guarantee parallel workers? Can we print some debugging that proves
a parallel worker was spun up?
Something like this I get with DEBUG1
DEBUG: building index "btree_test_expr_idx" on table "btree_test_expr" with request for 1 parallel workers
What do you think?
Regards,
Sami
On Thu, Sep 26, 2024 at 10:55:37PM +0000, Imseih (AWS), Sami wrote:
Sorry about my last reply. Not sure what happened with my email client.
Here it is again.
No worries.
glad to see the asserts are working as expected ad finding these issues.
I took a look at the patch and tested it. It looks good. My only concern
is the stability of using min_parallel_table_scan_size = 0. Will it always
guarantee parallel workers? Can we print some debugging that proves
a parallel worker was spun up?Something like this I get with DEBUG1
DEBUG: building index "btree_test_expr_idx" on table "btree_test_expr" with request for 1 parallel workers
What do you think?
I am not sure. The GUCs pretty much enforce this behavior and I doubt
that these are going to break moving on. Of course they would, but we
are usually careful enough about that as long as it is possible to
grep for them. For example see the BRIN case in pageinspect.
The usual method for output that we use to confirm parallelism would
be EXPLAIN. Perhaps a potential target for CREATE INDEX now that it
supports more modes? I don't know if that's worth it, just throwing
one idea in the bucket of ideas.
--
Michael
I am not sure. The GUCs pretty much enforce this behavior and I doubt
that these are going to break moving on. Of course they would, but we
are usually careful enough about that as long as it is possible to
grep for them. For example see the BRIN case in pageinspect.
Yes, I see pageinspect does the same thing for the BRIN case.
That is probably OK for this case also.
The usual method for output that we use to confirm parallelism would
be EXPLAIN. Perhaps a potential target for CREATE INDEX now that it
supports more modes? I don't know if that's worth it, just throwing
one idea in the bucket of ideas.
Not sure about EXPLAIN for CREATE INDEX, since it's not a plannable
statement.
Maybe a CREATE INDEX VERBOSE, just Like ANALYZE VERBOSE,
VACUUM VERBOSE, etc. This will show the step that an index
build is on (CONCURRENTLY has many steps), and can also show
if parallel workers are launched for the index build.
--
Sami
On Thu, Sep 26, 2024 at 11:01:12PM -0500, Sami Imseih wrote:
I am not sure. The GUCs pretty much enforce this behavior and I doubt
that these are going to break moving on. Of course they would, but we
are usually careful enough about that as long as it is possible to
grep for them. For example see the BRIN case in pageinspect.Yes, I see pageinspect does the same thing for the BRIN case.
That is probably OK for this case also.
Okay, I've applied this part then to fix the query ID reporting
for these parallel workers. If people would like a backpatch, please
let me know.
While thinking more about the assertion check in the executor over the
weekend, I've found two things that are not right about it, as of:
- It is OK to not set the query ID if we don't have a query string to
map to. This is something that came up to me because of the parallel
VACUUM case, the query string given to the parallel workers is
optional because we don't have a query string in the case of
autovacuum. This is not an issue currently because autovacuum does
not support parallel jobs (see "tab->at_params.nworkers = -1" in
autovacuum.c), but if we support parallel jobs in autovacuum at some
point the assertion would fail. BRIN and btree always expect a query
string, AFAIK.
- The GUC track_activities. We don't really test it in any tests and
it is enabled by default, so that's really easy to miss. I have been
able to trigger an assertion failure with something like that:
SET compute_query_id = on;
SET track_activities = off;
SELECT 1;
The first point is just some prevention for the future. The second
case is something we should fix and test. I am attaching a patch that
addresses both. Note that the test case cannot use a transaction
block as query IDs are only reported for the top queries, and we can
do a scan of pg_stat_activity to see if the query ID is set. The
assertion was getting more complicated, so I have hidden that behind a
macro in execMain.c. All that should complete this project.
Thoughts?
--
Michael
Attachments:
0001-Expand-assertion-check-for-query-ID-in-executor.patchtext/x-diff; charset=us-asciiDownload
From c6d5042a0ad0d77bac91300b5d5218552f536968 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 30 Sep 2024 10:06:32 +0900
Subject: [PATCH] Expand assertion check for query ID in executor
track_activities and debug_query_strind problems. Blah.
---
src/backend/executor/execMain.c | 18 +++++++++++++++---
src/test/regress/expected/guc.out | 21 +++++++++++++++++++++
src/test/regress/sql/guc.sql | 12 ++++++++++++
3 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 728cdee480..85f697aaa1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -71,6 +71,18 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
/* Hook for plugin to get control in ExecCheckPermissions() */
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
+/*
+ * Check that the query ID is set, which is something that happens only
+ * if compute_query_id is enabled (or a module forced it), if track_activities
+ * is enabled, and if a client provided a query string to map with the query
+ * ID set.
+ */
+#define EXEC_CHECK_QUERY_ID \
+do { \
+ Assert(!IsQueryIdEnabled() || !pgstat_track_activities || \
+ !debug_query_string || pgstat_get_my_query_id() != 0); \
+} while(0)
+
/* decls for local routines only used within this module */
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
@@ -297,7 +309,7 @@ ExecutorRun(QueryDesc *queryDesc,
bool execute_once)
{
/* If enabled, the query ID should be set. */
- Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
+ EXEC_CHECK_QUERY_ID;
if (ExecutorRun_hook)
(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
@@ -408,7 +420,7 @@ void
ExecutorFinish(QueryDesc *queryDesc)
{
/* If enabled, the query ID should be set. */
- Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
+ EXEC_CHECK_QUERY_ID;
if (ExecutorFinish_hook)
(*ExecutorFinish_hook) (queryDesc);
@@ -471,7 +483,7 @@ void
ExecutorEnd(QueryDesc *queryDesc)
{
/* If enabled, the query ID should be set. */
- Assert(!IsQueryIdEnabled() || pgstat_get_my_query_id() != 0);
+ EXEC_CHECK_QUERY_ID;
if (ExecutorEnd_hook)
(*ExecutorEnd_hook) (queryDesc);
diff --git a/src/test/regress/expected/guc.out b/src/test/regress/expected/guc.out
index 455b6d6c0c..14edb42704 100644
--- a/src/test/regress/expected/guc.out
+++ b/src/test/regress/expected/guc.out
@@ -824,6 +824,27 @@ set default_with_oids to f;
-- Should not allow to set it to true.
set default_with_oids to t;
ERROR: tables declared WITH OIDS are not supported
+-- Test that disabling track_activities disables query ID reporting in
+-- pg_stat_activity.
+SET compute_query_id = on;
+SET track_activities = on;
+SELECT query_id IS NOT NULL AS qid_set FROM pg_stat_activity
+ WHERE pid = pg_backend_pid();
+ qid_set
+---------
+ t
+(1 row)
+
+SET track_activities = off;
+SELECT query_id IS NOT NULL AS qid_set FROM pg_stat_activity
+ WHERE pid = pg_backend_pid();
+ qid_set
+---------
+ f
+(1 row)
+
+RESET track_activities;
+RESET compute_query_id;
-- Test GUC categories and flag patterns
SELECT pg_settings_get_flags(NULL);
pg_settings_get_flags
diff --git a/src/test/regress/sql/guc.sql b/src/test/regress/sql/guc.sql
index dc79761955..2be7ab2940 100644
--- a/src/test/regress/sql/guc.sql
+++ b/src/test/regress/sql/guc.sql
@@ -319,6 +319,18 @@ set default_with_oids to f;
-- Should not allow to set it to true.
set default_with_oids to t;
+-- Test that disabling track_activities disables query ID reporting in
+-- pg_stat_activity.
+SET compute_query_id = on;
+SET track_activities = on;
+SELECT query_id IS NOT NULL AS qid_set FROM pg_stat_activity
+ WHERE pid = pg_backend_pid();
+SET track_activities = off;
+SELECT query_id IS NOT NULL AS qid_set FROM pg_stat_activity
+ WHERE pid = pg_backend_pid();
+RESET track_activities;
+RESET compute_query_id;
+
-- Test GUC categories and flag patterns
SELECT pg_settings_get_flags(NULL);
SELECT pg_settings_get_flags('does_not_exist');
--
2.45.2
On Mon, Sep 30, 2024 at 10:07:55AM +0900, Michael Paquier wrote:
The first point is just some prevention for the future. The second
case is something we should fix and test. I am attaching a patch that
addresses both. Note that the test case cannot use a transaction
block as query IDs are only reported for the top queries, and we can
do a scan of pg_stat_activity to see if the query ID is set. The
assertion was getting more complicated, so I have hidden that behind a
macro in execMain.c. All that should complete this project.
And done this part.
While looking at the interactions between query ID and debug_string,
I've bumped into something that could be a fun project for a
contributor. Will post that in a bit, that may interest some.
--
Michael
Hello Michael,
01.10.2024 05:07, Michael Paquier wrote:
On Mon, Sep 30, 2024 at 10:07:55AM +0900, Michael Paquier wrote:
...
And done this part.
If I'm not missing something, all the patches discussed here are committed
now, so maybe I've encountered a new anomaly.
Please try the following script:
BEGIN;
PREPARE s AS SELECT 1;
SELECT $1 \bind 1 \g
EXECUTE s;
It produces for me:
TRAP: failed Assert("!IsQueryIdEnabled() || !pgstat_track_activities || !debug_query_string || pgstat_get_my_query_id()
!= 0"), File: "execMain.c", Line: 423, PID: 1296466
Best regards,
Alexander
On Tue, Oct 01, 2024 at 10:00:00PM +0300, Alexander Lakhin wrote:
Hello Michael,
01.10.2024 05:07, Michael Paquier wrote:
On Mon, Sep 30, 2024 at 10:07:55AM +0900, Michael Paquier wrote:
...
And done this part.If I'm not missing something, all the patches discussed here are committed
now, so maybe I've encountered a new anomaly.Please try the following script:
BEGIN;
PREPARE s AS SELECT 1;
SELECT $1 \bind 1 \g
EXECUTE s;It produces for me:
TRAP: failed Assert("!IsQueryIdEnabled() || !pgstat_track_activities ||
!debug_query_string || pgstat_get_my_query_id() != 0"), File: "execMain.c",
Line: 423, PID: 1296466
The failure would appear only with pg_stat_statements loaded, not with
compute_query_id enabled while pgss is not loaded. The difference is
explained by pgss_post_parse_analyze() where the query ID is reset for
an ExecuteStmt.
The \bind followed by the EXECUTE is at fault here. I thought first
that this was some manipulation related to unnamed portals, because
PREPARE followed by EXECUTE would assign the ID from the PREPARE in
the EXECUTE command when it reaches ExecutorFinish(). However,
\bind_named is able to show the same problem, like:
SELECT 2 \parse stmt1
begin;
PREPARE s AS SELECT 1;
\bind_named stmt1 \g
EXECUTE s; -- query ID 0
And that's when I noticed that this is only caused by the fact that we
would go through ExecuteEnd() and ExecuteFinish() while doing a
cleanup of the portal created for \bind[_named]. The query ID is
cleaned up first, then the executor end/finish paths are called. This
requires also pg_stat_statements.track_utility to be enabled and a
transaction block.
This proves that the two assertions within ExecutorFinish() and
ExecutorEnd() are a bad idea, as they depend on the code paths where
an active portal is going to be removed. That leaves the one in
ExecutorRun(), lowering the whole value of the check quite a bit. So
perhaps it is just better to let that go entirely and finish this
experiment?
Alexander, I've thought about a couple of fancy cases for
ExecutorRun() but I could not break it. Perhaps you have something in
your sleeve that would also break this case?
--
Michael
Hello Michael,
02.10.2024 03:52, Michael Paquier wrote:
Alexander, I've thought about a couple of fancy cases for
ExecutorRun() but I could not break it. Perhaps you have something in
your sleeve that would also break this case?
--
Fortunately, it's still pretty warm here, so I'm wearing T-shirt and my
sleeve isn't long enough for that, but if you gave me 2-3 days, I would
focus on researching this area...
Best regards,
Alexander
On Wed, Oct 02, 2024 at 06:00:00AM +0300, Alexander Lakhin wrote:
Fortunately, it's still pretty warm here, so I'm wearing T-shirt and my
sleeve isn't long enough for that, but if you gave me 2-3 days, I would
focus on researching this area...
Sure, thanks. I am also spending a few days thinking about patterns
around that before doing anything. The buildfarm is green, so there
is some flebxibility.
--
Michael
Hello Michael,
02.10.2024 06:29, Michael Paquier wrote:
On Wed, Oct 02, 2024 at 06:00:00AM +0300, Alexander Lakhin wrote:
Fortunately, it's still pretty warm here, so I'm wearing T-shirt and my
sleeve isn't long enough for that, but if you gave me 2-3 days, I would
focus on researching this area...Sure, thanks. I am also spending a few days thinking about patterns
around that before doing anything. The buildfarm is green, so there
is some flebxibility.
I've managed to falsify the Assert in ExecutorRun() with the following:
SET compute_query_id = 'off';
SET track_activities = 'off';
CREATE PROCEDURE p1()
LANGUAGE plpgsql
AS $$
BEGIN
PERFORM 1;
END;
$$;
CREATE PROCEDURE p2(x int)
LANGUAGE plpgsql
AS $$
BEGIN
IF x = 1 THEN
SET track_activities = 'on';
SET compute_query_id = 'on';
END IF;
CALL p1();
END;
$$;
CALL p2(0);
CALL p2(1);
TRAP: failed Assert("!IsQueryIdEnabled() || !pgstat_track_activities || !debug_query_string || pgstat_get_my_query_id()
!= 0"), File: "execMain.c", Line: 312, PID: 3765791
Best regards,
Alexander
On Thu, Oct 03, 2024 at 10:00:00PM +0300, Alexander Lakhin wrote:
I've managed to falsify the Assert in ExecutorRun() with the following:
SET compute_query_id = 'off';
SET track_activities = 'off';CALL p2(0);
CALL p2(1);
Fun. This manipulates the session to force the state of the first two
conditions, while the PERFORM done at PL-level resets the query ID
and while there is a query string from the top-level CALL. I'll go
revert the assertion for now.
--
Michael