query_id, pg_stat_activity, extended query protocol

Started by kaido vaiklaover 2 years ago69 messages
Jump to latest
#1kaido vaikla
kaido.vaikla@gmail.com

I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity. Erik Wienhold figured out that reason
can be in extended query protocol (
/messages/by-id/1391613709.939460.1684777418070@office.mailbox.org
)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id in pg_stat_activity for running query.

br
Kaido

#2Michael Paquier
michael@paquier.xyz
In reply to: kaido vaikla (#1)
Re: query_id, pg_stat_activity, extended query protocol

On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:

I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity. Erik Wienhold figured out that reason
can be in extended query protocol (
/messages/by-id/1391613709.939460.1684777418070@office.mailbox.org
)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id in pg_stat_activity for running query.

Well, you could say a bit of both, I guess. The query ID is compiled
and stored in backend entries only after parse analysis, which is not
something that would happen when using the execution phase of the
extended query protocol, though it should be possible to access to the
Query nodes in the cached plans and their assigned query IDs.

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.
--
Michael

#3kaido vaikla
kaido.vaikla@gmail.com
In reply to: Michael Paquier (#2)
Re: query_id, pg_stat_activity, extended query protocol

Thnx.
br
Kaido

On Tue, 13 Jun 2023 at 03:16, Michael Paquier <michael@paquier.xyz> wrote:

Show quoted text

On Mon, Jun 12, 2023 at 09:03:24PM +0300, kaido vaikla wrote:

I have noticed, if query comes from PostgreSQL JDBC Driver, then query_id
is not present in pg_stat_activity. Erik Wienhold figured out that

reason

can be in extended query protocol (

/messages/by-id/1391613709.939460.1684777418070@office.mailbox.org

)
My question is, is it expected or is it a bug: if extended query protocol
then no query_id in pg_stat_activity for running query.

Well, you could say a bit of both, I guess. The query ID is compiled
and stored in backend entries only after parse analysis, which is not
something that would happen when using the execution phase of the
extended query protocol, though it should be possible to access to the
Query nodes in the cached plans and their assigned query IDs.

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.
--
Michael

#4Dave Cramer
pg@fastcrypt.com
In reply to: kaido vaikla (#3)
Re: query_id, pg_stat_activity, extended query protocol

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.

This just cropped up as a pgjdbc github issue. Seems like something that
should be addressed.

Dave

#5Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#2)
Re: query_id, pg_stat_activity, extended query protocol

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.

I was dealing with this today and found this thread. I spent some time
looking at possible solutions.

In the flow of extended query protocol, the exec_parse_message
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.

/*
* If a new query is started, we reset the query identifier as it'll only
* be known after parse analysis, to avoid reporting last query's
* identifier.
*/
if (state == STATE_RUNNING)
beentry->st_query_id = UINT64CONST(0);

So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pg_report_activity.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
        debug_query_string = psrc->query_string;

pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);

set_ps_display("BIND");

@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;

pgstat_report_activity(STATE_RUNNING, sourceText);
+ pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);

cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);

thoughts?

Regards,

Sami Imseih
Amazon Web Services (AWS)

#6Andrei Lepikhov
lepihov@gmail.com
In reply to: Sami Imseih (#5)
Re: query_id, pg_stat_activity, extended query protocol

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.

I was dealing with this today and found this thread. I spent some time
looking at possible solutions.

In the flow of extended query protocol, the exec_parse_message
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.

/*
* If a new query is started, we reset the query identifier as it'll only
* be known after parse analysis, to avoid reporting last query's
* identifier.
*/
if (state == STATE_RUNNING)
beentry->st_query_id = UINT64CONST(0);

So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pg_report_activity.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
debug_query_string = psrc->query_string;

pgstat_report_activity(STATE_RUNNING, psrc->query_string);
+ pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);

set_ps_display("BIND");

@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;

pgstat_report_activity(STATE_RUNNING, sourceText);
+ pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);

cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);

thoughts?

In exec_bind_message, how can you be sure that queryId exists in
query_list before the call of GetCachedPlan(), which will validate and
lock the plan? What if some OIDs were altered in the middle?

--
regards, Andrei Lepikhov

#7Michael Paquier
michael@paquier.xyz
In reply to: Andrei Lepikhov (#6)
Re: query_id, pg_stat_activity, extended query protocol

On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

+       pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);
set_ps_display("BIND");
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;
pgstat_report_activity(STATE_RUNNING, sourceText);
+       pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);

In exec_bind_message, how can you be sure that queryId exists in query_list
before the call of GetCachedPlan(), which will validate and lock the plan?
What if some OIDs were altered in the middle?

I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.

Did you consider using \bind to show how this behaves in a regression
test?
--
Michael

#8Andrei Lepikhov
lepihov@gmail.com
In reply to: Michael Paquier (#7)
Re: query_id, pg_stat_activity, extended query protocol

On 4/23/24 12:49, Michael Paquier wrote:

On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

+       pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true);
set_ps_display("BIND");
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)
debug_query_string = sourceText;
pgstat_report_activity(STATE_RUNNING, sourceText);
+       pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);

In exec_bind_message, how can you be sure that queryId exists in query_list
before the call of GetCachedPlan(), which will validate and lock the plan?
What if some OIDs were altered in the middle?

I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.

Did you consider using \bind to show how this behaves in a regression
test?

I'm not sure how to invent a test based on the \bind command - we need
some pause in the middle.
But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:

CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

/*
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on public.test (actual time=0.002..0.004 rows=0 loops=1)
Query Identifier: 6750745711909650694

QUERY PLAN
-------------------------------------------------------------------
Seq Scan on public.test (actual time=0.004..0.004 rows=0 loops=1)
Query Identifier: -2597546769858730762
*/

We have different objects which can be changed - I just have invented
the most trivial example to discuss.

--
regards, Andrei Lepikhov

#9Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#7)
Re: query_id, pg_stat_activity, extended query protocol

I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.

IIUC, the query trees returned from QueryRewrite
will all have the same queryId, so it appears valid to
use the queryId from the first tree in the list. Right?

Here is an example I was working with that includes user-defined rules
that has a list with more than 1 tree.

postgres=# explain (verbose, generic_plan) insert into mytab values ($1) RETURNING pg_sleep($1), id ;
QUERY PLAN
-----------------------------------------------------------
Insert on public.mytab (cost=0.00..0.01 rows=1 width=4)
Output: pg_sleep(($1)::double precision), mytab.id
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425

Insert on public.mytab2 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425

Insert on public.mytab3 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425

Insert on public.mytab4 (cost=0.00..0.01 rows=0 width=0)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: $1
Query Identifier: 3703848357297795425
(20 rows)

Did you consider using \bind to show how this behaves in a regression
test?

Yes, this is precisely how I tested. Without the patch, I could not
see a queryId after 9 seconds of a pg_sleep, but with the patch it
appears. See the test below.

## test query
select pg_sleep($1) \bind 30

## unpatched
postgres=# select
query_id,
query,
now()-query_start query_duration,
state
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
query_id | query | query_duration | state
----------+----------------------+-----------------+--------
| select pg_sleep($1) +| 00:00:08.604845 | active
| ; | |
(1 row)

## patched

postgres=# truncate table large;^C
postgres=# select
query_id,
query,
now()-query_start query_duration,
state
from pg_stat_activity where pid <> pg_backend_pid()
and state = 'active';
query_id | query | query_duration | state
---------------------+----------------------+----------------+--------
2433215470630378210 | select pg_sleep($1) +| 00:00:09.6881 | active
| ; | |
(1 row)

For exec_execute_message, I realized that to report queryId for
Utility and non-utility statements, we need to report the queryId
inside the portal routines where PlannedStmt contains the queryId.

Attached is the first real attempt at the fix.

Regards,

Sami

Attachments:

0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patchapplication/octet-stream; name=0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patchDownload+12-1
#10Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#8)
Re: query_id, pg_stat_activity, extended query protocol

But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:

CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?

Regards,

Sami

#11David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#10)
Re: query_id, pg_stat_activity, extended query protocol

On Sat, Apr 27, 2024 at 6:55 AM Imseih (AWS), Sami <simseih@amazon.com>
wrote:

Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?

We choose a arguably more user-friendly option:

https://www.postgresql.org/docs/current/sql-prepare.html

"""
Although the main point of a prepared statement is to avoid repeated parse
analysis and planning of the statement, PostgreSQL will force re-analysis
and re-planning of the statement before using it whenever database objects
used in the statement have undergone definitional (DDL) changes or their
planner statistics have been updated since the previous use of the prepared
statement.
"""

David J.

#12Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#11)
Re: query_id, pg_stat_activity, extended query protocol

We choose a arguably more user-friendly option:

https://www.postgresql.org/docs/current/sql-prepare.html

Thanks for pointing this out!

Regards,

Sami

#13Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#10)
Re: query_id, pg_stat_activity, extended query protocol

But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:

CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?

I tested v1 thoroughly.

Using the attached JDBC script for testing, I added some logging of the queryId
being reported by the patch and added a breakpoint after sync [1]https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L4877 which at that
point the locks are released on the table. I then proceeded to drop and recreate the table
and observed that the first bind after recreating the table still reports the
old queryId but the execute reports the correct queryId. This is because
the bind still has not had a chance to re-parse and re-plan after the
cache invalidation.

2024-04-27 13:51:15.757 CDT [43483] LOG: duration: 21322.475 ms execute S_1: select pg_sleep(10)
2024-04-27 13:51:21.591 CDT [43483] LOG: duration: 0.834 ms parse S_2: select from tab1 where id = $1
2024-04-27 13:51:21.591 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG: duration: 0.729 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:21.592 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:21.592 CDT [43483] LOG: duration: 0.032 ms execute S_2: select from tab1 where id = $1
2024-04-27 13:51:32.501 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG: duration: 0.342 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:32.502 CDT [43483] LOG: query_id = -192969736922694368
2024-04-27 13:51:32.502 CDT [43483] LOG: duration: 0.067 ms execute S_2: select from tab1 where id = $1
2024-04-27 13:51:42.613 CDT [43526] LOG: query_id = -4766379021163149612
-- recreate the tables
2024-04-27 13:51:42.621 CDT [43526] LOG: duration: 8.488 ms statement: drop table if exists tab1;
2024-04-27 13:51:42.621 CDT [43526] LOG: query_id = 7875284141628316369
2024-04-27 13:51:42.625 CDT [43526] LOG: duration: 3.364 ms statement: create table tab1 ( id int );
2024-04-27 13:51:42.625 CDT [43526] LOG: query_id = 2967282624086800441
2024-04-27 13:51:42.626 CDT [43526] LOG: duration: 0.936 ms statement: insert into tab1 values (1);

-- this reports the old query_id
2024-04-27 13:51:45.058 CDT [43483] LOG: query_id = -192969736922694368

2024-04-27 13:51:45.059 CDT [43483] LOG: duration: 0.913 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:45.059 CDT [43483] LOG: query_id = 3010297048333693297
2024-04-27 13:51:45.059 CDT [43483] LOG: duration: 0.096 ms execute S_2: select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG: query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG: duration: 0.108 ms bind S_2: select from tab1 where id = $1
2024-04-27 13:51:46.777 CDT [43483] LOG: query_id = 3010297048333693297
2024-04-27 13:51:46.777 CDT [43483] LOG: duration: 0.024 ms execute S_2: select from tab1 where id = $1

The easy answer is to not report queryId during the bind message, but I will look
at what else can be done here as it's good to have a queryId reported in this scenario
for cases there are long planning times and we rather not have those missed in
pg_stat_activity sampling.

[1]: https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L4877

Regards,

Sami

Attachments:

Test.javaapplication/octet-stream; name=Test.javaDownload
#14Andrei Lepikhov
lepihov@gmail.com
In reply to: Sami Imseih (#10)
Re: query_id, pg_stat_activity, extended query protocol

On 27/4/2024 20:54, Imseih (AWS), Sami wrote:

But simplistic case with a prepared statement shows how the value of
queryId can be changed if you don't acquire all the objects needed for
the execution:

CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?

I don't think so. It may be any object, even stored procedure, that can
be changed. IMO, the right option here is to report zero (like the
undefined value of queryId) until the end of the parsing stage.

--
regards, Andrei Lepikhov

#15Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#14)
Re: query_id, pg_stat_activity, extended query protocol

Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1]/messages/by-id/724348C9-8023-41BC-895E-80634E79A538@amazon.com in which the queryId may change
due to a cached query invalidation.

[1]: /messages/by-id/724348C9-8023-41BC-895E-80634E79A538@amazon.com

Regards,

Sami

Attachments:

0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patchapplication/octet-stream; name=0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patchDownload+20-1
#16Andrei Lepikhov
lepihov@gmail.com
In reply to: Sami Imseih (#15)
Re: query_id, pg_stat_activity, extended query protocol

On 5/1/24 10:07, Imseih (AWS), Sami wrote:

Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.

[1] /messages/by-id/724348C9-8023-41BC-895E-80634E79A538@amazon.com

I discovered the current state of queryId reporting and found that it
may be unlogical: Postgres resets queryId right before query execution
in simple protocol and doesn't reset it at all in extended protocol and
other ways to execute queries.
I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.
However, the process of setting up the queryId is more complex. Should
we set it at the beginning of query execution? This seems logical, but
what about the planning process? If an extension plans a query without
the intention to execute it for speculative reasons, should we still
show the queryId? Perhaps we should reset the state right after planning
to accurately reflect the current queryId.
See in the attachment some sketch for that - it needs to add queryId
reset on abortion.

--
regards, Andrei Lepikhov

Attachments:

queryid_report.difftext/x-patch; charset=UTF-8; name=queryid_report.diffDownload+14-2
#17Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#16)
Re: query_id, pg_stat_activity, extended query protocol

I discovered the current state of queryId reporting and found that it
may be unlogical: Postgres resets queryId right before query execution
in simple protocol and doesn't reset it at all in extended protocol and
other ways to execute queries.

In exec_parse_message, exec_bind_message and exec_execute_message,
the queryId is reset via pgstat_report_activity

I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.

When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?

This seems logical, but
what about the planning process? If an extension plans a query without
the intention to execute it for speculative reasons, should we still
show the queryId? Perhaps we should reset the state right after planning
to accurately reflect the current queryId.

I think you are suggesting that during planning, the queryId
of the current statement being planned should not be reported.

If my understanding is correct, I don't think that is a good idea. Tools that
snasphot pg_stat_activity will not be able to account for the queryId during
planning. This could mean that certain load on the database cannot be tied
back to a specific queryId.

Regards,

Sami

#18Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#17)
Re: query_id, pg_stat_activity, extended query protocol

On Wed, May 15, 2024 at 03:24:05AM +0000, Imseih (AWS), Sami wrote:

I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.

When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?

Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly. Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.

This seems logical, but
what about the planning process? If an extension plans a query without
the intention to execute it for speculative reasons, should we still
show the queryId? Perhaps we should reset the state right after planning
to accurately reflect the current queryId.

I think you are suggesting that during planning, the queryId
of the current statement being planned should not be reported.

If my understanding is correct, I don't think that is a good idea. Tools that
snasphot pg_stat_activity will not be able to account for the queryId during
planning. This could mean that certain load on the database cannot be tied
back to a specific queryId.

I'm -1 with the point of resetting the query ID based on what the
patch does, even if it remains available in the hooks.
pg_stat_activity is one thing, but you would also reduce the coverage
of log_line_prefix with %Q. And that can provide really useful
debugging information in the code paths where the query ID would be
reset as an effect of the proposed patch.

The patch to report the query ID of a planned query when running a
query through a PortalRunSelect() feels more intuitive in the
information it reports.
--
Michael

#19Andrei Lepikhov
lepihov@gmail.com
In reply to: Michael Paquier (#18)
Re: query_id, pg_stat_activity, extended query protocol

On 15/5/2024 12:09, Michael Paquier wrote:

On Wed, May 15, 2024 at 03:24:05AM +0000, Imseih (AWS), Sami wrote:

I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.

When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?

Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly. Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.

Okay, that's what I precisely wanted to understand: queryId doesn't have
semantics to show the job that consumes resources right now—it is mostly
about convenience to know that the backend processes nothing except
(probably) this query.

--
regards, Andrei Lepikhov

#20Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#19)
Re: query_id, pg_stat_activity, extended query protocol

Okay, that's what I precisely wanted to understand: queryId doesn't have
semantics to show the job that consumes resources right now—it is mostly
about convenience to know that the backend processes nothing except
(probably) this query.

It may be a good idea to expose in pg_stat_activity or a
supplemental activity view information about the current state of the
query processing. i.e. Is it parsing, planning or executing a query or
is it processing a nested query.

I can see this being useful and perhaps could be taken up in a
separate thread.

Regards,

Sami

#21Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#20)
#22Andrei Lepikhov
lepihov@gmail.com
In reply to: Sami Imseih (#17)
#23Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#22)
#24Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Sami Imseih (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#24)
#26Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#26)
#28Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#28)
#30Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#27)
#31Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#30)
#32jian he
jian.universality@gmail.com
In reply to: Sami Imseih (#31)
#33Michael Paquier
michael@paquier.xyz
In reply to: jian he (#32)
#34Andrei Lepikhov
lepihov@gmail.com
In reply to: Sami Imseih (#31)
#35Andrei Lepikhov
lepihov@gmail.com
In reply to: Sami Imseih (#31)
#36Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#35)
#37Sami Imseih
samimseih@gmail.com
In reply to: Andrei Lepikhov (#34)
#38Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#37)
#39Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#36)
#40Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#33)
#41Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#40)
#42Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#41)
#43Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#42)
#44Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#43)
#45Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#44)
#46Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#45)
#47Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#46)
#48Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#47)
#49Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#49)
#51Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#48)
#52Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#50)
#53Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#51)
#54Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#52)
#55Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#54)
#56Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#51)
#57Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#56)
#58Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#57)
#59Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#58)
#60Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#59)
#61Sami Imseih
samimseih@gmail.com
In reply to: Michael Paquier (#60)
#62Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#61)
#63Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#62)
#64Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#63)
#65Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#64)
#66Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#65)
#67Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#66)
#68Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#67)
#69Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#68)