Display individual query in pg_stat_activity

Started by Bertrand Drouvotover 5 years ago18 messageshackers
Jump to latest
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com

Hi hackers,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are currently
displayed.

_Motivation:_

When multiple statements are displayed then we don’t know which one is
currently running.

For example:

psql -c "select pg_sleep(10);select pg_sleep(20);" is currently
displayed as:

postgres=# select backend_type,query from pg_stat_activity;
backend_type | query
------------------------------+--------------------------------------------------
client backend | select pg_sleep(10);select pg_sleep(20);

Showing which statement is currently being executed would be more helpful.

_Technical context and proposal:_

There is 2 points in this patch:

* modifying the current behavior in “exec_simple_query”
* modifying the current behavior in “ExecInitParallelPlan”

So that we could see for example:

backend_type | query
------------------------------+--------------------------------------------------
client backend | select pg_sleep(10);

and then

backend_type | query
------------------------------+--------------------------------------------------
client backend | select pg_sleep(20);

instead of the multiple sql statement described in the “motivation” section.

Another example: parallel worker being triggered while executing a function:

create or replace function test()
returns void as $$select count(/) as "first" from foo;select pg_sleep(10);select count(/) as "second" from foo;select pg_sleep(11);select pg_sleep(10)
$$
language sql;

We currently see:

backend_type | query
------------------------------+--------------------------------------------------------------------------------------------------------------------------------------
client backend | select test();
parallel worker | select count(*) as "first" from foo;select pg_sleep(10);select count(*) as "second" from foo;select pg_sleep(11);select pg_sleep(10)+
|
parallel worker | select count(*) as "first" from foo;select pg_sleep(10);select count(*) as "second" from foo;select pg_sleep(11);select pg_sleep(10)+
|

while the attached patch would provide:

backend_type | query [217/1938]
------------------------------+--------------------------------------------------
client backend | select test();
parallel worker | select count(*) as "first" from foo;
parallel worker | select count(*) as "first" from foo;

and then:

backend_type | query
------------------------------+--------------------------------------------------
client backend | select test();
parallel worker | select count(*) as "second" from foo;
parallel worker | select count(*) as "second" from foo;

I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.

Regards,
Bertrand

Attachments:

v1-0001-pg_stat_activity_individual_query.patchtext/plain; charset=UTF-8; name=v1-0001-pg_stat_activity_individual_query.patch; x-mac-creator=0; x-mac-type=0Download+39-10
#2Dave Page
dpage@pgadmin.org
In reply to: Bertrand Drouvot (#1)
Re: Display individual query in pg_stat_activity

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com>
wrote:

Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity
query field when multiple SQL statements are currently displayed.

*Motivation:*

When multiple statements are displayed then we don’t know which one is
currently running.

I'm not sure I'd want that to happen, as it could make it much harder to
track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a
value to indicate what the current statement number in the query is?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

#3Jeremy Schneider
schnjere@amazon.com
In reply to: Dave Page (#2)
Re: Display individual query in pg_stat_activity

On 7/27/20 07:57, Dave Page wrote:

I'm not sure I'd want that to happen, as it could make it much harder
to track the activity back to a query in the application layer or
server logs. 

Perhaps a separate field could be added for the current statement, or
a value to indicate what the current statement number in the query is?

Might be helpful to give some specifics about circumstances where
strings can appear in pg_stat_activity.query with multiple statements.

1) First of all, IIUC multiple statements are only supported in the
first place by the simple protocol and PLs.  Anyone using parameterized
statements (bind variables) should be unaffected by this.

2) My read of the official pg JDBC driver is that even for batch
operations it currently iterates and sends each statement individually.
I don't think the JDBC driver has the capability to send multiple
statements, so java apps using this driver should be unaffected.

3) psql -c will always send the string as a single "simple protocol"
request.  Scripts will be impacted.

4) PLs also seem to have a code path that can put multiple statements in
pg_stat_activity when parallel slaves are launched.  PL code will be
impacted.

5) pgAdmin uses the simple protocol and when a user executes a block of
statements, pgAdmin seems to send the whole block as a single "simple
protocol" request.  Tools like pgAdmin will be impacted.

At the application layer, it doesn't seem problematic to me if
PostgreSQL reports each query one at a time.  IMO most people will find
this to be a more useful behavior and they will still find their queries
in their app code or app logs.

However at the PostgreSQL logging layer this is a good call-out.  I just
did a quick test on 14devel to double-check my assumption and it does
seem that PostgreSQL logs the entire combined query for psql -c.  I
think it would be better for PostgreSQL to report queries individually
in the log too - for example pgBadger summaries will be even more useful
if they report information for each individual query rather than a
single big block of multiple queries.

Given how small this patch is, it seems worthwhile to at least
investigate whether the logging component could be addressed just as easily.

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services

#4Dave Page
dpage@pgadmin.org
In reply to: Jeremy Schneider (#3)
Re: Display individual query in pg_stat_activity

On Mon, Jul 27, 2020 at 4:28 PM Jeremy Schneider <schnjere@amazon.com>
wrote:

On 7/27/20 07:57, Dave Page wrote:

I'm not sure I'd want that to happen, as it could make it much harder to
track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a
value to indicate what the current statement number in the query is?

Might be helpful to give some specifics about circumstances where strings
can appear in pg_stat_activity.query with multiple statements.

1) First of all, IIUC multiple statements are only supported in the first
place by the simple protocol and PLs. Anyone using parameterized
statements (bind variables) should be unaffected by this.

2) My read of the official pg JDBC driver is that even for batch
operations it currently iterates and sends each statement individually. I
don't think the JDBC driver has the capability to send multiple statements,
so java apps using this driver should be unaffected.

That is just one of a number of different popular drivers of course.

3) psql -c will always send the string as a single "simple protocol"
request. Scripts will be impacted.

4) PLs also seem to have a code path that can put multiple statements in
pg_stat_activity when parallel slaves are launched. PL code will be
impacted.

5) pgAdmin uses the simple protocol and when a user executes a block of
statements, pgAdmin seems to send the whole block as a single "simple
protocol" request. Tools like pgAdmin will be impacted.

It does. It also prepends some queries with comments, specifically to allow
users to filter them out when they're analysing logs (a feature requested
by users, not just something we thought was a good idea). I'm assuming that
this patch would also strip those?

At the application layer, it doesn't seem problematic to me if PostgreSQL
reports each query one at a time. IMO most people will find this to be a
more useful behavior and they will still find their queries in their app
code or app logs.

I think there are arguments to be made for both approaches.

However at the PostgreSQL logging layer this is a good call-out. I just
did a quick test on 14devel to double-check my assumption and it does seem
that PostgreSQL logs the entire combined query for psql -c. I think it
would be better for PostgreSQL to report queries individually in the log
too - for example pgBadger summaries will be even more useful if they
report information for each individual query rather than a single big block
of multiple queries.

Given how small this patch is, it seems worthwhile to at least investigate
whether the logging component could be addressed just as easily.

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Dave Page (#2)
Re: Display individual query in pg_stat_activity

On 7/27/20 9:57 AM, Dave Page wrote:

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com
<mailto:bdrouvot@amazon.com>> wrote:

<snip>

When multiple statements are displayed then we don’t know which
one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder
to track the activity back to a query in the application layer or
server logs.

Perhaps a separate field could be added for the current statement, or
a value to indicate what the current statement number in the query is?

Perhaps turn query into text[]. That would make it easy to concatenate
back together if desired.

Show quoted text

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

#6Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Dave Page (#2)
Re: Display individual query in pg_stat_activity

Hi,

On 7/27/20 4:57 PM, Dave Page wrote:

*CAUTION*: This email originated from outside of the organization. Do
not click links or open attachments unless you can confirm the sender
and know the content is safe.

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com
<mailto:bdrouvot@amazon.com>> wrote:

Hi hackers,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

_Motivation:_

When multiple statements are displayed then we don’t know which
one is currently running.

I'm not sure I'd want that to happen, as it could make it much harder
to track the activity back to a query in the application layer or
server logs.

Perhaps a separate field could be added for the current statement, or
a value to indicate what the current statement number in the query is?

Thanks for he feedback.

I like the idea of adding extra information without changing the current
behavior.

A value to indicate what the current statement number is, would need
parsing the query field by the user to get the individual statement.

I think the separate field makes sense (though it come with an extra
memory price) as it will not change the existing behavior and would just
provide extra information (without any extra parsing needed for the user).

I attached a mock up v2 patch that adds this new field.

Outcome Examples:

  backend_type | query                                            |
individual_query
----------------+---------------------------------------------------------------------------------------------+----------------------
 client backend | select backend_type, query, individual_query from
pg_stat_activity where length(query) > 0; |
 client backend | select pg_sleep(10);select pg_sleep(20); | select
pg_sleep(20);

or

  backend_type | query |           individual_query
-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
 client backend  | select backend_type, query, individual_query from
pg_stat_activity where length(query) > 0; |
 client backend  | select test(); |
 parallel worker | select count(*) as "first" from foo;select
pg_sleep(10);create index bdtidx on foo(generate_series);select count(*)
as "second" from foo;select pg_sleep(11);select count(*) as "third" from
foo | select count(*) as "second" from foo;
 parallel worker | select count(*) as "first" from foo;select
pg_sleep(10);create index bdtidx on foo(generate_series);select count(*)
as "second" from foo;select pg_sleep(11);select count(*) as "third" from
foo | select count(*) as "second" from foo;

As you can see the individual_query field is populated only when the
query field is a multiple statements one.

Regards,

Bertrand

Show quoted text

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Attachments:

v2-0001-pg_stat_activity_individual_query.patchtext/plain; charset=UTF-8; name=v2-0001-pg_stat_activity_individual_query.patch; x-mac-creator=0; x-mac-type=0Download+230-94
#7Magnus Hagander
magnus@hagander.net
In reply to: Bertrand Drouvot (#6)
Re: Display individual query in pg_stat_activity

On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com>
wrote:

Hi,
On 7/27/20 4:57 PM, Dave Page wrote:

*CAUTION*: This email originated from outside of the organization. Do not
click links or open attachments unless you can confirm the sender and know
the content is safe.

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand <bdrouvot@amazon.com>
wrote:

Hi hackers,

I've attached a patch to display individual query in the pg_stat_activity
query field when multiple SQL statements are currently displayed.

*Motivation:*

When multiple statements are displayed then we don’t know which one is
currently running.

I'm not sure I'd want that to happen, as it could make it much harder to
track the activity back to a query in the application layer or server logs.

Perhaps a separate field could be added for the current statement, or a
value to indicate what the current statement number in the query is?

Thanks for he feedback.

I like the idea of adding extra information without changing the current
behavior.

A value to indicate what the current statement number is, would need
parsing the query field by the user to get the individual statement.

I think the separate field makes sense (though it come with an extra
memory price) as it will not change the existing behavior and would just
provide extra information (without any extra parsing needed for the user).

Idle though without having considered it too much -- you might reduce the
memory overhead by just storing a start/end offset into the combined query
string instead of a copy of the query. That way the cost would only be paid
when doing the reading of pg_stat_activity (by extracting the piece of the
string), which I'd argue is done orders of magnitude fewer times than the
query changes at least on busy systems. Care would have to be taken for the
case of the current executing query actually being entirely past the end of
the query string buffer of course, but I don't think that's too hard to
define a useful behaviour for. (The user interface would stay the same,
showing the actual string and thus not requiring the user to do any parsing)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#8Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Magnus Hagander (#7)
Re: Display individual query in pg_stat_activity

Hi,

On 8/6/20 12:24 PM, Magnus Hagander wrote:

On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand <bdrouvot@amazon.com
<mailto:bdrouvot@amazon.com>> wrote:

Hi,

On 7/27/20 4:57 PM, Dave Page wrote:

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand
<bdrouvot@amazon.com <mailto:bdrouvot@amazon.com>> wrote:

Hi hackers,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

_Motivation:_

When multiple statements are displayed then we don’t know
which one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current
statement, or a value to indicate what the current statement
number in the query is?

Thanks for he feedback.

I like the idea of adding extra information without changing the
current behavior.

A value to indicate what the current statement number is, would
need parsing the query field by the user to get the individual
statement.

I think the separate field makes sense (though it come with an
extra memory price) as it will not change the existing behavior
and would just provide extra information (without any extra
parsing needed for the user).

Idle though without having considered it too much -- you might reduce
the memory overhead by just storing a start/end offset into the
combined query string instead of a copy of the query.

Good point, thanks for the feedback.

The new attached patch is making use of stmt_len and stmt_location
(instead of a copy of the query).

That way the cost would only be paid when doing the reading of
pg_stat_activity (by extracting the piece of the string), which I'd
argue is done orders of magnitude fewer times than the query changes
at least on busy systems.

The individual query extraction (making use of stmt_len and
stmt_location) has been moved to pg_stat_get_activity() in the new
attached patch (as opposed to pgstat_report_activity() in the previous
patch version).

Care would have to be taken for the case of the current executing
query actually being entirely past the end of the query string buffer
of course, but I don't think that's too hard to define a useful
behaviour for. (The user interface would stay the same, showing the
actual string and thus not requiring the user to do any parsing)

As a proposal the new attached patch does not display the individual
query if length + location is greater than
pgstat_track_activity_query_size (anyway it could not, as the query
field that might contain multiple statements is already <=
pgstat_track_activity_query_size in pg_stat_get_activity()).

Bertrand

Show quoted text

--
 Magnus Hagander
 Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
 Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

Attachments:

v2-0002-pg_stat_activity_individual_query.patchtext/plain; charset=UTF-8; name=v2-0002-pg_stat_activity_individual_query.patch; x-mac-creator=0; x-mac-type=0Download+169-93
#9Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Bertrand Drouvot (#8)
Re: Display individual query in pg_stat_activity

Hi,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which
one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current statement,
or a value to indicate what the current statement number in the
query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?

--
Masahiro Ikeda
NTT DATA CORPORATION

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Masahiro Ikeda (#9)
Re: Display individual query in pg_stat_activity

Hi

út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda <ikedamsh@oss.nttdata.com>
napsal:

Hi,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which
one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current statement,
or a value to indicate what the current statement number in the
query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?

I am afraid of the significant performance impact of this feature. In this
case you have to copy all nested queries to the stat collector process.
Very common usage of PL is a glue of very fast queries. Sure, it is used
like glue for very slow queries too.

Just I thinking about two features:

1. extra interface for auto_explain, that allows you to get a stack of
statements assigned to some pid (probably these informations should be
stored inside shared memory and collected before any query execution).
Sometimes some slow function is slow due repeated execution of relatively
fast queries. In this case, the deeper nested level is not too interesting.
You need to see a stack of calls and you are searching the first slow level
in the stack.

2. can be nice to have a status column in pg_stat_activity, and status GUC
for sending a custom information from deep levels to the user. Now, users
use application_name, but some special variables can be better for this
purpose. This value of status can be refreshed periodically and can
substitute some tags. So developer can set

BEGIN
-- before slow long query
SET status TO 'slow query calculation xxy %d';
...

It is a alternative to RAISE NOTICE, but with different format - with
format that is special for reading from pg_stat_activity

For long (slow) queries usually you need to see the sum of all times of all
levels from the call stack to get valuable information.

Regards

Pavel

p.s. pg_stat_activity is maybe too wide table already, and probably is not
good to enhance this table too much

Show quoted text

--
Masahiro Ikeda
NTT DATA CORPORATION

#11Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Pavel Stehule (#10)
Re: Display individual query in pg_stat_activity

Hi,

On 8/18/20 9:35 AM, Pavel Stehule wrote:

Hi

út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda
<ikedamsh@oss.nttdata.com <mailto:ikedamsh@oss.nttdata.com>> napsal:

Hi,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which
one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current statement,
or a value to indicate what the current statement number in the
query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running
query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

Yeah I think it would be nice to have.

I also think it would be better to create a dedicated thread (specially
looking at Pavel's comment below)

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't it?

I am afraid of the significant performance impact of this feature. In
this case you have to copy all nested queries to the stat collector
process. Very common usage of PL is a glue of very fast queries. Sure,
it is used like glue for very slow queries too.

Just I thinking about two features:

1. extra interface for auto_explain, that allows you to get a stack of
statements assigned to some pid (probably these informations should be
stored inside shared memory and collected before any query execution).
Sometimes some slow function is slow due repeated execution of
relatively fast queries. In this case, the deeper nested level is not
too interesting. You need to see a stack of calls and you are
searching the first slow level in the stack.

2. can be nice to have a status column in pg_stat_activity, and status
GUC for sending a custom information from deep levels to the user.
Now, users use application_name, but some special variables can be
better for this purpose.  This value of status can be refreshed
periodically and can substitute some tags. So developer can set

BEGIN
  -- before slow long query
  SET status TO 'slow query calculation xxy %d';
 ...

It is a alternative to RAISE NOTICE, but with different format - with
format that is special for reading from pg_stat_activity

For long (slow) queries usually you need to see the sum of all times
of all levels from the call stack to get valuable information.

Regards

Pavel

p.s. pg_stat_activity is maybe too wide table already, and probably is
not good to enhance this table too much

Thanks

Bertrand

Show quoted text

--
Masahiro Ikeda
NTT DATA CORPORATION

#12Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Bertrand Drouvot (#11)
Re: Display individual query in pg_stat_activity

On 2020-08-19 14:48, Drouvot, Bertrand wrote:

Hi,
On 8/18/20 9:35 AM, Pavel Stehule wrote:

Hi

út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda
<ikedamsh@oss.nttdata.com> napsal:

Hi,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

Motivation:

When multiple statements are displayed then we don’t know

which

one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current

statement,

or a value to indicate what the current statement number in the
query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running
query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

Yeah I think it would be nice to have.

I also think it would be better to create a dedicated thread
(specially looking at Pavel's comment below)

Thank you. I will.

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't
it?

I am afraid of the significant performance impact of this feature.
In this case you have to copy all nested queries to the stat
collector process. Very common usage of PL is a glue of very fast
queries. Sure, it is used like glue for very slow queries too.
Just I thinking about two features:

OK, thanks for much advice and show alternative solutions.

1. extra interface for auto_explain, that allows you to get a stack
of statements assigned to some pid (probably these informations
should be stored inside shared memory and collected before any query
execution). Sometimes some slow function is slow due repeated
execution of relatively fast queries. In this case, the deeper
nested level is not too interesting. You need to see a stack of
calls and you are searching the first slow level in the stack.

Thanks. I didn't know auto_explain module.
I agreed when only requested, it copy the stack of statements.

2. can be nice to have a status column in pg_stat_activity, and
status GUC for sending a custom information from deep levels to the
user. Now, users use application_name, but some special variables
can be better for this purpose. This value of status can be
refreshed periodically and can substitute some tags. So developer
can set

BEGIN
-- before slow long query
SET status TO 'slow query calculation xxy %d';
...

It is a alternative to RAISE NOTICE, but with different format -
with format that is special for reading from pg_stat_activity

For long (slow) queries usually you need to see the sum of all times
of all levels from the call stack to get valuable information.

In comparison to 1, user must implements logging statement to
their query but user can control what he/she wants to know.

I worry which solution is best.

p.s. pg_stat_activity is maybe too wide table already, and probably
is not good to enhance this table too much

Thanks. I couldn't think from this point of view.

After I make some PoC patches, I will create a dedicated thread.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Masahiro Ikeda (#12)
Re: Display individual query in pg_stat_activity

pá 28. 8. 2020 v 10:06 odesílatel Masahiro Ikeda <ikedamsh@oss.nttdata.com>
napsal:

On 2020-08-19 14:48, Drouvot, Bertrand wrote:

Hi,
On 8/18/20 9:35 AM, Pavel Stehule wrote:

Hi

út 18. 8. 2020 v 8:54 odesílatel Masahiro Ikeda
<ikedamsh@oss.nttdata.com> napsal:

Hi,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

Motivation:

When multiple statements are displayed then we don’t know

which

one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current

statement,

or a value to indicate what the current statement number in the
query is?

As a user, I think this feature is useful to users.

It would be nice that pg_stat_activity also show currently running
query
in a user defined function(PL/pgSQL) .

I understood that this patch is not for user defined functions.
Please let me know if it's better to make another thread.

Yeah I think it would be nice to have.

I also think it would be better to create a dedicated thread
(specially looking at Pavel's comment below)

Thank you. I will.

In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, doesn't
it?

I am afraid of the significant performance impact of this feature.
In this case you have to copy all nested queries to the stat
collector process. Very common usage of PL is a glue of very fast
queries. Sure, it is used like glue for very slow queries too.
Just I thinking about two features:

OK, thanks for much advice and show alternative solutions.

1. extra interface for auto_explain, that allows you to get a stack
of statements assigned to some pid (probably these informations
should be stored inside shared memory and collected before any query
execution). Sometimes some slow function is slow due repeated
execution of relatively fast queries. In this case, the deeper
nested level is not too interesting. You need to see a stack of
calls and you are searching the first slow level in the stack.

Thanks. I didn't know auto_explain module.
I agreed when only requested, it copy the stack of statements.

2. can be nice to have a status column in pg_stat_activity, and
status GUC for sending a custom information from deep levels to the
user. Now, users use application_name, but some special variables
can be better for this purpose. This value of status can be
refreshed periodically and can substitute some tags. So developer
can set

BEGIN
-- before slow long query
SET status TO 'slow query calculation xxy %d';
...

It is a alternative to RAISE NOTICE, but with different format -
with format that is special for reading from pg_stat_activity

For long (slow) queries usually you need to see the sum of all times
of all levels from the call stack to get valuable information.

In comparison to 1, user must implements logging statement to
their query but user can control what he/she wants to know.

I worry which solution is best.

There is no best solution - @1 doesn't need manual work, but @1 is not too
useful when queries are similar (first n chars) and are long. In this case
custom messages are much more practical.

I don't think so we can implement only one design - in this case we can
support more tools with similar purpose but different behaviors in corner
cases.

Show quoted text

p.s. pg_stat_activity is maybe too wide table already, and probably
is not good to enhance this table too much

Thanks. I couldn't think from this point of view.

After I make some PoC patches, I will create a dedicated thread.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: Display individual query in pg_stat_activity

On 8/17/20 7:49 AM, Drouvot, Bertrand wrote:

Hi,

On 8/6/20 12:24 PM, Magnus Hagander wrote:

On Thu, Aug 6, 2020 at 12:17 PM Drouvot, Bertrand
<bdrouvot@amazon.com <mailto:bdrouvot@amazon.com>> wrote:

Hi,

On 7/27/20 4:57 PM, Dave Page wrote:

Hi

On Mon, Jul 27, 2020 at 3:40 PM Drouvot, Bertrand
<bdrouvot@amazon.com <mailto:bdrouvot@amazon.com>> wrote:

Hi hackers,

I've attached a patch to display individual query in the
pg_stat_activity query field when multiple SQL statements
are currently displayed.

_Motivation:_

When multiple statements are displayed then we don’t know
which one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current
statement, or a value to indicate what the current statement
number in the query is?

Thanks for he feedback.

I like the idea of adding extra information without changing the
current behavior.

A value to indicate what the current statement number is, would
need parsing the query field by the user to get the individual
statement.

I think the separate field makes sense (though it come with an
extra memory price) as it will not change the existing behavior
and would just provide extra information (without any extra
parsing needed for the user).

Idle though without having considered it too much -- you might reduce
the memory overhead by just storing a start/end offset into the
combined query string instead of a copy of the query.

Good point, thanks for the feedback.

The new attached patch is making use of stmt_len and stmt_location
(instead of a copy of the query).

That way the cost would only be paid when doing the reading of
pg_stat_activity (by extracting the piece of the string), which I'd
argue is done orders of magnitude fewer times than the query changes
at least on busy systems.

The individual query extraction (making use of stmt_len and
stmt_location) has been moved to pg_stat_get_activity() in the new
attached patch (as opposed to pgstat_report_activity() in the previous
patch version).

Care would have to be taken for the case of the current executing
query actually being entirely past the end of the query string buffer
of course, but I don't think that's too hard to define a useful
behaviour for. (The user interface would stay the same, showing the
actual string and thus not requiring the user to do any parsing)

As a proposal the new attached patch does not display the individual
query if length + location is greater than
pgstat_track_activity_query_size (anyway it could not, as the query
field that might contain multiple statements is already <=
pgstat_track_activity_query_size in pg_stat_get_activity()).

Bertrand

Attaching a new version as the previous one was not passing the Patch
Tester anymore.

Bertrand

Show quoted text

--
 Magnus Hagander
 Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
 Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

Attachments:

v2-0003-pg_stat_activity_individual_query.patchtext/plain; charset=UTF-8; name=v2-0003-pg_stat_activity_individual_query.patch; x-mac-creator=0; x-mac-type=0Download+173-97
#15Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#14)
Re: Display individual query in pg_stat_activity

On Thu, Sep 10, 2020 at 04:06:17PM +0200, Drouvot, Bertrand wrote:

Attaching a new version as the previous one was not passing the Patch Tester
anymore.

Ditto, the CF bot is complaining again. Could you send a rebase?
--
Michael

#16Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#15)
Re: Display individual query in pg_stat_activity

On 9/24/20 5:29 AM, Michael Paquier wrote:

On Thu, Sep 10, 2020 at 04:06:17PM +0200, Drouvot, Bertrand wrote:

Attaching a new version as the previous one was not passing the Patch Tester
anymore.

Ditto, the CF bot is complaining again. Could you send a rebase?

Thanks for letting me know.

Attached a new version.

Bertrand

Attachments:

v2-0004-pg_stat_activity_individual_query.patchtext/plain; charset=UTF-8; name=v2-0004-pg_stat_activity_individual_query.patch; x-mac-creator=0; x-mac-type=0Download+171-95
#17Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Bertrand Drouvot (#16)
Re: Display individual query in pg_stat_activity

Hi,

I noticed that this patch is failing on the cfbot.
For this, I changed the status to: 'Waiting on Author'

Cheers,
Georgios

The new status of this patch is: Waiting on Author

#18Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Georgios Kokolatos (#17)
Re: Display individual query in pg_stat_activity

This patch fails in the cfbot for quite some time now.
I shall close it as Return With Feedback and not move it to the next CF.
Please feel free to register an updated version afresh for the next CF.

Cheers,
//Georgios