timestamp for query in pg_stat_statements

Started by Zhihong Yualmost 4 years ago6 messages
#1Zhihong Yu
zyu@yugabyte.com

Hi,
Looking at pg_stat_statements, there doesn't seem to be timestamp column
for when the underlying query is performed.
Since the same query can be run multiple times, the absence of timestamp
column makes finding the most recent invocation of the query difficult.

Does it make sense to add such a column ?

Thanks

#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Zhihong Yu (#1)
Re: timestamp for query in pg_stat_statements

On Sat, Mar 05, 2022 at 06:10:44PM -0800, Zhihong Yu wrote:

Looking at pg_stat_statements, there doesn't seem to be timestamp column
for when the underlying query is performed.
Since the same query can be run multiple times, the absence of timestamp
column makes finding the most recent invocation of the query difficult.

Does it make sense to add such a column ?

I don't think it would be that helpful. Why do you need to only know when the
last execution was, but no other details among every other cumulated counters?

You should consider using some other tools on top of pg_stat_statements (and
possibly other extensions) that performs snapshot regularly and can show you
all the details at the given frequency.

#3Zhihong Yu
zyu@yugabyte.com
In reply to: Julien Rouhaud (#2)
Re: timestamp for query in pg_stat_statements

On Sat, Mar 5, 2022 at 8:17 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sat, Mar 05, 2022 at 06:10:44PM -0800, Zhihong Yu wrote:

Looking at pg_stat_statements, there doesn't seem to be timestamp column
for when the underlying query is performed.
Since the same query can be run multiple times, the absence of timestamp
column makes finding the most recent invocation of the query difficult.

Does it make sense to add such a column ?

I don't think it would be that helpful. Why do you need to only know when
the
last execution was, but no other details among every other cumulated
counters?

You should consider using some other tools on top of pg_stat_statements
(and
possibly other extensions) that performs snapshot regularly and can show
you
all the details at the given frequency.

Hi,
The current design of pg_stat_statements doesn't have the concept of
observation.

By observation I mean scenarios where pg_stat_statements is read by people
doing performance tuning.

Here is one example (same query, q, is concerned).
At t1, q is performed, leaving one row in pg_stat_statements with mean_time
of 10.
At t2, operator examines pg_stat_statements and provides some suggestion
for tuning q (which is carried out).
At t3, q is run again leaving the row with mean_time of 9.
Now with two rows for q, how do we know whether the row written at t3 is
prior to or after implementing the suggestion made at t2 ?

Using other tools, a lot of the information in pg_stat_statements would be
duplicated to distinguish the counters recorded w.r.t. tuning operation.

I think pg_stat_statements can do better in this regard.

Cheers

#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Zhihong Yu (#3)
Re: timestamp for query in pg_stat_statements

On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:

The current design of pg_stat_statements doesn't have the concept of
observation.

By observation I mean scenarios where pg_stat_statements is read by people
doing performance tuning.

Here is one example (same query, q, is concerned).
At t1, q is performed, leaving one row in pg_stat_statements with mean_time
of 10.
At t2, operator examines pg_stat_statements and provides some suggestion
for tuning q (which is carried out).
At t3, q is run again leaving the row with mean_time of 9.
Now with two rows for q, how do we know whether the row written at t3 is
prior to or after implementing the suggestion made at t2 ?

Well, if pg_stat_statements is read by people doing performance tuning
shouldn't they be able to distinguish which query text is the one they just
rewrote?

Using other tools, a lot of the information in pg_stat_statements would be
duplicated to distinguish the counters recorded w.r.t. tuning operation.

Yes, which is good. Your example was about rewriting a query, but what about
other possibilities like creating an index, changing hash_mem_multiplier...?
You won't get a new record and the mean_time will mostly be useless.

If you take regular snapshot, then you will be able to compute the mean_time
for each interval, and that will answer bot this scenario and the one in your
example (since the 2nd row won't exist in the earlier snapshots).

#5Zhihong Yu
zyu@yugabyte.com
In reply to: Julien Rouhaud (#4)
Re: timestamp for query in pg_stat_statements

On Sun, Mar 6, 2022 at 6:23 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:

The current design of pg_stat_statements doesn't have the concept of
observation.

By observation I mean scenarios where pg_stat_statements is read by

people

doing performance tuning.

Here is one example (same query, q, is concerned).
At t1, q is performed, leaving one row in pg_stat_statements with

mean_time

of 10.
At t2, operator examines pg_stat_statements and provides some suggestion
for tuning q (which is carried out).
At t3, q is run again leaving the row with mean_time of 9.
Now with two rows for q, how do we know whether the row written at t3 is
prior to or after implementing the suggestion made at t2 ?

Well, if pg_stat_statements is read by people doing performance tuning
shouldn't they be able to distinguish which query text is the one they just
rewrote?

Did I mention rewriting ?
As you said below, adding index is one way of tuning which doesn't involve
rewriting.

Please also note that the person tuning the query may be different from the
person writing the query.
So some information in pg_stat_statements (or related table) is needed to
disambiguate.

Show quoted text

Using other tools, a lot of the information in pg_stat_statements would

be

duplicated to distinguish the counters recorded w.r.t. tuning operation.

Yes, which is good. Your example was about rewriting a query, but what
about
other possibilities like creating an index, changing
hash_mem_multiplier...?
You won't get a new record and the mean_time will mostly be useless.

If you take regular snapshot, then you will be able to compute the
mean_time
for each interval, and that will answer bot this scenario and the one in
your
example (since the 2nd row won't exist in the earlier snapshots).

#6Julien Rouhaud
rjuju123@gmail.com
In reply to: Zhihong Yu (#5)
Re: timestamp for query in pg_stat_statements

On Sun, Mar 06, 2022 at 07:10:49PM -0800, Zhihong Yu wrote:

On Sun, Mar 6, 2022 at 6:23 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:

Here is one example (same query, q, is concerned).
At t1, q is performed, leaving one row in pg_stat_statements with

mean_time

of 10.
At t2, operator examines pg_stat_statements and provides some suggestion
for tuning q (which is carried out).
At t3, q is run again leaving the row with mean_time of 9.
Now with two rows for q, how do we know whether the row written at t3 is
prior to or after implementing the suggestion made at t2 ?

Well, if pg_stat_statements is read by people doing performance tuning
shouldn't they be able to distinguish which query text is the one they just
rewrote?

Did I mention rewriting ?

How else would you end up with two entries in pg_stat_statements?

As you said below, adding index is one way of tuning which doesn't involve
rewriting.

Yes, and in that case you have a single row for that query, and mean_time is
useless. You need to compute it yourself using snapshots of
pg_stat_statements if you want to know how that query performed since the
optimization.

So some information in pg_stat_statements (or related table) is needed to
disambiguate.

In my opinion that's not pg_stat_statements' job. Like all other similar
infrastructure in postgres it only provides cumulated counters. You would
have exactly the same issue with e.g. pg_stat_user_indexes or pg_stat_bgwriter.