[PATCH] Tracking statements entry timestamp in pg_stat_statements

Started by Andrei Zubkovabout 5 years ago93 messageshackers
Jump to latest
#1Andrei Zubkov
zubkov@moonset.ru

This is a proposal for a new feature in pg_stat_statements extension.

As a statistical extension providing counters pg_stat_statements
extension is a target for various sampling solutions. All of them
interested in calculation of statement statistics increments between
two samples. But we face a problem here - observing one statement with
its statistics right now we can't be sure that statistics increment for
this statement is continuous from previous sample. This statement could
be deallocated after previous sample and come back soon. Also it could
happen that statement executions after that increased statistics to
above the values we observed in previous sample making it impossible to
detect deallocation on statement level.
My proposition here is to store statement entry timestamp. In this case
any sampling solution in case of returning statement will easily detect
it by changed timestamp value. And for every statement we will know
exact time interval for its statistics.

Attachments:

pg_stat_statements_ts_patch_2021_0322.patchtext/x-patch; charset=UTF-8; name=pg_stat_statements_ts_patch_2021_0322.patchDownload+123-3
#2Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Andrei Zubkov (#1)
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Dear Andrei,

I think the idea is good because the pg_stat_statements_info view cannot distinguish
whether the specific statement is deallocated or not.
But multiple calling of GetCurrentTimestamp() may cause some performance issues.
How about adding a configuration parameter for controlling this feature?
Or we don't not have to worry about that?

+		if (api_version >= PGSS_V1_9)
+		{
+			values[i++] = TimestampTzGetDatum(first_seen);
+		}

I think {} is not needed here.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#3Andrei Zubkov
zubkov@moonset.ru
In reply to: Hayato Kuroda (Fujitsu) (#2)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hi Kuroda,

Thank you for your attention!

On Tue, 2021-03-23 at 02:13 +0000, kuroda.hayato@fujitsu.com wrote:

But multiple calling of GetCurrentTimestamp() may cause some
performance issues.
How about adding a configuration parameter for controlling this
feature?
Or we don't not have to worry about that?

Certaily I was thinking about this. And I've taken an advice of Teodor
Sigaev - a much more expirienced developer than me. It seems that
GetCurrentTimestamp() is fast enough for our purpose and we won't call
it too often - only on new statement entry allocation.

By the way right now in my workload tracing tool pg_profile I have to
reset pg_stat_statements on every sample (wich is about 30-60 minutes)
to make sure that all workload between samples is captured. This causes
much more overhead. Introduced first_seen column can eliminate the need
of resets.

However, there is another way - we can store the curent value
of pg_stat_statements_info.dealloc field when allocating a new
statement entry instead of timstamping it. Probably, it would be little
faster, but timestamp seems much more valuable here.

+		if (api_version >= PGSS_V1_9)
+		{
+			values[i++] = TimestampTzGetDatum(first_seen);
+		}

I think {} is not needed here.

Of course, thank you!
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#3)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

On Tue, Mar 23, 2021 at 09:50:16AM +0300, Andrei Zubkov wrote:

By the way right now in my workload tracing tool pg_profile I have to
reset pg_stat_statements on every sample (wich is about 30-60 minutes)
to make sure that all workload between samples is captured. This causes
much more overhead. Introduced first_seen column can eliminate the need
of resets.

Note that you could also detect entries for which some counters decreased (e.g.
the execution count), and in that case only use the current values. It should
give the exact same results as what you will get with the first_seen column,
except of course if some entry is almost never used and is suddenly used a lot
after an explicit reset or an eviction and only until you perform your
snapshot. I'm not sure that it's a very likely scenario though.

FTR that's how powa currently deals with reset/eviction.

#5Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#4)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hi Julien,

On Tue, 2021-03-23 at 15:03 +0800, Julien Rouhaud wrote:

Note that you could also detect entries for which some counters
decreased (e.g.
the execution count), and in that case only use the current values.

Yes, but checking condition for several counters seems complicated
compared to check only one field.

It should
give the exact same results as what you will get with the first_seen
column,
except of course if some entry is almost never used and is suddenly
used a lot
after an explicit reset or an eviction and only until you perform
your
snapshot. I'm not sure that it's a very likely scenario though.

But it is possible, and we are guessing here. Storing a timestamp does
not seems too expensive to me, but it totally eliminates guessing, and
provides a clear view about the time interval we watching for this
specific statement.

FTR that's how powa currently deals with reset/eviction.

PoWA sampling is much more frequent than pg_profile. For PoWA it is, of
cource, very unlikely scenario, but still possible.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#6Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Andrei Zubkov (#3)
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Dear Andrei,

Certaily I was thinking about this. And I've taken an advice of Teodor
Sigaev - a much more expirienced developer than me. It seems that
GetCurrentTimestamp() is fast enough for our purpose and we won't call
it too often - only on new statement entry allocation.

OK.

However, there is another way - we can store the curent value
of pg_stat_statements_info.dealloc field when allocating a new
statement entry instead of timstamping it. Probably, it would be little
faster, but timestamp seems much more valuable here.

I don't like the idea because such a column has no meaning for the specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#7Andrei Zubkov
zubkov@moonset.ru
In reply to: Hayato Kuroda (Fujitsu) (#6)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Dear Kuroda,

I don't like the idea because such a column has no meaning for the
specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.

I agree. New version attached.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-pg_stat_statements-Track-statement-entry-timestamp.patchtext/x-patch; charset=UTF-8; name=0001-pg_stat_statements-Track-statement-entry-timestamp.patchDownload+121-4
#8Seino Yuki
seinoyu@oss.nttdata.com
In reply to: Andrei Zubkov (#7)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-03-23 23:08 に Andrei Zubkov さんは書きました:

Dear Kuroda,

I don't like the idea because such a column has no meaning for the
specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.

I agree. New version attached.

Thanks for posting the patch.
I agree with this content.

Is it necessary to update the version of pg_stat_statements now that the
release is targeted for PG15?
We take into account the risk that users will misunderstand.

Regards,

--
Yuki Seino
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#9Andrei Zubkov
zubkov@moonset.ru
In reply to: Seino Yuki (#8)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

On Wed, 2021-04-07 at 17:26 +0900, Seino Yuki wrote:

Is it necessary to update the version of pg_stat_statements now that
the
release is targeted for PG15?

I think, yes, version of pg_stat_statements is need to be updated. Is
it will be 1.10 in PG15?

Regards,
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#10Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Andrei Zubkov (#9)
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Dear Andrei,

I think, yes, version of pg_stat_statements is need to be updated. Is
it will be 1.10 in PG15?

I think you are right.
According to [1]/messages/by-id/20201202040516.GA43757@nol we can bump up the version per one PG major version,
and any features are not committed yet for 15.

[1]: /messages/by-id/20201202040516.GA43757@nol

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#11Andrei Zubkov
zubkov@moonset.ru
In reply to: Hayato Kuroda (Fujitsu) (#10)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hi, Kuroda!

I've intended to change the pg_stat_statements version with rebasing
this patch to the current master branch state. Now this is commit
07e5e66.

But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.

Am I mistaken somewhere? Maybe you know why this is happening?

Thank you!
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Show quoted text

On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote:

Dear Andrei,

I think, yes, version of pg_stat_statements is need to be updated.
Is
it will be 1.10 in PG15?

I think you are right.
According to [1] we can bump up the version per one PG major version,
and any features are not committed yet for 15.

[1]: /messages/by-id/20201202040516.GA43757@nol

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#12Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#11)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Le mer. 14 avr. 2021 à 17:22, Andrei Zubkov <zubkov@moonset.ru> a écrit :

But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.

Am I mistaken somewhere? Maybe you know why this is happening?

did you enable compute_query_id new parameter?

Show quoted text
#13Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#12)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

On Wed, 2021-04-14 at 17:32 +0800, Julien Rouhaud wrote:

did you enable compute_query_id new parameter? 

Hi, Julien!
Thank you very much! I've missed it.

Show quoted text
#14Andrei Zubkov
zubkov@moonset.ru
In reply to: Hayato Kuroda (Fujitsu) (#10)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hello, Kuroda!

On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote:

I think you are right.
According to [1] we can bump up the version per one PG major version,
and any features are not committed yet for 15.

[1]: /messages/by-id/20201202040516.GA43757@nol

Version 2 of patch attached. 
pg_stat_statements version is now 1.10 and patch is based on 0f61727.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

v2-0001-pg_stat_statements-Track-statement-entry-timestamp.patchtext/x-patch; charset=UTF-8; name=v2-0001-pg_stat_statements-Track-statement-entry-timestamp.patchDownload+137-6
#15Chengxi Sun
sunchengxi@highgo.com
In reply to: Andrei Zubkov (#14)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi, Andrei

I tested your patch, and it works well. I also prefer timestamp inseatead of dealloc num.
I think it can provide more useful details about query statements.

Regards,
Martin Sun

#16Andrei Zubkov
zubkov@moonset.ru
In reply to: Chengxi Sun (#15)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hi, Martin

On Mon, 2021-04-19 at 11:39 +0000, Chengxi Sun wrote:

I tested your patch, and it works well. I also prefer timestamp
inseatead of dealloc num.
I think it can provide more useful details about query statements.

Thank you for your review.
Certainly, timestamp is valuable here. Deallocation number is only a
workaround in unlikely case when timestamping will cost a much. It
seems, that it can happen only when significant amount of statements
causes a new entry in pg_stat_statements hashtable. However, in such
case using of pg_stat_statements extension might be qute difficult.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#17Anton A. Melnikov
aamelnikov@inbox.ru
In reply to: Andrei Zubkov (#16)
Re[2]: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hi, Andrey!
 
I’ve tried to apply your patch v2-0001 on current master, but i failed.
There were git apply errors at:
pg_stat_statements.out:941
pg_stat_statements.sql:385
 
Best Regards,

Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

#18Andrei Zubkov
zubkov@moonset.ru
In reply to: Anton A. Melnikov (#17)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

Hi, Anton!

I've corrected the patch and attached a new version.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Show quoted text

On Wed, 2021-10-06 at 18:13 +0300, Мельников Антон Андреевич wrote:

Hi, Andrey!
 
I’ve tried to apply your patch v2-0001 on current master, but i
failed.
There were git apply errors at:
pg_stat_statements.out:941
pg_stat_statements.sql:385
 
Best Regards,

Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Attachments:

v3-0001-pg_stat_statements-Track-statement-entry-timestamp.patchtext/x-patch; charset=UTF-8; name=v3-0001-pg_stat_statements-Track-statement-entry-timestamp.patchDownload+137-6
#19Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#18)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

There is an issue with this patch. It's main purpose is the ability to
calculate values of pg_stat_statements view for a time period between
two samples without resetting pg_stat_statements being absolutely sure
that the statement was not evicted.
Such approach solves the problem for metrics with except of min and max
time values. It seems that partial reset is needed here resetting
min/max values during a sample. But overall min/max values will be lost
in this case. Does addition of resettable min/max metrics to the
pg_stat_statemets view seems reasonable here?

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#20Anton A. Melnikov
aamelnikov@inbox.ru
In reply to: Andrei Zubkov (#19)
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

On 07.10.2021 15:31, Andrei Zubkov wrote:

There is an issue with this patch. It's main purpose is the ability to
calculate values of pg_stat_statements view
  [...]
Does addition of resettable min/max metrics to the
pg_stat_statemets view seems reasonable here?

Hello, Andrey!

I think it depends on what the slow top level sampler wants.
Let define the current values in pg_stat_statements for some query as gmin and gmax.
It seems to be a two main variants:
1) If top level sampler wants to know for some query the min and max values for
the entire watch time, then existing gmin and gmax in pg_stat_statements are sufficient.
The top level sampler can clarify top_min and top_max at every slow sample as follows:
top_max = gmax > top_max ? gmax : top_max;
top_min = gmin < top_min ? gmin : top_min;
This should work regardless of whether there was a reset between samples or not.
2) The second case happens when the top level sampler wants to know the min and max
values for sampling period.
In that case i think one shouldn't not use gmin and gmax and especially reset
them asynchronously from outside because its lifetime and sampling period are
independent values and moreover someone else might need gmin and gmax as is.
So i suppose that additional vars loc_min and loc_max is a right way to do it.
If that, at every top sample one need to replace not clarify
the new top values as follows:
top_max = loc_max; loc_max = 0;
top_min = loc_min; loc_min = INT_MAX;
And one more thing, if there was a reset of stats between two samples,
then i think it is the best to ignore the new values,
since they were obtained for an incomplete period.
This patch, thanks to the saved time stamp, makes possible
to determine the presence of reset between samples and
there should not be a problem to realize such algorithm.

The patch is now applied normally, all my tests were successful.
The only thing I could suggest to your notice
is a small cosmetic edit to replace
the numeric value in #define on line 1429 of pg_stat_statements.c
by one of the constants defined above.

Best regards,
Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#21Andrei Zubkov
zubkov@moonset.ru
In reply to: Anton A. Melnikov (#20)
#22Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#21)
#23Anton A. Melnikov
aamelnikov@inbox.ru
In reply to: Andrei Zubkov (#22)
#24Andres Freund
andres@anarazel.de
In reply to: Andrei Zubkov (#21)
#25Andrei Zubkov
zubkov@moonset.ru
In reply to: Andres Freund (#24)
#26Anton A. Melnikov
aamelnikov@inbox.ru
In reply to: Andrei Zubkov (#25)
#27Julien Rouhaud
rjuju123@gmail.com
In reply to: Anton A. Melnikov (#26)
#28Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#27)
#29Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#28)
#30Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#29)
#31Anton A. Melnikov
aamelnikov@inbox.ru
In reply to: Andrei Zubkov (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Anton A. Melnikov (#31)
#33Andrei Zubkov
zubkov@moonset.ru
In reply to: Bruce Momjian (#32)
#34Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#33)
#35Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Andrei Zubkov (#35)
#37Julien Rouhaud
rjuju123@gmail.com
In reply to: Bruce Momjian (#36)
#38Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#37)
#39Andres Freund
andres@anarazel.de
In reply to: Andrei Zubkov (#38)
#40Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#35)
#41Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#38)
#42Julien Rouhaud
rjuju123@gmail.com
In reply to: Andres Freund (#39)
#43Andrei Zubkov
zubkov@moonset.ru
In reply to: Andres Freund (#39)
#44Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#43)
#45Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#44)
#46Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#45)
#47Bruce Momjian
bruce@momjian.us
In reply to: Andrei Zubkov (#46)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#47)
#49Andrei Zubkov
zubkov@moonset.ru
In reply to: Bruce Momjian (#47)
#50Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#49)
#51Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#50)
#52Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#50)
#53Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#52)
#54Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#53)
#55Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#54)
#56Andrei Zubkov
zubkov@moonset.ru
In reply to: Julien Rouhaud (#55)
#57Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrei Zubkov (#56)
#58Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#56)
#59Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andrei Zubkov (#58)
#60Andrei Zubkov
zubkov@moonset.ru
In reply to: Tomas Vondra (#59)
#61Andrei Zubkov
zubkov@moonset.ru
In reply to: Tomas Vondra (#59)
#62Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#61)
#63Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#62)
#64Gregory Stark (as CFM)
stark.cfm@gmail.com
In reply to: Andrei Zubkov (#63)
#65Andrei Zubkov
zubkov@moonset.ru
In reply to: Gregory Stark (as CFM) (#64)
#66Gregory Stark (as CFM)
stark.cfm@gmail.com
In reply to: Andrei Zubkov (#65)
#67Andrei Zubkov
zubkov@moonset.ru
In reply to: Gregory Stark (as CFM) (#66)
#68Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#67)
#69Michael Paquier
michael@paquier.xyz
In reply to: Andrei Zubkov (#68)
#70Andrei Zubkov
zubkov@moonset.ru
In reply to: Michael Paquier (#69)
#71Andrei Zubkov
zubkov@moonset.ru
In reply to: Michael Paquier (#69)
In reply to: Andrei Zubkov (#71)
#73Andrei Zubkov
zubkov@moonset.ru
In reply to: Sergei Kornilov (#72)
#74Daniel Gustafsson
daniel@yesql.se
In reply to: Andrei Zubkov (#73)
#75Andrei Zubkov
zubkov@moonset.ru
In reply to: Daniel Gustafsson (#74)
#76Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#75)
#77Peter Eisentraut
peter_e@gmx.net
In reply to: Andrei Zubkov (#76)
#78Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#77)
#79Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#78)
#80Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Zubkov (#75)
#81Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Lepikhov (#80)
#82Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Andrei Zubkov (#75)
#83Andrei Zubkov
zubkov@moonset.ru
In reply to: Alena Rybakina (#82)
#84Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Zubkov (#81)
#85Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Lepikhov (#84)
#86Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Andrei Zubkov (#83)
#87Andrei Zubkov
zubkov@moonset.ru
In reply to: Alena Rybakina (#86)
#88Andrei Zubkov
zubkov@moonset.ru
In reply to: Andrei Zubkov (#87)
#89Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Zubkov (#88)
#90Julien Rouhaud
rjuju123@gmail.com
In reply to: Alexander Korotkov (#89)
#91Andrei Zubkov
zubkov@moonset.ru
In reply to: Alexander Korotkov (#89)
#92Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Zubkov (#91)
#93Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Korotkov (#92)