[PATCH] Tracking statements entry timestamp in pg_stat_statements
This is a proposal for a new feature in pg_stat_statements extension.
As a statistical extension providing counters pg_stat_statements
extension is a target for various sampling solutions. All of them
interested in calculation of statement statistics increments between
two samples. But we face a problem here - observing one statement with
its statistics right now we can't be sure that statistics increment for
this statement is continuous from previous sample. This statement could
be deallocated after previous sample and come back soon. Also it could
happen that statement executions after that increased statistics to
above the values we observed in previous sample making it impossible to
detect deallocation on statement level.
My proposition here is to store statement entry timestamp. In this case
any sampling solution in case of returning statement will easily detect
it by changed timestamp value. And for every statement we will know
exact time interval for its statistics.
Attachments:
pg_stat_statements_ts_patch_2021_0322.patchtext/x-patch; charset=UTF-8; name=pg_stat_statements_ts_patch_2021_0322.patchDownload+123-3
Dear Andrei,
I think the idea is good because the pg_stat_statements_info view cannot distinguish
whether the specific statement is deallocated or not.
But multiple calling of GetCurrentTimestamp() may cause some performance issues.
How about adding a configuration parameter for controlling this feature?
Or we don't not have to worry about that?
+ if (api_version >= PGSS_V1_9) + { + values[i++] = TimestampTzGetDatum(first_seen); + }
I think {} is not needed here.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Hi Kuroda,
Thank you for your attention!
On Tue, 2021-03-23 at 02:13 +0000, kuroda.hayato@fujitsu.com wrote:
But multiple calling of GetCurrentTimestamp() may cause some
performance issues.
How about adding a configuration parameter for controlling this
feature?
Or we don't not have to worry about that?
Certaily I was thinking about this. And I've taken an advice of Teodor
Sigaev - a much more expirienced developer than me. It seems that
GetCurrentTimestamp() is fast enough for our purpose and we won't call
it too often - only on new statement entry allocation.
By the way right now in my workload tracing tool pg_profile I have to
reset pg_stat_statements on every sample (wich is about 30-60 minutes)
to make sure that all workload between samples is captured. This causes
much more overhead. Introduced first_seen column can eliminate the need
of resets.
However, there is another way - we can store the curent value
of pg_stat_statements_info.dealloc field when allocating a new
statement entry instead of timstamping it. Probably, it would be little
faster, but timestamp seems much more valuable here.
+ if (api_version >= PGSS_V1_9) + { + values[i++] = TimestampTzGetDatum(first_seen); + }I think {} is not needed here.
Of course, thank you!
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Mar 23, 2021 at 09:50:16AM +0300, Andrei Zubkov wrote:
By the way right now in my workload tracing tool pg_profile I have to
reset pg_stat_statements on every sample (wich is about 30-60 minutes)
to make sure that all workload between samples is captured. This causes
much more overhead. Introduced first_seen column can eliminate the need
of resets.
Note that you could also detect entries for which some counters decreased (e.g.
the execution count), and in that case only use the current values. It should
give the exact same results as what you will get with the first_seen column,
except of course if some entry is almost never used and is suddenly used a lot
after an explicit reset or an eviction and only until you perform your
snapshot. I'm not sure that it's a very likely scenario though.
FTR that's how powa currently deals with reset/eviction.
Hi Julien,
On Tue, 2021-03-23 at 15:03 +0800, Julien Rouhaud wrote:
Note that you could also detect entries for which some counters
decreased (e.g.
the execution count), and in that case only use the current values.
Yes, but checking condition for several counters seems complicated
compared to check only one field.
It should
give the exact same results as what you will get with the first_seen
column,
except of course if some entry is almost never used and is suddenly
used a lot
after an explicit reset or an eviction and only until you perform
your
snapshot. I'm not sure that it's a very likely scenario though.
But it is possible, and we are guessing here. Storing a timestamp does
not seems too expensive to me, but it totally eliminates guessing, and
provides a clear view about the time interval we watching for this
specific statement.
FTR that's how powa currently deals with reset/eviction.
PoWA sampling is much more frequent than pg_profile. For PoWA it is, of
cource, very unlikely scenario, but still possible.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Dear Andrei,
Certaily I was thinking about this. And I've taken an advice of Teodor
Sigaev - a much more expirienced developer than me. It seems that
GetCurrentTimestamp() is fast enough for our purpose and we won't call
it too often - only on new statement entry allocation.
OK.
However, there is another way - we can store the curent value
of pg_stat_statements_info.dealloc field when allocating a new
statement entry instead of timstamping it. Probably, it would be little
faster, but timestamp seems much more valuable here.
I don't like the idea because such a column has no meaning for the specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Kuroda,
I don't like the idea because such a column has no meaning for the
specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.
I agree. New version attached.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-pg_stat_statements-Track-statement-entry-timestamp.patchtext/x-patch; charset=UTF-8; name=0001-pg_stat_statements-Track-statement-entry-timestamp.patchDownload+121-4
2021-03-23 23:08 に Andrei Zubkov さんは書きました:
Dear Kuroda,
I don't like the idea because such a column has no meaning for the
specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.I agree. New version attached.
Thanks for posting the patch.
I agree with this content.
Is it necessary to update the version of pg_stat_statements now that the
release is targeted for PG15?
We take into account the risk that users will misunderstand.
Regards,
--
Yuki Seino
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, 2021-04-07 at 17:26 +0900, Seino Yuki wrote:
Is it necessary to update the version of pg_stat_statements now that
the
release is targeted for PG15?
I think, yes, version of pg_stat_statements is need to be updated. Is
it will be 1.10 in PG15?
Regards,
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Dear Andrei,
I think, yes, version of pg_stat_statements is need to be updated. Is
it will be 1.10 in PG15?
I think you are right.
According to [1]/messages/by-id/20201202040516.GA43757@nol we can bump up the version per one PG major version,
and any features are not committed yet for 15.
[1]: /messages/by-id/20201202040516.GA43757@nol
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Hi, Kuroda!
I've intended to change the pg_stat_statements version with rebasing
this patch to the current master branch state. Now this is commit
07e5e66.
But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.
Am I mistaken somewhere? Maybe you know why this is happening?
Thank you!
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Show quoted text
On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote:
Dear Andrei,
I think, yes, version of pg_stat_statements is need to be updated.
Is
it will be 1.10 in PG15?I think you are right.
According to [1] we can bump up the version per one PG major version,
and any features are not committed yet for 15.[1]: /messages/by-id/20201202040516.GA43757@nol
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Le mer. 14 avr. 2021 à 17:22, Andrei Zubkov <zubkov@moonset.ru> a écrit :
But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.Am I mistaken somewhere? Maybe you know why this is happening?
did you enable compute_query_id new parameter?
Show quoted text
On Wed, 2021-04-14 at 17:32 +0800, Julien Rouhaud wrote:
did you enable compute_query_id new parameter?
Hi, Julien!
Thank you very much! I've missed it.
Show quoted text
Hello, Kuroda!
On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote:
I think you are right.
According to [1] we can bump up the version per one PG major version,
and any features are not committed yet for 15.
Version 2 of patch attached.
pg_stat_statements version is now 1.10 and patch is based on 0f61727.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v2-0001-pg_stat_statements-Track-statement-entry-timestamp.patchtext/x-patch; charset=UTF-8; name=v2-0001-pg_stat_statements-Track-statement-entry-timestamp.patchDownload+137-6
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
Hi, Andrei
I tested your patch, and it works well. I also prefer timestamp inseatead of dealloc num.
I think it can provide more useful details about query statements.
Regards,
Martin Sun
Hi, Martin
On Mon, 2021-04-19 at 11:39 +0000, Chengxi Sun wrote:
I tested your patch, and it works well. I also prefer timestamp
inseatead of dealloc num.
I think it can provide more useful details about query statements.
Thank you for your review.
Certainly, timestamp is valuable here. Deallocation number is only a
workaround in unlikely case when timestamping will cost a much. It
seems, that it can happen only when significant amount of statements
causes a new entry in pg_stat_statements hashtable. However, in such
case using of pg_stat_statements extension might be qute difficult.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi, Andrey!
I’ve tried to apply your patch v2-0001 on current master, but i failed.
There were git apply errors at:
pg_stat_statements.out:941
pg_stat_statements.sql:385
Best Regards,
Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi, Anton!
I've corrected the patch and attached a new version.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Show quoted text
On Wed, 2021-10-06 at 18:13 +0300, Мельников Антон Андреевич wrote:
Hi, Andrey!
I’ve tried to apply your patch v2-0001 on current master, but i
failed.
There were git apply errors at:
pg_stat_statements.out:941
pg_stat_statements.sql:385
Best Regards,Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v3-0001-pg_stat_statements-Track-statement-entry-timestamp.patchtext/x-patch; charset=UTF-8; name=v3-0001-pg_stat_statements-Track-statement-entry-timestamp.patchDownload+137-6
There is an issue with this patch. It's main purpose is the ability to
calculate values of pg_stat_statements view for a time period between
two samples without resetting pg_stat_statements being absolutely sure
that the statement was not evicted.
Such approach solves the problem for metrics with except of min and max
time values. It seems that partial reset is needed here resetting
min/max values during a sample. But overall min/max values will be lost
in this case. Does addition of resettable min/max metrics to the
pg_stat_statemets view seems reasonable here?
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 07.10.2021 15:31, Andrei Zubkov wrote:
There is an issue with this patch. It's main purpose is the ability to
calculate values of pg_stat_statements view
[...]
Does addition of resettable min/max metrics to the
pg_stat_statemets view seems reasonable here?
Hello, Andrey!
I think it depends on what the slow top level sampler wants.
Let define the current values in pg_stat_statements for some query as gmin and gmax.
It seems to be a two main variants:
1) If top level sampler wants to know for some query the min and max values for
the entire watch time, then existing gmin and gmax in pg_stat_statements are sufficient.
The top level sampler can clarify top_min and top_max at every slow sample as follows:
top_max = gmax > top_max ? gmax : top_max;
top_min = gmin < top_min ? gmin : top_min;
This should work regardless of whether there was a reset between samples or not.
2) The second case happens when the top level sampler wants to know the min and max
values for sampling period.
In that case i think one shouldn't not use gmin and gmax and especially reset
them asynchronously from outside because its lifetime and sampling period are
independent values and moreover someone else might need gmin and gmax as is.
So i suppose that additional vars loc_min and loc_max is a right way to do it.
If that, at every top sample one need to replace not clarify
the new top values as follows:
top_max = loc_max; loc_max = 0;
top_min = loc_min; loc_min = INT_MAX;
And one more thing, if there was a reset of stats between two samples,
then i think it is the best to ignore the new values,
since they were obtained for an incomplete period.
This patch, thanks to the saved time stamp, makes possible
to determine the presence of reset between samples and
there should not be a problem to realize such algorithm.
The patch is now applied normally, all my tests were successful.
The only thing I could suggest to your notice
is a small cosmetic edit to replace
the numeric value in #define on line 1429 of pg_stat_statements.c
by one of the constants defined above.
Best regards,
Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company