idea: log_statement_sample_rate - bottom limit for sampling
Hi
I like the idea of sampling slow statements via log_statement_sample_rate.
But I miss some parameter that can ensure so every query executed over this
limit is logged.
Can we introduce new option
log_statement_sampling_limit
The query with execution time over this limit is logged every time.
What do you think about this?
Regards
Pavel
Le 06/06/2019 à 10:38, Pavel Stehule a écrit :
Hi
I like the idea of sampling slow statements via
log_statement_sample_rate. But I miss some parameter that can ensure
so every query executed over this limit is logged.Can we introduce new option
log_statement_sampling_limit
The query with execution time over this limit is logged every time.
What do you think about this?
Regards
Pavel
+1, log_min_duration_statement is modulated by log_statement_sample_rate
that mean that there is no more way to log all statements over a certain
duration limit. log_statement_sampling_limit might probably always be
upper than log_min_duration_statement.
--
Gilles Darold
http://www.darold.net/
Hi
čt 6. 6. 2019 v 10:48 odesílatel Gilles Darold <gilles@darold.net> napsal:
Le 06/06/2019 à 10:38, Pavel Stehule a écrit :
Hi
I like the idea of sampling slow statements via
log_statement_sample_rate. But I miss some parameter that can ensure
so every query executed over this limit is logged.Can we introduce new option
log_statement_sampling_limit
The query with execution time over this limit is logged every time.
What do you think about this?
Regards
Pavel
+1, log_min_duration_statement is modulated by log_statement_sample_rate
that mean that there is no more way to log all statements over a certain
duration limit. log_statement_sampling_limit might probably always be
upper than log_min_duration_statement.
Here is a patch
Regards
Pavel
Show quoted text
--
Gilles Darold
http://www.darold.net/
Attachments:
log_statement_sample_limit.patchtext/x-patch; charset=US-ASCII; name=log_statement_sample_limit.patchDownload+44-2
Hi
po 17. 6. 2019 v 22:40 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:
Hi
čt 6. 6. 2019 v 10:48 odesílatel Gilles Darold <gilles@darold.net> napsal:
Le 06/06/2019 à 10:38, Pavel Stehule a écrit :
Hi
I like the idea of sampling slow statements via
log_statement_sample_rate. But I miss some parameter that can ensure
so every query executed over this limit is logged.Can we introduce new option
log_statement_sampling_limit
The query with execution time over this limit is logged every time.
What do you think about this?
Regards
Pavel
+1, log_min_duration_statement is modulated by log_statement_sample_rate
that mean that there is no more way to log all statements over a certain
duration limit. log_statement_sampling_limit might probably always be
upper than log_min_duration_statement.Here is a patch
I did error in logic - fixed
Show quoted text
Regards
Pavel
--
Gilles Darold
http://www.darold.net/
Attachments:
log_statement_sample_limit-2.patchtext/x-patch; charset=US-ASCII; name=log_statement_sample_limit-2.patchDownload+43-2
Hi,
I tried the patch, here my comment:
gettext_noop("Zero effective disables sampling. "
"-1 use sampling every time (without limit)."),
I do not agree with the zero case. In fact, sampling is disabled as soon as
setting is less than log_min_duration_statements. Furthermore, I think we should
provide a more straightforward description for users.
I changed few comments and documentation:
* As we added much more logic in this function with statement and transaction
sampling. And now with statement_sample_rate, it is not easy to understand the
logic on first look. I reword comment in check_log_duration, I hope it is more
straightforward.
* I am not sure if "every_time" is a good naming for the variable. In fact, if
duration exceeds limit we disable sampling. Maybe sampling_disabled is more clear?
* I propose to add some words in log_min_duration_statement and
log_statement_sample_rate documentation.
* Rephrased log_statement_sample_limit documentation, I hope it help
understanding.
Patch attached.
Regards,
--
Adrien
Attachments:
log_statement_sample_limit-3.patchtext/x-patch; name=log_statement_sample_limit-3.patchDownload+58-6
út 18. 6. 2019 v 14:03 odesílatel Adrien Nayrat <adrien.nayrat@anayrat.info>
napsal:
Hi,
I tried the patch, here my comment:
gettext_noop("Zero effective disables sampling. "
"-1 use sampling every time (without limit)."),I do not agree with the zero case. In fact, sampling is disabled as soon as
setting is less than log_min_duration_statements. Furthermore, I think we
should
provide a more straightforward description for users.
You have true, but I have not a idea,how to describe it in one line. In
this case the zero is corner case, and sampling is disabled without
dependency on log_min_duration_statement.
I think so this design has only few useful values and ranges
a) higher than log_min_duration_statement .. sampling is active with limit
b) 0 .. for this case - other way how to effective disable sampling - no
dependency on other
c) -1 or negative value - sampling is allowed every time.
Sure, there is range (0..log_min_duration_statement), but for this range
this value has not sense. I think so this case cannot be mentioned in short
description. But it should be mentioned in documentation.
I changed few comments and documentation:
* As we added much more logic in this function with statement and
transaction
sampling. And now with statement_sample_rate, it is not easy to understand
the
logic on first look. I reword comment in check_log_duration, I hope it is
more
straightforward.* I am not sure if "every_time" is a good naming for the variable. In
fact, if
duration exceeds limit we disable sampling. Maybe sampling_disabled is
more clear?
For me important is following line
(exceeded && (in_sample || every_time))
I think so "every_time" or "always" or "every" is in this context more
illustrative than "sampling_disabled". But my opinion is not strong in this
case, and I have not a problem accept common opinion.
Show quoted text
* I propose to add some words in log_min_duration_statement and
log_statement_sample_rate documentation.* Rephrased log_statement_sample_limit documentation, I hope it help
understanding.Patch attached.
Regards,
--
Adrien
On 6/18/19 8:29 PM, Pavel Stehule wrote:
út 18. 6. 2019 v 14:03 odesílatel Adrien Nayrat <adrien.nayrat@anayrat.info
<mailto:adrien.nayrat@anayrat.info>> napsal:Hi,
I tried the patch, here my comment:
gettext_noop("Zero effective disables sampling. "
"-1 use sampling every time (without limit)."),I do not agree with the zero case. In fact, sampling is disabled as soon as
setting is less than log_min_duration_statements. Furthermore, I think we should
provide a more straightforward description for users.You have true, but I have not a idea,how to describe it in one line. In this
case the zero is corner case, and sampling is disabled without dependency on
log_min_duration_statement.
I think so this design has only few useful values and rangesa) higher than log_min_duration_statement .. sampling is active with limit
b) 0 .. for this case - other way how to effective disable sampling - no
dependency on other
c) -1 or negative value - sampling is allowed every time.Sure, there is range (0..log_min_duration_statement), but for this range this
value has not sense. I think so this case cannot be mentioned in short
description. But it should be mentioned in documentation.
Yes, it took me a while to understand :) I am ok to keep simple in GUC
description and give more information in documentation.
I changed few comments and documentation:
* As we added much more logic in this function with statement and transaction
sampling. And now with statement_sample_rate, it is not easy to understand the
logic on first look. I reword comment in check_log_duration, I hope it is more
straightforward.* I am not sure if "every_time" is a good naming for the variable. In fact, if
duration exceeds limit we disable sampling. Maybe sampling_disabled is more
clear?For me important is following line
(exceeded && (in_sample || every_time))
I think so "every_time" or "always" or "every" is in this context more
illustrative than "sampling_disabled". But my opinion is not strong in this
case, and I have not a problem accept common opinion.
Oh, yes, that's correct. I do not have a strong opinion too. Maybe someone else
will have better idea.
--
Adrien
st 19. 6. 2019 v 10:49 odesílatel Adrien Nayrat <adrien.nayrat@anayrat.info>
napsal:
On 6/18/19 8:29 PM, Pavel Stehule wrote:
út 18. 6. 2019 v 14:03 odesílatel Adrien Nayrat <
adrien.nayrat@anayrat.info
<mailto:adrien.nayrat@anayrat.info>> napsal:
Hi,
I tried the patch, here my comment:
gettext_noop("Zero effective disables sampling. "
"-1 use sampling every time (withoutlimit)."),
I do not agree with the zero case. In fact, sampling is disabled as
soon as
setting is less than log_min_duration_statements. Furthermore, I
think we should
provide a more straightforward description for users.
You have true, but I have not a idea,how to describe it in one line. In
this
case the zero is corner case, and sampling is disabled without
dependency on
log_min_duration_statement.
I think so this design has only few useful values and ranges
a) higher than log_min_duration_statement .. sampling is active with
limit
b) 0 .. for this case - other way how to effective disable sampling - no
dependency on other
c) -1 or negative value - sampling is allowed every time.Sure, there is range (0..log_min_duration_statement), but for this range
this
value has not sense. I think so this case cannot be mentioned in short
description. But it should be mentioned in documentation.Yes, it took me a while to understand :) I am ok to keep simple in GUC
description and give more information in documentation.
Maybe some like. "The zero block sampling. Negative value forces sampling
without limit"
I changed few comments and documentation:
* As we added much more logic in this function with statement and
transaction
sampling. And now with statement_sample_rate, it is not easy to
understand the
logic on first look. I reword comment in check_log_duration, I hope
it is more
straightforward.
* I am not sure if "every_time" is a good naming for the variable.
In fact, if
duration exceeds limit we disable sampling. Maybe sampling_disabled
is more
clear?
For me important is following line
(exceeded && (in_sample || every_time))
I think so "every_time" or "always" or "every" is in this context more
illustrative than "sampling_disabled". But my opinion is not strong inthis
case, and I have not a problem accept common opinion.
Oh, yes, that's correct. I do not have a strong opinion too. Maybe someone
else
will have better idea.
the naming in this case is not hard issue, and comitter can decide.
Regards
Pavel
Show quoted text
--
Adrien
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed
I test the latest patch attached to this thread (log_statement_sample_limit-3.patch). Everything looks good to me.
The new status of this patch is: Ready for Committer
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed
Sorry, I forgot to tick "passed" boxes.
Hi
pá 12. 7. 2019 v 13:07 odesílatel Adrien Nayrat <adrien.nayrat@gmail.com>
napsal:
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passedSorry, I forgot to tick "passed" boxes.
Thank you
Pavel
Hi,
I've started reviewing this patch, thinking that maybe I could get it
committed as it's marked as RFC. In general I agree with having this
fuature, but I think we need to rethink the GUC because the current
approach is just confusing.
The way the current patch works is that we have three GUCs:
log_min_duration_statement
log_statement_sample_limit
log_statement_sample_rate
and it essentially works like this:
- If the duration exceeds log_min_duration_statement, we start sampling
the commands with log_statement_sample rate.
- If the duration exceeds log_statement_sample_limit, we just log the
command every time (i.e. we disable sampling, using sample rate 1.0).
IMO that's bound to be confusing for users, because one threshold
behaves as minimum while the other behaves as maximum.
What I think we should do instead is to use two minimum thresholds.
1) log_min_duration_sample - enables sampling of commands, using the
existing GUC log_statement_sample_rate
2) log_min_duration_statement - logs all commands exceeding this
I think this is going to be much easier for users to understand.
The one difference between those approaches is in how they work with
existing current settings. That is, let's say you have
log_min_duration_statement = 1000
log_statement_sample_rate = 0.01
then no queries below 1000ms will be logged, and 1% of longer queries
will be sampled. And with the original config (as proposed in v3 of the
patch), this would still work the same way.
With the new approach (two min thresholds) it'd behave differently,
because we'd log *all* queries longer than 1000ms (not just 1%). And
whether we'd sample any queries (using log_statement_sample_rate) would
depend on how we'd pick the default value for the other threshold.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
I've started reviewing this patch, thinking that maybe I could get it
committed as it's marked as RFC. In general I agree with having this
fuature, but I think we need to rethink the GUC because the current
approach is just confusing.
...
What I think we should do instead is to use two minimum thresholds.
1) log_min_duration_sample - enables sampling of commands, using the
existing GUC log_statement_sample_rate
2) log_min_duration_statement - logs all commands exceeding this
I think this is going to be much easier for users to understand.
I agree with Tomas' idea.
The one difference between those approaches is in how they work with
existing current settings. That is, let's say you have
log_min_duration_statement = 1000
log_statement_sample_rate = 0.01
then no queries below 1000ms will be logged, and 1% of longer queries
will be sampled. And with the original config (as proposed in v3 of the
patch), this would still work the same way.
With the new approach (two min thresholds) it'd behave differently,
because we'd log *all* queries longer than 1000ms (not just 1%). And
whether we'd sample any queries (using log_statement_sample_rate) would
depend on how we'd pick the default value for the other threshold.
Well, we do not need to have a backwards-compatibility problem
here, because we have yet to release a version containing
log_statement_sample_rate. I do not think it's too late to decide
that v12's semantics for that are broken, and either revert that
patch in v12, or back-patch a fix to make it match this idea.
regards, tom lane
On Tue, Jul 30, 2019 at 03:43:58PM -0400, Tom Lane wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
I've started reviewing this patch, thinking that maybe I could get it
committed as it's marked as RFC. In general I agree with having this
fuature, but I think we need to rethink the GUC because the current
approach is just confusing.
...
What I think we should do instead is to use two minimum thresholds.
1) log_min_duration_sample - enables sampling of commands, using the
existing GUC log_statement_sample_rate
2) log_min_duration_statement - logs all commands exceeding this
I think this is going to be much easier for users to understand.I agree with Tomas' idea.
The one difference between those approaches is in how they work with
existing current settings. That is, let's say you have
log_min_duration_statement = 1000
log_statement_sample_rate = 0.01
then no queries below 1000ms will be logged, and 1% of longer queries
will be sampled. And with the original config (as proposed in v3 of the
patch), this would still work the same way.
With the new approach (two min thresholds) it'd behave differently,
because we'd log *all* queries longer than 1000ms (not just 1%). And
whether we'd sample any queries (using log_statement_sample_rate) would
depend on how we'd pick the default value for the other threshold.Well, we do not need to have a backwards-compatibility problem
here, because we have yet to release a version containing
log_statement_sample_rate. I do not think it's too late to decide
that v12's semantics for that are broken, and either revert that
patch in v12, or back-patch a fix to make it match this idea.
I'm willing to try fixing this to salvage the feature for v12. The
question is how would that fix look like - IMO we'd need to introduce
the new threshold GUC, essentially implementing what this thread is
about. It's not a complex patch, but it kinda flies in the face of
feature freeze. OTOH if we call it a fix ...
The patch itself is not that complicated - attached is a WIP version,
(particularly) the docs need more work.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
log-sample-rework.patchtext/plain; charset=us-asciiDownload+81-11
On Tue, Jul 30, 2019 at 03:43:58PM -0400, Tom Lane wrote:
Well, we do not need to have a backwards-compatibility problem
here, because we have yet to release a version containing
log_statement_sample_rate. I do not think it's too late to decide
that v12's semantics for that are broken, and either revert that
patch in v12, or back-patch a fix to make it match this idea.
With my RTM hat on, if we think that the current semantics of
log_statement_sample_rate are broken and need a redesign, then I would
take the safest path and just revert the original patch in v12, and
finally make sure that it brews correctly for v13. We are in beta2
and close to a beta3, so redesigning things at this stage on a stable
branch sounds wrong.
--
Michael
On 7/28/19 12:19 AM, Tomas Vondra wrote:
Hi,
I've started reviewing this patch, thinking that maybe I could get it
committed as it's marked as RFC. In general I agree with having this
fuature, but I think we need to rethink the GUC because the current
approach is just confusing.The way the current patch works is that we have three GUCs:
log_min_duration_statement
log_statement_sample_limit
log_statement_sample_rateand it essentially works like this:
- If the duration exceeds log_min_duration_statement, we start sampling
the commands with log_statement_sample rate.- If the duration exceeds log_statement_sample_limit, we just log the
command every time (i.e. we disable sampling, using sample rate 1.0).IMO that's bound to be confusing for users, because one threshold
behaves as minimum while the other behaves as maximum.
I agree, it took me a while to understand how it behave with the three GUC. That
why I tried to enrich documentation, but it may mean that the functionality is
not properly implemented.
What I think we should do instead is to use two minimum thresholds.
1) log_min_duration_sample - enables sampling of commands, using the
existing GUC log_statement_sample_rate2) log_min_duration_statement - logs all commands exceeding this
I think this is going to be much easier for users to understand.
+1, I like this idea.
I don't really have an opinion if we have to revert the whole feature or try to
fix it for v12. But it is true it is a late to fix it.
Furthermore, users who really want this feature in v12 can use an extension for
that purpose [1].
1: I made this extension with same kind of features :
https://github.com/anayrat/pg_sampletolog
Hi,
As we are at the end of this CF and there is still discussions about whether we
should revert log_statement_sample_limit and log_statement_sample_rate, or try
to fix it in v12.
I moved this patch to next commit fest and change status from "ready for
commiter" to "need review". I hope I didn't make a mistake.
Best regards,
On Thu, Aug 01, 2019 at 11:47:46AM +0200, Adrien Nayrat wrote:
Hi,
As we are at the end of this CF and there is still discussions about whether we
should revert log_statement_sample_limit and log_statement_sample_rate, or try
to fix it in v12.
I moved this patch to next commit fest and change status from "ready for
commiter" to "need review". I hope I didn't make a mistake.
Thanks. The RFC status was clearly stale, so thanks for updating. I should
have done that after my review. I think the patch would be moved to the
next CF at the end, but I might be wrong. In any case, I don't think
you've done any mistake.
As for the sampling patch - I think we'll end up reverting the feature for
v12 - it's far too late to rework it at this point. Sorry about that, I
know it's not a warm feeling when you get something done, and then it gets
reverted on the last minute. :-(
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8/1/19 12:04 PM, Tomas Vondra wrote:
On Thu, Aug 01, 2019 at 11:47:46AM +0200, Adrien Nayrat wrote:
Hi,
As we are at the end of this CF and there is still discussions about whether we
should revert log_statement_sample_limit and log_statement_sample_rate, or try
to fix it in v12.
I moved this patch to next commit fest and change status from "ready for
commiter" to "need review". I hope I didn't make a mistake.Thanks. The RFC status was clearly stale, so thanks for updating. I should
have done that after my review. I think the patch would be moved to the
next CF at the end, but I might be wrong. In any case, I don't think
you've done any mistake.As for the sampling patch - I think we'll end up reverting the feature for
v12 - it's far too late to rework it at this point. Sorry about that, I
know it's not a warm feeling when you get something done, and then it gets
reverted on the last minute. :-(
Don't worry, I understand. It is better to add straigforward GUC in v13 than
confusionning in v12 we will regret.
On Fri, Aug 02, 2019 at 09:53:40AM +0200, Adrien Nayrat wrote:
On 8/1/19 12:04 PM, Tomas Vondra wrote:
On Thu, Aug 01, 2019 at 11:47:46AM +0200, Adrien Nayrat wrote:
Hi,
As we are at the end of this CF and there is still discussions about whether we
should revert log_statement_sample_limit and log_statement_sample_rate, or try
to fix it in v12.
I moved this patch to next commit fest and change status from "ready for
commiter" to "need review". I hope I didn't make a mistake.Thanks. The RFC status was clearly stale, so thanks for updating. I should
have done that after my review. I think the patch would be moved to the
next CF at the end, but I might be wrong. In any case, I don't think
you've done any mistake.As for the sampling patch - I think we'll end up reverting the feature for
v12 - it's far too late to rework it at this point. Sorry about that, I
know it's not a warm feeling when you get something done, and then it gets
reverted on the last minute. :-(Don't worry, I understand. It is better to add straigforward GUC in v13 than
confusionning in v12 we will regret.
OK, I have the revert ready. The one thing I'm wondering about is
whether we need to revert log_transaction_sample_rate too? I think it's
pretty much independent feature, so I think we can keep that. Opinions?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services