auto_explain sample rate

Started by Craig Ringerover 10 years ago32 messages
#1Craig Ringer
craig@2ndquadrant.com

Hi all

It's sometimes desirable to collect auto_explain data with ANALYZE in order
to track down hard-to-reproduce issues, but the performance impacts can be
pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger
only on x percent of queries, and also add a sample test hook that can be
used to target statements of interest more narrowly (using a C hook
function).

Sound like a reasonable approach?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#1)
Re: auto_explain sample rate

Craig Ringer <craig@2ndquadrant.com> writes:

It's sometimes desirable to collect auto_explain data with ANALYZE in order
to track down hard-to-reproduce issues, but the performance impacts can be
pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger
only on x percent of queries,

That sounds reasonable ...

and also add a sample test hook that can be
used to target statements of interest more narrowly (using a C hook
function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that. Can't we invent something a bit more user-friendly
for the purpose? No idea what it should look like though.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: auto_explain sample rate

On 29 May 2015 at 11:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig@2ndquadrant.com> writes:

It's sometimes desirable to collect auto_explain data with ANALYZE in

order

to track down hard-to-reproduce issues, but the performance impacts can

be

pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger
only on x percent of queries,

That sounds reasonable ...

Cool, I'll cook that up then. Thanks for the sanity check.

and also add a sample test hook that can be
used to target statements of interest more narrowly (using a C hook
function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that. Can't we invent something a bit more user-friendly
for the purpose? No idea what it should look like though.

I've been that desperate.

For the majority of users I'm sure it's sufficient to just have a sample
rate.

Anything that's trying to match individual queries could be interested in
all sorts of different things. Queries that touch a particular table being
one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Craig Ringer (#3)
Re: auto_explain sample rate

2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 29 May 2015 at 11:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig@2ndquadrant.com> writes:

It's sometimes desirable to collect auto_explain data with ANALYZE in

order

to track down hard-to-reproduce issues, but the performance impacts can

be

pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger
only on x percent of queries,

That sounds reasonable ...

Cool, I'll cook that up then. Thanks for the sanity check.

and also add a sample test hook that can be
used to target statements of interest more narrowly (using a C hook
function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that. Can't we invent something a bit more user-friendly
for the purpose? No idea what it should look like though.

I've been that desperate.

For the majority of users I'm sure it's sufficient to just have a sample
rate.

Anything that's trying to match individual queries could be interested in
all sorts of different things. Queries that touch a particular table being
one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

same method can be interesting for interactive EXPLAIN ANALYZE too. TIMING
has about 20%-30% overhead and usually we don't need a perfectly exact
numbers

Regards

Pavel

Show quoted text

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Pavel Stehule (#4)
Re: auto_explain sample rate

On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a sample
rate.

Anything that's trying to match individual queries could be interested in
all sorts of different things. Queries that touch a particular table being
one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

same method can be interesting for interactive EXPLAIN ANALYZE too. TIMING
has about 20%-30% overhead and usually we don't need a perfectly exact
numbers

I don't understand what you are suggesting here.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Craig Ringer (#5)
Re: auto_explain sample rate

2015-06-03 9:17 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a sample
rate.

Anything that's trying to match individual queries could be interested
in all sorts of different things. Queries that touch a particular table
being one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

same method can be interesting for interactive EXPLAIN ANALYZE too.
TIMING has about 20%-30% overhead and usually we don't need a perfectly
exact numbers

I don't understand what you are suggesting here.

using some sampling for EXPLAIN ANALYZE statement

Pavel

Show quoted text

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Craig Ringer
craig@2ndquadrant.com
In reply to: Pavel Stehule (#6)
Re: auto_explain sample rate

On 3 June 2015 at 15:22, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-06-03 9:17 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a
sample rate.

Anything that's trying to match individual queries could be interested
in all sorts of different things. Queries that touch a particular table
being one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

same method can be interesting for interactive EXPLAIN ANALYZE too.
TIMING has about 20%-30% overhead and usually we don't need a perfectly
exact numbers

I don't understand what you are suggesting here.

using some sampling for EXPLAIN ANALYZE statement

Do you mean that you'd like to be able to set a fraction of queries on
which auto_explain does ANALYZE, so most of the time it just outputs an
ordinary EXPLAIN?

Or maybe we're talking about different things re the original proposal? I
don't see how this would work. If you run EXPLAIN ANALYZE interactively
like you said above. You'd surely want it to report costs and timings, or
whatever it is that you ask for, all the time. Not just some of the time
based on some background setting.

Are you advocating a profiling-based approach for EXPLAIN ANALYZE timing
where we sample which executor node we're under at regular intervals,
instead of timing everything? Or suggesting a way to filter out sub-trees
so you only get timing data on some sub-portion of a query?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Craig Ringer (#7)
Re: auto_explain sample rate

2015-06-03 9:46 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 3 June 2015 at 15:22, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-06-03 9:17 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a
sample rate.

Anything that's trying to match individual queries could be interested
in all sorts of different things. Queries that touch a particular table
being one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

same method can be interesting for interactive EXPLAIN ANALYZE too.
TIMING has about 20%-30% overhead and usually we don't need a perfectly
exact numbers

I don't understand what you are suggesting here.

using some sampling for EXPLAIN ANALYZE statement

Do you mean that you'd like to be able to set a fraction of queries on
which auto_explain does ANALYZE, so most of the time it just outputs an
ordinary EXPLAIN?

Or maybe we're talking about different things re the original proposal? I
don't see how this would work. If you run EXPLAIN ANALYZE interactively
like you said above. You'd surely want it to report costs and timings, or
whatever it is that you ask for, all the time. Not just some of the time
based on some background setting.

Are you advocating a profiling-based approach for EXPLAIN ANALYZE timing
where we sample which executor node we're under at regular intervals,
instead of timing everything? Or suggesting a way to filter out sub-trees
so you only get timing data on some sub-portion of a query?

lot of variants - I would to see cost and times for EXPLAIN ANALYZE every
time - but the precision of time can be reduced to 1ms. It is question if
we can significantly reduce the cost (or number of calls) of getting system
time.

Pavel

Show quoted text

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Craig Ringer
craig@2ndquadrant.com
In reply to: Pavel Stehule (#8)
Re: auto_explain sample rate

lot of variants - I would to see cost and times for EXPLAIN ANALYZE every

time - but the precision of time can be reduced to 1ms. It is question if
we can significantly reduce the cost (or number of calls) of getting system
time.

Pavel

OK, so you're suggesting a sampling-based EXPLAIN.

That'd be interesting, but is totally unrelated to this work on
auto_explain.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#3)
1 attachment(s)
Re: auto_explain sample rate

On 2 June 2015 at 15:07, Craig Ringer <craig@2ndquadrant.com> wrote:

On 29 May 2015 at 11:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig@2ndquadrant.com> writes:

It's sometimes desirable to collect auto_explain data with ANALYZE in

order

to track down hard-to-reproduce issues, but the performance impacts can

be

pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger
only on x percent of queries,

That sounds reasonable ...

Cool, I'll cook that up then. Thanks for the sanity check.

OK, here we go.

To make sure it doesn't trigger on all backends at once, and to ensure it
doesn't rely on a shared point of contention in shmem, this sets up a
counter with a random value on each backend start.

Because it needs to either always run both the Start and End hooks, or run
neither, this doesn't count nested statements for sampling purposes. So if
you run my_huge_plpgsql_function() then either all its statements will be
explained or none of them will. This only applies if nested statement
explain is enabled. It's possible to get around this by adding a separate
nested statement counter that's reset at each top level End hook, but it
doesn't seem worthwhile.

The sample rate has no effect on ANALYZE, which remains enabled or disabled
for all queries. I don't see any point adding a separate sample rate
control to ANALYZE only some sub-proportion of EXPLAINed statements.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Tra

Attachments:

0001-Allow-sampling-of-only-some-queries-by-auto_explain.patchtext/x-patch; charset=US-ASCII; name=0001-Allow-sampling-of-only-some-queries-by-auto_explain.patchDownload
From 1f395e11bad41a77881970c7179f65e33c523cdd Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Wed, 3 Jun 2015 17:48:02 +0800
Subject: [PATCH] Allow sampling of only some queries by auto_explain

The new parameter auto_explain.sample_ratio specifies the proportion
of queries that are to be explained.  The starting offset is randomized
so that somewhere between the 1st and n'th statement gets explained,
then every n'th statement. No statement classification is performed.

If nested statement auto_explain is enabled, then for any given
top level statement either all sub-statements or none will be explained,
i.e. sampling doesn't apply to nested statements.
---
 contrib/auto_explain/auto_explain.c | 56 ++++++++++++++++++++++++++++++++++++-
 doc/src/sgml/auto-explain.sgml      | 18 ++++++++++++
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 2a184ed..161e24e 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -29,6 +29,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
+static int  auto_explain_sample_ratio = 1;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -47,9 +48,14 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 
+/* per-backend counter used for ratio sampling */
+static int  auto_explain_sample_counter = 0;
+
 #define auto_explain_enabled() \
 	(auto_explain_log_min_duration >= 0 && \
-	 (nesting_level == 0 || auto_explain_log_nested_statements))
+	 (nesting_level == 0 || auto_explain_log_nested_statements)) && \
+	 (auto_explain_sample_ratio == 1 || auto_explain_sample_counter == 0)
+
 
 void		_PG_init(void);
 void		_PG_fini(void);
@@ -61,6 +67,19 @@ static void explain_ExecutorRun(QueryDesc *queryDesc,
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
+static void
+auto_explain_sample_ratio_assign_hook(int newval, void *extra)
+{
+	if (auto_explain_sample_ratio != newval)
+	{
+		/* Schedule a counter reset when the sample ratio changed */
+		auto_explain_sample_counter = -1;
+	}
+
+	auto_explain_sample_ratio = newval;
+}
+
+
 
 /*
  * Module load callback
@@ -159,6 +178,18 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomIntVariable("auto_explain.sample_ratio",
+		"Only explain one in approx. every sample_ratio queries, or 1 for all",
+							NULL,
+							&auto_explain_sample_ratio,
+							1,
+							1, INT_MAX - 1,
+							PGC_SUSET,
+							0,
+							NULL,
+							auto_explain_sample_ratio_assign_hook,
+							NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -191,6 +222,29 @@ _PG_fini(void)
 static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * For ratio sampling, only increment the counter for top-level
+	 * statements. Either all nested statements will be explained
+	 * or none will, because we need to know at ExecutorEnd hook time
+	 * whether or not we explained any given statement.
+	 */
+	if (nesting_level == 0 && auto_explain_sample_ratio > 1)
+	{
+		if (auto_explain_sample_counter == -1)
+		{
+			/*
+			 * First time the hook ran in this backend, seed the counter. This
+			 * might be zero and cause the first statement to be explained.
+			 */
+			auto_explain_sample_counter = pg_lrand48() % auto_explain_sample_ratio;
+		}
+		else if ((++auto_explain_sample_counter) == auto_explain_sample_ratio)
+		{
+			/* Wrap the counter and explain this statement */
+			auto_explain_sample_counter = 0;
+		}
+	}
+
 	if (auto_explain_enabled())
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index d527208..fbb1220 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -203,6 +203,24 @@ LOAD 'auto_explain';
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term>
+     <varname>auto_explain.sample_ratio</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>auto_explain.sample_ratio</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      <varname>auto_explain.sample_ratio</varname> causes auto_explain to only
+      explain every n'th statement in each session.  The number of statements
+      before the first is explained is between 1 and n, randomized to ensure
+      that short sessions still sometimes get explained. Only superusers can
+      change this setting.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
-- 
2.1.0

#11Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#10)
Re: auto_explain sample rate

On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:

OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#11)
Re: auto_explain sample rate

On 3 June 2015 at 20:04, Andres Freund <andres@anarazel.de> wrote:

On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:

OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Could be, yeah. I was thinking about the cost of generating a random each
time, but it's going to vanish in the noise compared to the rest of the
costs in query execution.

---
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Craig Ringer (#12)
Re: auto_explain sample rate

On 03/06/2015 15:00, Craig Ringer wrote:

On 3 June 2015 at 20:04, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:

OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Could be, yeah. I was thinking about the cost of generating a random
each time, but it's going to vanish in the noise compared to the rest of
the costs in query execution.

Hello, I've just reviewed the patch.

I'm not sure if there's a consensus on the sample rate format. FWIW, I
also think a fraction would be easier to understand. Any news about
generating a random at each call to avoid the query pattern problem ?

The patch applies without error. I wonder if there's any reason for
using pg_lrand48() instead of random(), as there's a port for random()
if the system lacks it.

After some quick checks, I found that auto_explain_sample_counter is
always initialized with the same value. After some digging, it seems
that pg_lrand48() always returns the same values in the same order, at
least on my computer. Have I missed something?

Otherwise, after replacing the pg_lrand48() call with a random(), it
works just fine.

---
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Julien Rouhaud (#13)
1 attachment(s)
Re: auto_explain sample rate

On 05/07/2015 18:22, Julien Rouhaud wrote:

On 03/06/2015 15:00, Craig Ringer wrote:

On 3 June 2015 at 20:04, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:

OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Could be, yeah. I was thinking about the cost of generating a random
each time, but it's going to vanish in the noise compared to the rest of
the costs in query execution.

Hello, I've just reviewed the patch.

I'm not sure if there's a consensus on the sample rate format. FWIW, I
also think a fraction would be easier to understand. Any news about
generating a random at each call to avoid the query pattern problem ?

The patch applies without error. I wonder if there's any reason for
using pg_lrand48() instead of random(), as there's a port for random()
if the system lacks it.

After some quick checks, I found that auto_explain_sample_counter is
always initialized with the same value. After some digging, it seems
that pg_lrand48() always returns the same values in the same order, at
least on my computer. Have I missed something?

Well, I obviously missed that pg_srand48() is only used if the system
lacks random/srandom, sorry for the noise. So yes, random() must be
used instead of pg_lrand48().

I'm attaching a new version of the patch fixing this issue just in case.

Otherwise, after replacing the pg_lrand48() call with a random(), it
works just fine.

---
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments:

auto_explain_sample_rate-v2.patchtext/x-patch; name=auto_explain_sample_rate-v2.patchDownload
*** a/contrib/auto_explain/auto_explain.c
--- b/contrib/auto_explain/auto_explain.c
***************
*** 29,34 **** static bool auto_explain_log_triggers = false;
--- 29,35 ----
  static bool auto_explain_log_timing = true;
  static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
  static bool auto_explain_log_nested_statements = false;
+ static int  auto_explain_sample_ratio = 1;
  
  static const struct config_enum_entry format_options[] = {
  	{"text", EXPLAIN_FORMAT_TEXT, false},
***************
*** 47,55 **** static ExecutorRun_hook_type prev_ExecutorRun = NULL;
  static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
  static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
  
  #define auto_explain_enabled() \
  	(auto_explain_log_min_duration >= 0 && \
! 	 (nesting_level == 0 || auto_explain_log_nested_statements))
  
  void		_PG_init(void);
  void		_PG_fini(void);
--- 48,61 ----
  static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
  static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
  
+ /* per-backend counter used for ratio sampling */
+ static int  auto_explain_sample_counter = 0;
+ 
  #define auto_explain_enabled() \
  	(auto_explain_log_min_duration >= 0 && \
! 	 (nesting_level == 0 || auto_explain_log_nested_statements)) && \
! 	 (auto_explain_sample_ratio == 1 || auto_explain_sample_counter == 0)
! 
  
  void		_PG_init(void);
  void		_PG_fini(void);
***************
*** 61,66 **** static void explain_ExecutorRun(QueryDesc *queryDesc,
--- 67,85 ----
  static void explain_ExecutorFinish(QueryDesc *queryDesc);
  static void explain_ExecutorEnd(QueryDesc *queryDesc);
  
+ static void
+ auto_explain_sample_ratio_assign_hook(int newval, void *extra)
+ {
+ 	if (auto_explain_sample_ratio != newval)
+ 	{
+ 		/* Schedule a counter reset when the sample ratio changed */
+ 		auto_explain_sample_counter = -1;
+ 	}
+ 
+ 	auto_explain_sample_ratio = newval;
+ }
+ 
+ 
  
  /*
   * Module load callback
***************
*** 159,164 **** _PG_init(void)
--- 178,195 ----
  							 NULL,
  							 NULL);
  
+ 	DefineCustomIntVariable("auto_explain.sample_ratio",
+ 		"Only explain one in approx. every sample_ratio queries, or 1 for all",
+ 							NULL,
+ 							&auto_explain_sample_ratio,
+ 							1,
+ 							1, INT_MAX - 1,
+ 							PGC_SUSET,
+ 							0,
+ 							NULL,
+ 							auto_explain_sample_ratio_assign_hook,
+ 							NULL);
+ 
  	EmitWarningsOnPlaceholders("auto_explain");
  
  	/* Install hooks. */
***************
*** 191,196 **** _PG_fini(void)
--- 222,250 ----
  static void
  explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
  {
+ 	/*
+ 	 * For ratio sampling, only increment the counter for top-level
+ 	 * statements. Either all nested statements will be explained
+ 	 * or none will, because we need to know at ExecutorEnd hook time
+ 	 * whether or not we explained any given statement.
+ 	 */
+ 	if (nesting_level == 0 && auto_explain_sample_ratio > 1)
+ 	{
+ 		if (auto_explain_sample_counter == -1)
+ 		{
+ 			/*
+ 			 * First time the hook ran in this backend, seed the counter. This
+ 			 * might be zero and cause the first statement to be explained.
+ 			 */
+ 			auto_explain_sample_counter = random() % auto_explain_sample_ratio;
+ 		}
+ 		else if ((++auto_explain_sample_counter) == auto_explain_sample_ratio)
+ 		{
+ 			/* Wrap the counter and explain this statement */
+ 			auto_explain_sample_counter = 0;
+ 		}
+ 	}
+ 
  	if (auto_explain_enabled())
  	{
  		/* Enable per-node instrumentation iff log_analyze is required. */
*** a/doc/src/sgml/auto-explain.sgml
--- b/doc/src/sgml/auto-explain.sgml
***************
*** 203,208 **** LOAD 'auto_explain';
--- 203,226 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term>
+      <varname>auto_explain.sample_ratio</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>auto_explain.sample_ratio</> configuration parameter</primary>
+      </indexterm>
+     </term>
+     <listitem>
+      <para>
+       <varname>auto_explain.sample_ratio</varname> causes auto_explain to only
+       explain every n'th statement in each session.  The number of statements
+       before the first is explained is between 1 and n, randomized to ensure
+       that short sessions still sometimes get explained. Only superusers can
+       change this setting.
+      </para>
+     </listitem>
+    </varlistentry>
    </variablelist>
  
    <para>
#15Craig Ringer
craig@2ndquadrant.com
In reply to: Julien Rouhaud (#14)
Re: auto_explain sample rate

On 7 July 2015 at 21:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

Well, I obviously missed that pg_srand48() is only used if the system
lacks random/srandom, sorry for the noise. So yes, random() must be
used instead of pg_lrand48().

I'm attaching a new version of the patch fixing this issue just in case.

Thanks for picking this up. I've been trying to find time to come back
to it but been swamped in priority work.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Michael Paquier
michael.paquier@gmail.com
In reply to: Craig Ringer (#15)
Re: auto_explain sample rate

On Fri, Jul 17, 2015 at 2:53 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 7 July 2015 at 21:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

Well, I obviously missed that pg_srand48() is only used if the system
lacks random/srandom, sorry for the noise. So yes, random() must be
used instead of pg_lrand48().

I'm attaching a new version of the patch fixing this issue just in case.

Thanks for picking this up. I've been trying to find time to come back
to it but been swamped in priority work.

For now I am marking that as returned with feedback.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Michael Paquier (#16)
1 attachment(s)
Re: auto_explain sample rate

On 25/08/2015 14:45, Michael Paquier wrote:

On Fri, Jul 17, 2015 at 2:53 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 7 July 2015 at 21:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

Well, I obviously missed that pg_srand48() is only used if the system
lacks random/srandom, sorry for the noise. So yes, random() must be
used instead of pg_lrand48().

I'm attaching a new version of the patch fixing this issue just in case.

Thanks for picking this up. I've been trying to find time to come back
to it but been swamped in priority work.

For now I am marking that as returned with feedback.

PFA v3 of the patch, rebased on current head. It fixes the last issue
(sample a percentage of queries).

I'm adding it to the next commitfest.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments:

auto_explain_sample_rate-v3.patchtext/x-patch; name=auto_explain_sample_rate-v3.patchDownload
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0950e18..4edb08e 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -29,6 +29,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
+static double auto_explain_sample_ratio = 1;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -47,9 +48,14 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 
+/* Is the current query sampled, per backend */
+static bool current_query_sampled = true;
+
 #define auto_explain_enabled() \
 	(auto_explain_log_min_duration >= 0 && \
-	 (nesting_level == 0 || auto_explain_log_nested_statements))
+	 (nesting_level == 0 || auto_explain_log_nested_statements) && \
+	 current_query_sampled)
+
 
 void		_PG_init(void);
 void		_PG_fini(void);
@@ -62,6 +68,7 @@ static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
 
+
 /*
  * Module load callback
  */
@@ -159,6 +166,19 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomRealVariable("auto_explain.sample_ratio",
+							 "Percentage of queries to process.",
+							NULL,
+							&auto_explain_sample_ratio,
+							1.0,
+							0.0,
+							1.0,
+							PGC_SUSET,
+							0,
+							NULL,
+							NULL,
+							NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -191,6 +211,14 @@ _PG_fini(void)
 static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	/*
+	 * For ratio sampling, randomly choose top-level statement. Either
+	 * all nested statements will be explained or none will.
+	 */
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+		current_query_sampled = (random() < auto_explain_sample_ratio *
+				MAX_RANDOM_VALUE);
+
 	if (auto_explain_enabled())
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index d527208..9d40e65 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -203,6 +203,23 @@ LOAD 'auto_explain';
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term>
+     <varname>auto_explain.sample_ratio</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>auto_explain.sample_ratio</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      <varname>auto_explain.sample_ratio</varname> causes auto_explain to only
+      explain X percent statements in each session.  In case of nested
+      statements, either all will be explained or none. Only superusers can
+      change this setting.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Julien Rouhaud (#17)
Re: auto_explain sample rate

Julien Rouhaud wrote:

Hijacking this macro is just too obscure:

#define auto_explain_enabled() \
(auto_explain_log_min_duration >= 0 && \
-	 (nesting_level == 0 || auto_explain_log_nested_statements))
+	 (nesting_level == 0 || auto_explain_log_nested_statements) && \
+	 current_query_sampled)

because it then becomes hard to figure out that assigning to _sampled is
what makes the enabled() check pass or not depending on sampling:

@@ -191,6 +211,14 @@ _PG_fini(void)
static void
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+	/*
+	 * For ratio sampling, randomly choose top-level statement. Either
+	 * all nested statements will be explained or none will.
+	 */
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+		current_query_sampled = (random() < auto_explain_sample_ratio *
+				MAX_RANDOM_VALUE);
+
if (auto_explain_enabled())
{

I think it's better to keep the "enabled" macro unmodified, and just add
another conditional to the "if" test there.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Alvaro Herrera (#18)
1 attachment(s)
Re: auto_explain sample rate

On 16/02/2016 22:51, Alvaro Herrera wrote:

Julien Rouhaud wrote:

Hijacking this macro is just too obscure:

#define auto_explain_enabled() \
(auto_explain_log_min_duration >= 0 && \
-	 (nesting_level == 0 || auto_explain_log_nested_statements))
+	 (nesting_level == 0 || auto_explain_log_nested_statements) && \
+	 current_query_sampled)

because it then becomes hard to figure out that assigning to _sampled is
what makes the enabled() check pass or not depending on sampling:

@@ -191,6 +211,14 @@ _PG_fini(void)
static void
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+	/*
+	 * For ratio sampling, randomly choose top-level statement. Either
+	 * all nested statements will be explained or none will.
+	 */
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+		current_query_sampled = (random() < auto_explain_sample_ratio *
+				MAX_RANDOM_VALUE);
+
if (auto_explain_enabled())
{

I think it's better to keep the "enabled" macro unmodified, and just add
another conditional to the "if" test there.

Thanks for looking at this!

Agreed, it's too obscure. Attached v4 fixes as you said.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments:

auto_explain_sample_rate-v4.patchtext/x-patch; name=auto_explain_sample_rate-v4.patchDownload
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0950e18..fa564fd 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -29,6 +29,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
+static double auto_explain_sample_ratio = 1;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -47,10 +48,14 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 
+/* Is the current query sampled, per backend */
+static bool current_query_sampled = true;
+
 #define auto_explain_enabled() \
 	(auto_explain_log_min_duration >= 0 && \
 	 (nesting_level == 0 || auto_explain_log_nested_statements))
 
+
 void		_PG_init(void);
 void		_PG_fini(void);
 
@@ -62,6 +67,7 @@ static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
 
+
 /*
  * Module load callback
  */
@@ -159,6 +165,19 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomRealVariable("auto_explain.sample_ratio",
+							 "Percentage of queries to process.",
+							NULL,
+							&auto_explain_sample_ratio,
+							1.0,
+							0.0,
+							1.0,
+							PGC_SUSET,
+							0,
+							NULL,
+							NULL,
+							NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -191,7 +210,15 @@ _PG_fini(void)
 static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
-	if (auto_explain_enabled())
+	/*
+	 * For ratio sampling, randomly choose top-level statement. Either
+	 * all nested statements will be explained or none will.
+	 */
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+		current_query_sampled = (random() < auto_explain_sample_ratio *
+				MAX_RANDOM_VALUE);
+
+	if (auto_explain_enabled() && current_query_sampled)
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
 		if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
@@ -210,7 +237,7 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	else
 		standard_ExecutorStart(queryDesc, eflags);
 
-	if (auto_explain_enabled())
+	if (auto_explain_enabled() && current_query_sampled)
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -280,7 +307,7 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
 static void
 explain_ExecutorEnd(QueryDesc *queryDesc)
 {
-	if (queryDesc->totaltime && auto_explain_enabled())
+	if (queryDesc->totaltime && auto_explain_enabled() && current_query_sampled)
 	{
 		double		msec;
 
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index d527208..9d40e65 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -203,6 +203,23 @@ LOAD 'auto_explain';
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term>
+     <varname>auto_explain.sample_ratio</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>auto_explain.sample_ratio</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      <varname>auto_explain.sample_ratio</varname> causes auto_explain to only
+      explain X percent statements in each session.  In case of nested
+      statements, either all will be explained or none. Only superusers can
+      change this setting.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
#20Petr Jelinek
petr@2ndquadrant.com
In reply to: Julien Rouhaud (#19)
Re: auto_explain sample rate

On 17/02/16 01:17, Julien Rouhaud wrote:

Agreed, it's too obscure. Attached v4 fixes as you said.

Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents fraction of
the queries sampled, because right now both the GUC description and the
documentation say it's in percent but that's not really true as percent
is 0 to 100.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Petr Jelinek (#20)
1 attachment(s)
Re: auto_explain sample rate

On 10/03/2016 04:37, Petr Jelinek wrote:

On 17/02/16 01:17, Julien Rouhaud wrote:

Agreed, it's too obscure. Attached v4 fixes as you said.

Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents fraction of
the queries sampled, because right now both the GUC description and the
documentation say it's in percent but that's not really true as percent
is 0 to 100.

Agreed. v5 attached fixes that.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments:

auto_explain_sample_rate-v5.difftext/x-patch; name=auto_explain_sample_rate-v5.diffDownload
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0950e18..2755fd1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -29,6 +29,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
+static double auto_explain_sample_ratio = 1;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -47,10 +48,14 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 
+/* Is the current query sampled, per backend */
+static bool current_query_sampled = true;
+
 #define auto_explain_enabled() \
 	(auto_explain_log_min_duration >= 0 && \
 	 (nesting_level == 0 || auto_explain_log_nested_statements))
 
+
 void		_PG_init(void);
 void		_PG_fini(void);
 
@@ -62,6 +67,7 @@ static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
 
+
 /*
  * Module load callback
  */
@@ -159,6 +165,19 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomRealVariable("auto_explain.sample_ratio",
+							 "Fraction of queries to process.",
+							NULL,
+							&auto_explain_sample_ratio,
+							1.0,
+							0.0,
+							1.0,
+							PGC_SUSET,
+							0,
+							NULL,
+							NULL,
+							NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -191,7 +210,15 @@ _PG_fini(void)
 static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
-	if (auto_explain_enabled())
+	/*
+	 * For ratio sampling, randomly choose top-level statement. Either
+	 * all nested statements will be explained or none will.
+	 */
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+		current_query_sampled = (random() < auto_explain_sample_ratio *
+				MAX_RANDOM_VALUE);
+
+	if (auto_explain_enabled() && current_query_sampled)
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
 		if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
@@ -210,7 +237,7 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	else
 		standard_ExecutorStart(queryDesc, eflags);
 
-	if (auto_explain_enabled())
+	if (auto_explain_enabled() && current_query_sampled)
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -280,7 +307,7 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
 static void
 explain_ExecutorEnd(QueryDesc *queryDesc)
 {
-	if (queryDesc->totaltime && auto_explain_enabled())
+	if (queryDesc->totaltime && auto_explain_enabled() && current_query_sampled)
 	{
 		double		msec;
 
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index d527208..978ef35 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -203,6 +203,24 @@ LOAD 'auto_explain';
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term>
+     <varname>auto_explain.sample_ratio</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>auto_explain.sample_ratio</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      <varname>auto_explain.sample_ratio</varname> (<type>floating point</type>)
+      causes auto_explain to only explain a fraction of the statements in each
+      session.  The default is 1, meaning explaining all the queries.  In case
+      of nested statements, either all will be explained or none. Only
+      superusers can change this setting.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
#22Petr Jelinek
petr@2ndquadrant.com
In reply to: Julien Rouhaud (#21)
Re: auto_explain sample rate

On 10/03/16 20:59, Julien Rouhaud wrote:

On 10/03/2016 04:37, Petr Jelinek wrote:

On 17/02/16 01:17, Julien Rouhaud wrote:

Agreed, it's too obscure. Attached v4 fixes as you said.

Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents fraction of
the queries sampled, because right now both the GUC description and the
documentation say it's in percent but that's not really true as percent
is 0 to 100.

Agreed. v5 attached fixes that.

Great, I will test it once more (just because when I don't bugs suddenly
appear out of nowhere) and mark it ready for committer.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Magnus Hagander
magnus@hagander.net
In reply to: Petr Jelinek (#22)
Re: auto_explain sample rate

On Thu, Mar 10, 2016 at 10:07 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

On 10/03/16 20:59, Julien Rouhaud wrote:

On 10/03/2016 04:37, Petr Jelinek wrote:

On 17/02/16 01:17, Julien Rouhaud wrote:

Agreed, it's too obscure. Attached v4 fixes as you said.

Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents fraction of
the queries sampled, because right now both the GUC description and the
documentation say it's in percent but that's not really true as percent
is 0 to 100.

Agreed. v5 attached fixes that.

Great, I will test it once more (just because when I don't bugs suddenly
appear out of nowhere) and mark it ready for committer.

Coming back to the previous discussions about random() - AFAICT this patch
will introduce the random() call always (in explain_ExecutorStart):

+ if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+ current_query_sampled = (random() < auto_explain_sample_ratio *
+ MAX_RANDOM_VALUE);

Not sure what the overhead is, but wouldn't it be better to include a check
for current_query_sampled>0 in the if part of that statement? Regardless
of performance, that feels cleaner to me. Or am I missing something?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#24Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Magnus Hagander (#23)
Re: auto_explain sample rate

On 11/03/2016 11:45, Magnus Hagander wrote:

Coming back to the previous discussions about random() - AFAICT this
patch will introduce the random() call always (in explain_ExecutorStart):

+if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+current_query_sampled = (random() < auto_explain_sample_ratio *
+MAX_RANDOM_VALUE);

Not sure what the overhead is, but wouldn't it be better to include a
check for current_query_sampled>0 in the if part of that statement?
Regardless of performance, that feels cleaner to me. Or am I missing
something?

You mean check for auto_explain_sample_ratio > 0 ?

There would be a corner case if a query is sampled
(current_query_sampled is true) and then auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the good way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#24)
Re: auto_explain sample rate

On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud <julien.rouhaud@dalibo.com>
wrote:

On 11/03/2016 11:45, Magnus Hagander wrote:

Coming back to the previous discussions about random() - AFAICT this
patch will introduce the random() call always (in explain_ExecutorStart):

+if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+current_query_sampled = (random() < auto_explain_sample_ratio *
+MAX_RANDOM_VALUE);

Not sure what the overhead is, but wouldn't it be better to include a
check for current_query_sampled>0 in the if part of that statement?
Regardless of performance, that feels cleaner to me. Or am I missing
something?

You mean check for auto_explain_sample_ratio > 0 ?

I did, but I think what I should have meant is auto_explain_sample_ratio <
1.

There would be a corner case if a query is sampled
(current_query_sampled is true) and then auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

There would have to be an else block as well of course, that set it back.

We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the good way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

I guess in the end it probably is - a general random() call is pretty cheap
compared to all the things we're doing. I think my mind was stuck in
crypto-random which can be more expensive :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#26Petr Jelinek
petr@2ndquadrant.com
In reply to: Magnus Hagander (#23)
Re: auto_explain sample rate

On 11/03/16 11:45, Magnus Hagander wrote:

On Thu, Mar 10, 2016 at 10:07 PM, Petr Jelinek <petr@2ndquadrant.com
<mailto:petr@2ndquadrant.com>> wrote:

On 10/03/16 20:59, Julien Rouhaud wrote:

On 10/03/2016 04:37, Petr Jelinek wrote:

On 17/02/16 01:17, Julien Rouhaud wrote:

Agreed, it's too obscure. Attached v4 fixes as you said.

Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents
fraction of
the queries sampled, because right now both the GUC
description and the
documentation say it's in percent but that's not really true
as percent
is 0 to 100.

Agreed. v5 attached fixes that.

Great, I will test it once more (just because when I don't bugs
suddenly appear out of nowhere) and mark it ready for committer.

Coming back to the previous discussions about random() - AFAICT this
patch will introduce the random() call always (in explain_ExecutorStart):

+if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+current_query_sampled = (random() < auto_explain_sample_ratio *
+MAX_RANDOM_VALUE);

No it doesn't as the documented way to turn off auto_explain is to set
auto_explain_log_min_duration to -1 which is also the default.

In any case the code for that would have to be something like
if (auto_explain_sample_ratio == 0)
current_query_sampled = false
else if <the original if>

Not sure if I consider that cleaner but it would definitely remove the
call to random() in case user has set auto_explain_log_min_duration to
something else than -1 and "turned off" the auto_explain by setting
auto_explain_sample_ratio to 0.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#25)
Re: auto_explain sample rate

On Fri, Mar 11, 2016 at 3:03 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud <julien.rouhaud@dalibo.com

wrote:

On 11/03/2016 11:45, Magnus Hagander wrote:

Coming back to the previous discussions about random() - AFAICT this
patch will introduce the random() call always (in

explain_ExecutorStart):

+if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+current_query_sampled = (random() < auto_explain_sample_ratio *
+MAX_RANDOM_VALUE);

Not sure what the overhead is, but wouldn't it be better to include a
check for current_query_sampled>0 in the if part of that statement?
Regardless of performance, that feels cleaner to me. Or am I missing
something?

You mean check for auto_explain_sample_ratio > 0 ?

I did, but I think what I should have meant is auto_explain_sample_ratio <
1.

There would be a corner case if a query is sampled
(current_query_sampled is true) and then auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

There would have to be an else block as well of course, that set it back.

We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the good way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

I guess in the end it probably is - a general random() call is pretty
cheap compared to all the things we're doing. I think my mind was stuck in
crypto-random which can be more expensive :)

Applied with a minor word-fix in the documentation and removal of some
unrelated whitespace changes.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#28Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Magnus Hagander (#27)
Re: auto_explain sample rate

On 11/03/2016 15:11, Magnus Hagander wrote:

On Fri, Mar 11, 2016 at 3:03 PM, Magnus Hagander <magnus@hagander.net
<mailto:magnus@hagander.net>> wrote:

On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:

On 11/03/2016 11:45, Magnus Hagander wrote:

Coming back to the previous discussions about random() - AFAICT this
patch will introduce the random() call always (in explain_ExecutorStart):

+if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+current_query_sampled = (random() < auto_explain_sample_ratio *
+MAX_RANDOM_VALUE);

Not sure what the overhead is, but wouldn't it be better to include a
check for current_query_sampled>0 in the if part of that statement?
Regardless of performance, that feels cleaner to me. Or am I missing
something?

You mean check for auto_explain_sample_ratio > 0 ?

I did, but I think what I should have meant is
auto_explain_sample_ratio < 1.

There would be a corner case if a query is sampled
(current_query_sampled is true) and then
auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

There would have to be an else block as well of course, that set it
back.

We could add a specific test for this case to spare a random()
call, but
I fear it'd be overkill. Maybe it's better to document that the
good way
to deactivate auto_explain is to set
auto_explain.log_min_duration to -1.

I guess in the end it probably is - a general random() call is
pretty cheap compared to all the things we're doing. I think my mind
was stuck in crypto-random which can be more expensive :)

Applied with a minor word-fix in the documentation and removal of some
unrelated whitespace changes.

Thanks!

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#27)
Re: auto_explain sample rate

Hi,

On Fri, 2016-03-11 at 15:11 +0100, Magnus Hagander wrote:

 

Applied with a minor word-fix in the documentation and removal of
some unrelated whitespace changes. 

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#29)
Re: auto_explain sample rate

On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

I like that idea. It seems like slightly better terminology.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Robert Haas (#30)
1 attachment(s)
Re: auto_explain sample rate

On 11/03/2016 17:55, Robert Haas wrote:

On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

I like that idea. It seems like slightly better terminology.

I like it too. I also just noticed that I duplicated the var type by
mistake in the documentation :/

Attached patch fixes both.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments:

auto_explain_rate.difftext/plain; charset=UTF-8; name=auto_explain_rate.diffDownload
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 76d1831..55529af 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -29,7 +29,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
-static double auto_explain_sample_ratio = 1;
+static double auto_explain_sample_rate = 1;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -163,10 +163,10 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
-	DefineCustomRealVariable("auto_explain.sample_ratio",
+	DefineCustomRealVariable("auto_explain.sample_rate",
 							 "Fraction of queries to process.",
 							NULL,
-							&auto_explain_sample_ratio,
+							&auto_explain_sample_rate,
 							1.0,
 							0.0,
 							1.0,
@@ -209,11 +209,11 @@ static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
 	/*
-	 * For ratio sampling, randomly choose top-level statement. Either
+	 * For rate sampling, randomly choose top-level statement. Either
 	 * all nested statements will be explained or none will.
 	 */
 	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
-		current_query_sampled = (random() < auto_explain_sample_ratio *
+		current_query_sampled = (random() < auto_explain_sample_rate *
 				MAX_RANDOM_VALUE);
 
 	if (auto_explain_enabled() && current_query_sampled)
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 6f1bde0..38e6f50 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -206,18 +206,17 @@ LOAD 'auto_explain';
 
    <varlistentry>
     <term>
-     <varname>auto_explain.sample_ratio</varname> (<type>real</type>)
+     <varname>auto_explain.sample_rate</varname> (<type>real</type>)
      <indexterm>
-      <primary><varname>auto_explain.sample_ratio</> configuration parameter</primary>
+      <primary><varname>auto_explain.sample_rate</> configuration parameter</primary>
      </indexterm>
     </term>
     <listitem>
      <para>
-      <varname>auto_explain.sample_ratio</varname> (<type>floating point</type>)
-      causes auto_explain to only explain a fraction of the statements in each
-      session.  The default is 1, meaning explain all the queries.  In case
-      of nested statements, either all will be explained or none. Only
-      superusers can change this setting.
+      <varname>auto_explain.sample_rate</varname> causes auto_explain to only
+      explain a fraction of the statements in each session.  The default is 1,
+      meaning explain all the queries.  In case of nested statements, either all
+      will be explained or none. Only superusers can change this setting.
      </para>
     </listitem>
    </varlistentry>
#32Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#31)
Re: auto_explain sample rate

On Sat, Mar 12, 2016 at 3:20 PM, Julien Rouhaud <julien.rouhaud@dalibo.com>
wrote:

On 11/03/2016 17:55, Robert Haas wrote:

On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

I like that idea. It seems like slightly better terminology.

I like it too. I also just noticed that I duplicated the var type by
mistake in the documentation :/

Attached patch fixes both.

I also agree, and thanks for doing the work :) Applied!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/