auto_explain vs. parallel query

Started by Tomas Vondra about 9 years ago (10 messages)
#1 Tomas Vondra
tomas.vondra@2ndquadrant.com

Hi,

While debugging something on 9.6, I've noticed that auto_explain handles
parallel queries in a slightly strange way - both the leader and all the
workers log their chunk of the query (i.e. the leader logs explain for
the whole query, while workers only log the parts they've been executing).

So for example, for a query with 2 workers the log looks like this:

2016-10-31 23:10:23.481 CET [12278] LOG: duration: 32.562 ms pl ...
Query Text: <parallel query> ...
Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
Rows Removed by Filter: 140614 ...
2016-10-31 23:10:23.481 CET [12277] LOG: duration: 32.325 ms pl ...
Query Text: <parallel query> ...
Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
Rows Removed by Filter: 80948 ...
2016-10-31 23:10:23.483 CET [12259] LOG: duration: 38.997 ms pl ...
Query Text: explain analyze select * from tables where table_ ...
Gather (cost=1000.00..16211.15 rows=529 width=356) (actual t ...
Workers Planned: 2 ...
Workers Launched: 2 ...
-> Parallel Seq Scan on tables (cost=0.00..15158.25 rows= ...
Filter: ((table_datoid < '1000000000'::oid) AND (tabl ...
Rows Removed by Filter: 105570 ...

I'd say that's fairly surprising, and I haven't found any discussion
about auto_explain vs. parallel queries in the archives (within the last
year), so I assume this may not be entirely intentional.

Not only does this not match the output of running EXPLAIN ANALYZE
manually, it also gives no indication of which messages from
workers belong to which "leader" message.

Another thing is that this interacts with sampling in a rather
unfortunate way, because each process evaluates the sampling condition
independently. So for example with auto_explain.sample_rate=0.5 a random
subset of worker/leader explain plans will be logged.

But the same issue affects the condition in ExecutorStart, which enables
instrumentation only when the query gets sampled - so when the leader
gets sampled but not all of the workers, the counters in the plan logged
by the leader are incomplete.

I do believe those are bugs that make auto_explain rather unusable with
parallel queries. Adding IsParallelWorker() to the condition in
ExecutorEnd should fix the extra logging messages (and log only from the
leader).

Not sure how to fix the sampling - simply adding IsParallelWorker() to
ExecutorStart won't do the trick, because we don't want to enable
instrumentation only for sample queries. So I guess this needs to be
decided in the leader, and communicated to the workers somehow.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#1)
Re: auto_explain vs. parallel query

On Mon, Oct 31, 2016 at 6:35 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

While debugging something on 9.6, I've noticed that auto_explain handles
parallel queries in a slightly strange way - both the leader and all the
workers log their chunk of the query (i.e. the leader logs explain for the
whole query, while workers only log the parts they've been executing).

So for example for a query with 2 workers, the log look like this:

2016-10-31 23:10:23.481 CET [12278] LOG: duration: 32.562 ms pl ...
Query Text: <parallel query> ...
Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
Rows Removed by Filter: 140614 ...
2016-10-31 23:10:23.481 CET [12277] LOG: duration: 32.325 ms pl ...
Query Text: <parallel query> ...
Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
Rows Removed by Filter: 80948 ...
2016-10-31 23:10:23.483 CET [12259] LOG: duration: 38.997 ms pl ...
Query Text: explain analyze select * from tables where table_ ...
Gather (cost=1000.00..16211.15 rows=529 width=356) (actual t ...
Workers Planned: 2 ...
Workers Launched: 2 ...
-> Parallel Seq Scan on tables (cost=0.00..15158.25 rows= ...
Filter: ((table_datoid < '1000000000'::oid) AND (tabl ...
Rows Removed by Filter: 105570 ...

I'd say that's fairly surprising, and I haven't found any discussion about
auto_explain vs. parallel queries in the archives (within the last year), so
I assume this may not be entirely intentional.

Yeah, that's not entirely intentional. :-(

Not only this does not match the output when running EXPLAIN ANALYZE
manually, it also provides no information about which messages from workers
belong to which "leader" message.

Another thing is that this interacts with sampling in a rather unfortunate
way, because each process evaluates the sampling condition independently. So
for example with auto_explain.sample_rate=0.5 a random subset of
worker/leader explain plans will be logged.

But this also applies to the conditions in ExecutorStart, which enables
instrumentation only when the query gets sampled - so when the leader gets
sampled but not all the workers, the counters in the query logged by the
leader are incomplete.

I do believe those are bugs that make auto_explain rather unusable with
parallel queries. Adding IsParallelWorker() to the condition in ExecutorEnd
should fix the extra logging messages (and log only from the leader).

Not sure how to fix the sampling - simply adding IsParallelWorker() to
ExecutorStart won't do the trick, because we don't want to enable
instrumentation only for sample queries. So I guess this needs to be decided
in the leader, and communicated to the workers somehow.

Yes, deciding in the leader and communicating to workers seems like
the way to go, but I'm not sure how simple that's going to be to
implement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#3 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#2)
1 attachment(s)
Re: auto_explain vs. parallel query

On 11/01/2016 02:15 PM, Robert Haas wrote:

On Mon, Oct 31, 2016 at 6:35 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

While debugging something on 9.6, I've noticed that auto_explain handles
parallel queries in a slightly strange way - both the leader and all the
workers log their chunk of the query (i.e. the leader logs explain for the
whole query, while workers only log the parts they've been executing).

So for example for a query with 2 workers, the log look like this:

2016-10-31 23:10:23.481 CET [12278] LOG: duration: 32.562 ms pl ...
Query Text: <parallel query> ...
Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
Rows Removed by Filter: 140614 ...
2016-10-31 23:10:23.481 CET [12277] LOG: duration: 32.325 ms pl ...
Query Text: <parallel query> ...
Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
Rows Removed by Filter: 80948 ...
2016-10-31 23:10:23.483 CET [12259] LOG: duration: 38.997 ms pl ...
Query Text: explain analyze select * from tables where table_ ...
Gather (cost=1000.00..16211.15 rows=529 width=356) (actual t ...
Workers Planned: 2 ...
Workers Launched: 2 ...
-> Parallel Seq Scan on tables (cost=0.00..15158.25 rows= ...
Filter: ((table_datoid < '1000000000'::oid) AND (tabl ...
Rows Removed by Filter: 105570 ...

I'd say that's fairly surprising, and I haven't found any discussion about
auto_explain vs. parallel queries in the archives (within the last year), so
I assume this may not be entirely intentional.

Yeah, that's not entirely intentional. :-(

Not only this does not match the output when running EXPLAIN ANALYZE
manually, it also provides no information about which messages from workers
belong to which "leader" message.

Another thing is that this interacts with sampling in a rather unfortunate
way, because each process evaluates the sampling condition independently. So
for example with auto_explain.sample_rate=0.5 a random subset of
worker/leader explain plans will be logged.

But this also applies to the conditions in ExecutorStart, which enables
instrumentation only when the query gets sampled - so when the leader gets
sampled but not all the workers, the counters in the query logged by the
leader are incomplete.

I do believe those are bugs that make auto_explain rather unusable with
parallel queries. Adding IsParallelWorker() to the condition in ExecutorEnd
should fix the extra logging messages (and log only from the leader).

Not sure how to fix the sampling - simply adding IsParallelWorker() to
ExecutorStart won't do the trick, because we don't want to enable
instrumentation only for sample queries. So I guess this needs to be decided
in the leader, and communicated to the workers somehow.

Yes, deciding in the leader and communicating to workers seems like
the way to go, but I'm not sure how simple that's going to be to
implement.

Clearly we need to pass some information to the worker processes, so
that they know whether to instrument the query or not. I don't know if
there's a good non-invasive way to do that from an extension - the
easiest way I can think of is using a bit of shared memory to pass the
"sample query" flag - attached is a patch that does that, and it seems
to be working fine for me.

The bad thing is that this requires auto_explain to be loaded from
shared_preload_libraries, because it needs to allocate the chunk of
shared memory. That's quite annoying, because auto_explain is often
useful for ad-hoc investigations, so being able to load it on the fly is
valuable. But perhaps DSM would fix this (sorry, my knowledge of the DSM
infrastructure is limited)?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

auto_explain_parallel.patch (text/x-diff)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 4ccd2aa..a9e87bd 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -14,8 +14,13 @@
 
 #include <limits.h>
 
+#include "access/parallel.h"
 #include "commands/explain.h"
 #include "executor/instrument.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 
 PG_MODULE_MAGIC;
@@ -47,6 +52,7 @@ static ExecutorStart_hook_type prev_ExecutorStart = NULL;
 static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 
 /* Is the current query sampled, per backend */
 static bool current_query_sampled = true;
@@ -65,6 +71,17 @@ static void explain_ExecutorRun(QueryDesc *queryDesc,
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
+typedef struct AutoExplainState
+{
+	LWLock	   *lock;		/* protects shared state */
+	bool		sampled;	/* should we sample this query? */
+} AutoExplainState;
+
+static AutoExplainState *explainState;
+
+static void explain_shmem_startup(void);
+static bool is_query_sampled(void);
+static void set_query_sampled(bool sample);
 
 /*
  * Module load callback
@@ -72,6 +89,9 @@ static void explain_ExecutorEnd(QueryDesc *queryDesc);
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		elog(ERROR, "auto_explain can only be loaded via shared_preload_libraries");
+
 	/* Define custom GUC variables. */
 	DefineCustomIntVariable("auto_explain.log_min_duration",
 		 "Sets the minimum execution time above which plans will be logged.",
@@ -176,6 +196,8 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	RequestAddinShmemSpace(sizeof(AutoExplainState));
+	RequestNamedLWLockTranche("auto_explain", 1);
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -187,6 +209,9 @@ _PG_init(void)
 	ExecutorFinish_hook = explain_ExecutorFinish;
 	prev_ExecutorEnd = ExecutorEnd_hook;
 	ExecutorEnd_hook = explain_ExecutorEnd;
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = explain_shmem_startup;
 }
 
 /*
@@ -200,6 +225,7 @@ _PG_fini(void)
 	ExecutorRun_hook = prev_ExecutorRun;
 	ExecutorFinish_hook = prev_ExecutorFinish;
 	ExecutorEnd_hook = prev_ExecutorEnd;
+	shmem_startup_hook = prev_shmem_startup_hook;
 }
 
 /*
@@ -212,11 +238,14 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * For rate sampling, randomly choose top-level statement. Either all
 	 * nested statements will be explained or none will.
 	 */
-	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0 && (! IsParallelWorker()))
+	{
 		current_query_sampled = (random() < auto_explain_sample_rate *
 								 MAX_RANDOM_VALUE);
+		set_query_sampled(current_query_sampled);
+	}
 
-	if (auto_explain_enabled() && current_query_sampled)
+	if (auto_explain_enabled() && is_query_sampled())
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
 		if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
@@ -235,7 +264,7 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	else
 		standard_ExecutorStart(queryDesc, eflags);
 
-	if (auto_explain_enabled() && current_query_sampled)
+	if (auto_explain_enabled() && is_query_sampled())
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -305,7 +334,7 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
 static void
 explain_ExecutorEnd(QueryDesc *queryDesc)
 {
-	if (queryDesc->totaltime && auto_explain_enabled() && current_query_sampled)
+	if (queryDesc->totaltime && auto_explain_enabled() && (! IsParallelWorker()))
 	{
 		double		msec;
 
@@ -366,3 +395,57 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 	else
 		standard_ExecutorEnd(queryDesc);
 }
+
+static void
+explain_shmem_startup(void)
+{
+	bool		found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	explainState = NULL;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+
+	explainState = ShmemInitStruct("auto_explain",
+						   sizeof(AutoExplainState),
+						   &found);
+	if (!found)
+	{
+		/* First time through ... */
+		explainState->lock = &(GetNamedLWLockTranche("auto_explain"))->lock;
+		explainState->sampled = true;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+static bool
+is_query_sampled(void)
+{
+	bool sampled;
+
+	/* in leader we can just check the global variable */
+	if (! IsParallelWorker())
+		return current_query_sampled;
+
+	/* in worker processes we need to get the info from shared memory */
+	LWLockAcquire(explainState->lock, LW_SHARED);
+	sampled = explainState->sampled;
+	LWLockRelease(explainState->lock);
+
+	return sampled;
+}
+
+static void
+set_query_sampled(bool sample)
+{
+	/* the decision should only be made in the leader */
+	Assert(! IsParallelWorker());
+
+	/* publish the decision in shared memory so workers can see it */
+	LWLockAcquire(explainState->lock, LW_EXCLUSIVE);
+	explainState->sampled = sample;
+	LWLockRelease(explainState->lock);
+}
#4 Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#3)
Re: auto_explain vs. parallel query

On Tue, Nov 1, 2016 at 10:21 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Clearly we need to pass some information to the worker processes, so that
they know whether to instrument the query or not. I don't know if there's a
good non-invasive way to do that from an extension - the easiest way I can
think of is using a bit of shared memory to pass the "sample query" flag -
attached is a patch that does that, and it seems to be working fine for me.

Uh, isn't this going to break as soon as there are multiple parallel
queries in process at the same time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#5 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#4)
1 attachment(s)
Re: auto_explain vs. parallel query

On 11/01/2016 03:29 PM, Robert Haas wrote:

On Tue, Nov 1, 2016 at 10:21 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Clearly we need to pass some information to the worker processes, so that
they know whether to instrument the query or not. I don't know if there's a
good non-invasive way to do that from an extension - the easiest way I can
think of is using a bit of shared memory to pass the "sample query" flag -
attached is a patch that does that, and it seems to be working fine for me.

Uh, isn't this going to break as soon as there are multiple parallel
queries in process at the same time?

Damn! You're right of course. Who'd guess I need more coffee this early?

Attached is a fix replacing the flag with an array of flags, indexed by
ParallelMasterBackendId. Hopefully that makes it work with multiple
concurrent parallel queries ... still, I'm not sure this is the right
solution.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

auto_explain_parallel-v2.patch (text/x-diff)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 4ccd2aa..192a3a1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -14,8 +14,13 @@
 
 #include <limits.h>
 
+#include "access/parallel.h"
 #include "commands/explain.h"
 #include "executor/instrument.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
 #include "utils/guc.h"
 
 PG_MODULE_MAGIC;
@@ -47,10 +52,13 @@ static ExecutorStart_hook_type prev_ExecutorStart = NULL;
 static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 
 /* Is the current query sampled, per backend */
 static bool current_query_sampled = true;
 
+static int parent;
+
 #define auto_explain_enabled() \
 	(auto_explain_log_min_duration >= 0 && \
 	 (nesting_level == 0 || auto_explain_log_nested_statements))
@@ -65,6 +73,18 @@ static void explain_ExecutorRun(QueryDesc *queryDesc,
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
+typedef struct AutoExplainState
+{
+	LWLock	   *lock;							/* protects shared state */
+	bool		sampled[FLEXIBLE_ARRAY_MEMBER];	/* should we sample this query? */
+} AutoExplainState;
+
+static AutoExplainState *explainState;
+
+static void explain_shmem_startup(void);
+static bool is_query_sampled(void);
+static void set_query_sampled(bool sample);
+static Size explain_shmem_size(void);
 
 /*
  * Module load callback
@@ -72,6 +92,9 @@ static void explain_ExecutorEnd(QueryDesc *queryDesc);
 void
 _PG_init(void)
 {
+	if (!process_shared_preload_libraries_in_progress)
+		elog(ERROR, "auto_explain can only be loaded via shared_preload_libraries");
+
 	/* Define custom GUC variables. */
 	DefineCustomIntVariable("auto_explain.log_min_duration",
 		 "Sets the minimum execution time above which plans will be logged.",
@@ -176,6 +199,8 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	RequestAddinShmemSpace(explain_shmem_size());
+	RequestNamedLWLockTranche("auto_explain", 1);
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -187,6 +212,15 @@ _PG_init(void)
 	ExecutorFinish_hook = explain_ExecutorFinish;
 	prev_ExecutorEnd = ExecutorEnd_hook;
 	ExecutorEnd_hook = explain_ExecutorEnd;
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = explain_shmem_startup;
+}
+
+static Size
+explain_shmem_size(void)
+{
+	return offsetof(AutoExplainState, sampled) + MaxConnections * sizeof(bool);
 }
 
 /*
@@ -200,6 +234,7 @@ _PG_fini(void)
 	ExecutorRun_hook = prev_ExecutorRun;
 	ExecutorFinish_hook = prev_ExecutorFinish;
 	ExecutorEnd_hook = prev_ExecutorEnd;
+	shmem_startup_hook = prev_shmem_startup_hook;
 }
 
 /*
@@ -208,15 +243,20 @@ _PG_fini(void)
 static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
+	parent = MyBackendId;
+
 	/*
 	 * For rate sampling, randomly choose top-level statement. Either all
 	 * nested statements will be explained or none will.
 	 */
-	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0 && (! IsParallelWorker()))
+	{
 		current_query_sampled = (random() < auto_explain_sample_rate *
 								 MAX_RANDOM_VALUE);
+		set_query_sampled(current_query_sampled);
+	}
 
-	if (auto_explain_enabled() && current_query_sampled)
+	if (auto_explain_enabled() && is_query_sampled())
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
 		if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
@@ -235,7 +275,7 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	else
 		standard_ExecutorStart(queryDesc, eflags);
 
-	if (auto_explain_enabled() && current_query_sampled)
+	if (auto_explain_enabled() && is_query_sampled())
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -305,7 +345,7 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
 static void
 explain_ExecutorEnd(QueryDesc *queryDesc)
 {
-	if (queryDesc->totaltime && auto_explain_enabled() && current_query_sampled)
+	if (queryDesc->totaltime && auto_explain_enabled() && (! IsParallelWorker()))
 	{
 		double		msec;
 
@@ -366,3 +406,57 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 	else
 		standard_ExecutorEnd(queryDesc);
 }
+
+static void
+explain_shmem_startup(void)
+{
+	bool		found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	explainState = NULL;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+
+	explainState = ShmemInitStruct("auto_explain",
+						   explain_shmem_size(),
+						   &found);
+	if (!found)
+	{
+		/* First time through ... */
+		explainState->lock = &(GetNamedLWLockTranche("auto_explain"))->lock;
+		memset(explainState->sampled, 1, sizeof(bool) * MaxConnections);
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+static bool
+is_query_sampled(void)
+{
+	bool sampled;
+
+	/* in leader we can just check the global variable */
+	if (! IsParallelWorker())
+		return current_query_sampled;
+
+	/* in worker processes we need to get the info from shared memory */
+	LWLockAcquire(explainState->lock, LW_SHARED);
+	sampled = explainState->sampled[ParallelMasterBackendId];
+	LWLockRelease(explainState->lock);
+
+	return sampled;
+}
+
+static void
+set_query_sampled(bool sample)
+{
+	/* the decision should only be made in the leader */
+	Assert(! IsParallelWorker());
+
+	/* publish the decision in shared memory so workers can see it */
+	LWLockAcquire(explainState->lock, LW_EXCLUSIVE);
+	explainState->sampled[MyBackendId] = sample;
+	LWLockRelease(explainState->lock);
+}
#6 Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#5)
Re: auto_explain vs. parallel query

On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Damn! You're right of course. Who'd guess I need more coffee this early?

Attached is a fix replacing the flag with an array of flags, indexed by
ParallelMasterBackendId. Hopefully that makes it work with multiple
concurrent parallel queries ... still, I'm not sure this is the right
solution.

I feel like it isn't. I feel like this ought to go in the DSM for
that parallel query, not the main shared memory segment, but I'm not
sure how to accomplish that offhand. Also, if we do solve it this
way, surely we don't need the locking. The flag's only set before any
workers have started and never changes thereafter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#7 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#6)
Re: auto_explain vs. parallel query

On 11/01/2016 08:32 PM, Robert Haas wrote:

On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Damn! You're right of course. Who'd guess I need more coffee this early?

Attached is a fix replacing the flag with an array of flags, indexed by
ParallelMasterBackendId. Hopefully that makes it work with multiple
concurrent parallel queries ... still, I'm not sure this is the right
solution.

I feel like it isn't. I feel like this ought to go in the DSM for
that parallel query, not the main shared memory segment, but I'm not
sure how to accomplish that offhand. Also, if we do solve it this
way, surely we don't need the locking. The flag's only set before any
workers have started and never changes thereafter.

I'm not sure what you mean by "DSM for that parallel query" - I thought
the segments are created for Gather nodes, no? Or is there a DSM for the
whole query that we could use?

Another thing is that maybe we don't really want to give extensions
access to any of those segments - my impression was those segments are
considered internal (is there RequestAddinShmemSpace for them?). And
hacking something just for auto_explain seems a bit ugly.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#8 Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#7)
Re: auto_explain vs. parallel query

On Wed, Nov 2, 2016 at 12:49 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 11/01/2016 08:32 PM, Robert Haas wrote:

On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Damn! You're right of course. Who'd guess I need more coffee this early?

Attached is a fix replacing the flag with an array of flags, indexed by
ParallelMasterBackendId. Hopefully that makes it work with multiple
concurrent parallel queries ... still, I'm not sure this is the right
solution.

I feel like it isn't. I feel like this ought to go in the DSM for
that parallel query, not the main shared memory segment, but I'm not
sure how to accomplish that offhand. Also, if we do solve it this
way, surely we don't need the locking. The flag's only set before any
workers have started and never changes thereafter.

I'm not sure what you mean by "DSM for that parallel query" - I thought the
segments are created for Gather nodes, no? Or is there a DSM for the whole
query that we could use?

Sure, the Gather node creates it. There's generally only one per
query, though, and that's how most information is communicated from
leader to workers.

Another thing is that maybe we don't really want to give extensions access
to any of those segments - my impression was those segments are considered
internal (is there RequestAddinShmemSpace for them?). And hacking something
just for auto_explain seems a big ugly.

Yeah. I thought that there wouldn't be any reason for third-party
code to need to get into these segments, but maybe that was
short-sighted of me. If we fix this without that, then we've got to
force pg_stat_statements to be loaded through
shared_preload_libraries, as you mentioned, and that doesn't seem nice
either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#9 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#8)
Re: auto_explain vs. parallel query

On 11/03/2016 03:59 PM, Robert Haas wrote:

On Wed, Nov 2, 2016 at 12:49 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 11/01/2016 08:32 PM, Robert Haas wrote:

On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Damn! You're right of course. Who'd guess I need more coffee this early?

Attached is a fix replacing the flag with an array of flags, indexed by
ParallelMasterBackendId. Hopefully that makes it work with multiple
concurrent parallel queries ... still, I'm not sure this is the right
solution.

I feel like it isn't. I feel like this ought to go in the DSM for
that parallel query, not the main shared memory segment, but I'm not
sure how to accomplish that offhand. Also, if we do solve it this
way, surely we don't need the locking. The flag's only set before any
workers have started and never changes thereafter.

I'm not sure what you mean by "DSM for that parallel query" - I thought the
segments are created for Gather nodes, no? Or is there a DSM for the whole
query that we could use?

Sure, the Gather node creates it. There's generally only one per
query, though, and that's how most information is communicated from
leader to workers.

Ah, right. I hadn't realized there's just a single Gather per query.

Another thing is that maybe we don't really want to give extensions access
to any of those segments - my impression was those segments are considered
internal (is there RequestAddinShmemSpace for them?). And hacking something
just for auto_explain seems a big ugly.

Yeah. I thought that there wouldn't be any reason for third-party
code to need to get into these segments, but maybe that was
short-sighted of me. If we fix this without that, then we've got to
force pg_stat_statements to be loaded through
shared_preload_libraries, as you mentioned, and that doesn't seem nice
either.

Well, I was talking about auto_explain, and pg_stat_statements already
has to be loaded through shared_preload_libraries. I'm not sure how many
other extensions would need a similar hack (I don't see any in contrib,
but there may be external ones).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#10 Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#9)
Re: auto_explain vs. parallel query

On Thu, Nov 3, 2016 at 12:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Sure, the Gather node creates it. There's generally only one per
query, though, and that's how most information is communicated from
leader to workers.

Ah, right. I haven't realized there's just a single Gather per query.

That's not a hard and fast rule; it just usually works out that way.

Yeah. I thought that there wouldn't be any reason for third-party
code to need to get into these segments, but maybe that was
short-sighted of me. If we fix this without that, then we've got to
force pg_stat_statements to be loaded through
shared_preload_libraries, as you mentioned, and that doesn't seem nice
either.

Well, I was talking about auto_explain, and pg_stat_statements already has
to be loaded through shared_preload_libraries. I'm not sure how many other
extensions would need similar hack (I don't see any in contrib, but there
may be external ones).

Sorry, auto_explain, then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
