crashes due to setting max_parallel_workers=0
Hi,
while working on a patch I ran into some crashes that seem to be caused
by inconsistent handling of max_parallel_workers - queries still seem to
be planned with parallel plans enabled, but then crash at execution.
The attached script reproduces the issue on a simple query, causing
a crash in GatherMerge, but I assume the issue is more general.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
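As a rough illustration of the kind of reproducer described above (the table
name, size, and cost settings here are assumptions based on later mails in
this thread, not the actual attached script):

-- Force a parallel sorted plan, then forbid launching any workers.
create table t as select i as a from generate_series(1, 100000) i;
analyze t;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
set max_parallel_workers = 0;
explain select * from t order by a;   -- still plans a Gather Merge
select * from t order by a;           -- executes with zero workers launched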
On Sat, Mar 25, 2017 at 7:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
while working on a patch I ran into some crashes that seem to be caused by
inconsistent handling of max_parallel_workers - queries still seem to be
planned with parallel plans enabled, but then crash at execution. The attached
script reproduces the issue on a simple query, causing a crash in GatherMerge,
but I assume the issue is more general.
I could see other parallel plans like Gather also being picked with
max_parallel_workers=0. I suspect there are multiple issues here and this
needs some investigation. I have added this to the open items list.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 25 March 2017 at 13:10, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
while working on a patch I ran into some crashes that seem to be caused by
inconsistent handling of max_parallel_workers - queries still seem to be
planned with parallel plans enabled, but then crash at execution. The attached
script reproduces the issue on a simple query, causing a crash in GatherMerge,
but I assume the issue is more general.
I had a look at this and found a bunch of off-by-one errors in the code.
I've attached a couple of patches, one is the minimal fix, and one
more complete one.
In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
an nreaders field too, but I've not looked into whether it suffers
from the same quirk of nreaders always matching nworkers_launched.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
gather_merge_minimal_fix.patch
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 72f30ab..f4613f7 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -369,7 +369,7 @@ gather_merge_init(GatherMergeState *gm_state)
gm_state->gm_tuple_buffers =
(GMReaderTupleBuffer *) palloc0(sizeof(GMReaderTupleBuffer) *
(gm_state->nreaders + 1));
- for (i = 0; i < gm_state->nreaders; i++)
+ for (i = 0; i < gm_state->nreaders + 1; i++)
{
/* Allocate the tuple array with MAX_TUPLE_STORE size */
gm_state->gm_tuple_buffers[i].tuple =
@@ -412,7 +412,7 @@ reread:
}
initialize = false;
- for (i = 0; i < nreaders; i++)
+ for (i = 0; i < nreaders + 1; i++)
if (!gm_state->gm_tuple_buffers[i].done &&
(TupIsNull(gm_state->gm_slots[i]) ||
gm_state->gm_slots[i]->tts_isempty))
@@ -431,7 +431,7 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
{
int i;
- for (i = 0; i < gm_state->nreaders; i++)
+ for (i = 0; i < gm_state->nreaders + 1; i++)
{
pfree(gm_state->gm_tuple_buffers[i].tuple);
gm_state->gm_slots[i] = ExecClearTuple(gm_state->gm_slots[i]);
gather_merge_fix.patch
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 72f30ab..3413c9a 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -199,9 +199,9 @@ ExecGatherMerge(GatherMergeState *node)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- node->nreaders = 0;
- node->reader = palloc(pcxt->nworkers_launched *
- sizeof(TupleQueueReader *));
+ node->reader = (TupleQueueReader **)
+ palloc(pcxt->nworkers_launched *
+ sizeof(TupleQueueReader *));
Assert(gm->numCols);
@@ -209,7 +209,7 @@ ExecGatherMerge(GatherMergeState *node)
{
shm_mq_set_handle(node->pei->tqueue[i],
pcxt->worker[i].bgwhandle);
- node->reader[node->nreaders++] =
+ node->reader[i] =
CreateTupleQueueReader(node->pei->tqueue[i],
node->tupDesc);
}
@@ -302,7 +302,7 @@ ExecShutdownGatherMergeWorkers(GatherMergeState *node)
{
int i;
- for (i = 0; i < node->nreaders; ++i)
+ for (i = 0; i < node->nworkers_launched; ++i)
if (node->reader[i])
DestroyTupleQueueReader(node->reader[i]);
@@ -348,28 +348,26 @@ ExecReScanGatherMerge(GatherMergeState *node)
static void
gather_merge_init(GatherMergeState *gm_state)
{
- int nreaders = gm_state->nreaders;
+ int nslots = gm_state->nworkers_launched + 1;
bool initialize = true;
int i;
/*
* Allocate gm_slots for the number of worker + one more slot for leader.
- * Last slot is always for leader. Leader always calls ExecProcNode() to
- * read the tuple which will return the TupleTableSlot. Later it will
- * directly get assigned to gm_slot. So just initialize leader gm_slot
- * with NULL. For other slots below code will call
- * ExecInitExtraTupleSlot() which will do the initialization of worker
- * slots.
+ * The final slot in the array is reserved for the leader process. This
+ * slot is always populated via ExecProcNode(). This can be set to NULL
+ * for now. The remaining slots we'll initialize with a call to
+ * ExecInitExtraTupleSlot().
*/
- gm_state->gm_slots =
- palloc((gm_state->nreaders + 1) * sizeof(TupleTableSlot *));
- gm_state->gm_slots[gm_state->nreaders] = NULL;
+ gm_state->gm_slots = (TupleTableSlot **)
+ palloc(nslots * sizeof(TupleTableSlot *));
+ gm_state->gm_slots[nslots - 1] = NULL; /* nullify leader's slot */
- /* Initialize the tuple slot and tuple array for each worker */
+ /* Initialize the tuple slot and tuple array for each reader */
gm_state->gm_tuple_buffers =
- (GMReaderTupleBuffer *) palloc0(sizeof(GMReaderTupleBuffer) *
- (gm_state->nreaders + 1));
- for (i = 0; i < gm_state->nreaders; i++)
+ (GMReaderTupleBuffer *) palloc0(sizeof(GMReaderTupleBuffer) * nslots);
+
+ for (i = 0; i < nslots; i++)
{
/* Allocate the tuple array with MAX_TUPLE_STORE size */
gm_state->gm_tuple_buffers[i].tuple =
@@ -382,7 +380,7 @@ gather_merge_init(GatherMergeState *gm_state)
}
/* Allocate the resources for the merge */
- gm_state->gm_heap = binaryheap_allocate(gm_state->nreaders + 1,
+ gm_state->gm_heap = binaryheap_allocate(nslots,
heap_compare_slots,
gm_state);
@@ -392,10 +390,10 @@ gather_merge_init(GatherMergeState *gm_state)
* leader. After this, if all active workers are unable to produce a
* tuple, then re-read and this time use wait mode. For workers that were
* able to produce a tuple in the earlier loop and are still active, just
- * try to fill the tuple array if more tuples are avaiable.
+ * try to fill the tuple array if more tuples are available.
*/
reread:
- for (i = 0; i < nreaders + 1; i++)
+ for (i = 0; i < nslots; i++)
{
if (!gm_state->gm_tuple_buffers[i].done &&
(TupIsNull(gm_state->gm_slots[i]) ||
@@ -412,7 +410,7 @@ reread:
}
initialize = false;
- for (i = 0; i < nreaders; i++)
+ for (i = 0; i < nslots; i++)
if (!gm_state->gm_tuple_buffers[i].done &&
(TupIsNull(gm_state->gm_slots[i]) ||
gm_state->gm_slots[i]->tts_isempty))
@@ -429,9 +427,10 @@ reread:
static TupleTableSlot *
gather_merge_clear_slots(GatherMergeState *gm_state)
{
+ int nslots = gm_state->nworkers_launched + 1;
int i;
- for (i = 0; i < gm_state->nreaders; i++)
+ for (i = 0; i < nslots; i++)
{
pfree(gm_state->gm_tuple_buffers[i].tuple);
gm_state->gm_slots[i] = ExecClearTuple(gm_state->gm_slots[i]);
@@ -499,13 +498,15 @@ gather_merge_getnext(GatherMergeState *gm_state)
static void
form_tuple_array(GatherMergeState *gm_state, int reader)
{
- GMReaderTupleBuffer *tuple_buffer = &gm_state->gm_tuple_buffers[reader];
+ GMReaderTupleBuffer *tuple_buffer;
int i;
/* Last slot is for leader and we don't build tuple array for leader */
- if (reader == gm_state->nreaders)
+ if (reader == gm_state->nworkers_launched)
return;
+ tuple_buffer = &gm_state->gm_tuple_buffers[reader];
+
/*
* We here because we already read all the tuples from the tuple array, so
* initialize the counter to zero.
@@ -544,7 +545,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
* If we're being asked to generate a tuple from the leader, then we
* just call ExecProcNode as normal to produce one.
*/
- if (gm_state->nreaders == reader)
+ if (gm_state->nworkers_launched == reader)
{
if (gm_state->need_to_scan_locally)
{
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..56375a1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2109,18 +2109,20 @@ typedef struct GatherMergeState
PlanState ps; /* its first field is NodeTag */
bool initialized;
struct ParallelExecutorInfo *pei;
- int nreaders;
- int nworkers_launched;
- struct TupleQueueReader **reader;
+ int nworkers_launched; /* number of parallel workers launched */
+ struct TupleQueueReader **reader; /* array of readers, nworkers_launched
+ * long */
TupleDesc tupDesc;
- TupleTableSlot **gm_slots;
- struct binaryheap *gm_heap; /* binary heap of slot indices */
+ TupleTableSlot **gm_slots; /* array of Tuple slots, nworkers_launched + 1
+ * long */
+ struct binaryheap *gm_heap; /* binary heap of slot indices */
bool gm_initialized; /* gather merge initilized ? */
bool need_to_scan_locally;
int gm_nkeys;
SortSupport gm_sortkeys; /* array of length ms_nkeys */
- struct GMReaderTupleBuffer *gm_tuple_buffers; /* tuple buffer per
- * reader */
+ struct GMReaderTupleBuffer *gm_tuple_buffers; /* array of tuple buffers,
+ * nworkers_launched + 1
+ * long */
} GatherMergeState;
/* ----------------
Thanks Tomas for reporting the issue, and thanks David for working
on this.
I can see the problem in GatherMerge, especially when nworkers_launched
is zero. I will look into this issue and will post a fix for the same.
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
Thanks,
On Sat, Mar 25, 2017 at 2:21 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:
On 25 March 2017 at 13:10, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
while working on a patch I ran into some crashes that seem to be caused by
inconsistent handling of max_parallel_workers - queries still seem to be
planned with parallel plans enabled, but then crash at execution. The attached
script reproduces the issue on a simple query, causing a crash in GatherMerge,
but I assume the issue is more general.
I had a look at this and found a bunch of off-by-one errors in the code.
I've attached a couple of patches, one is the minimal fix, and one
more complete one. In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
an nreaders field too, but I've not looked into whether it suffers
from the same quirk of nreaders always matching nworkers_launched.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Rushabh Lathia
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
I wondered if there's anything we can do here to better test the case
where no workers can be launched, to ensure the parallel nodes work
correctly, but the more I think about it, the less I see wrong
with just using SET max_parallel_workers = 0;
My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0.
If that's not going to fly, then unless we add something else to allow
us to reliably get no workers, we're coming close to closing the
door on being able to write automated tests to catch this sort of
bug.
... thinks for a bit....
perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...
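A minimal sketch of the kind of select_parallel.sql-style test being proposed
here (the query is illustrative; it assumes the standard regression table
tenk1):

-- Plan in parallel, but forbid launching any workers.
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
set max_parallel_workers = 0;
explain (costs off) select count(*) from tenk1;  -- parallel plan is still chosen
select count(*) from tenk1;                      -- must run, not crash, with 0 workers
reset max_parallel_workers;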
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 25, 2017 at 6:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
I wondered if there's anything we can do here to better test the case
where no workers can be launched, to ensure the parallel nodes work
correctly, but the more I think about it, the less I see wrong
with just using SET max_parallel_workers = 0;
My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0.
I think force_parallel_mode=regress should test the same thing.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 26 March 2017 at 00:17, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Mar 25, 2017 at 6:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
I wondered if there's anything we can do here to better test the case
where no workers can be launched, to ensure the parallel nodes work
correctly, but the more I think about it, the less I see wrong
with just using SET max_parallel_workers = 0;
My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0.
I think force_parallel_mode=regress should test the same thing.
Not really. That flag's days are surely numbered. It creates a Gather
node; the problem was with GatherMerge.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3/25/17 09:01, David Rowley wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/25/17 09:01, David Rowley wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
PFA a simple patch to fix the problem.
Thanks,
Rushabh Lathia
www.Enterprisedb.com
Attachments:
gm_nworkers_launched_zero.patch
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 72f30ab..e19bace 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -431,9 +431,14 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
{
int i;
- for (i = 0; i < gm_state->nreaders; i++)
+ for (i = 0; i < gm_state->nreaders + 1; i++)
{
- pfree(gm_state->gm_tuple_buffers[i].tuple);
+ /*
+ * Gather merge always allow leader to participate and we don't
+ * allocate the tuple, so no need to free the tuple for leader.
+ */
+ if (i != gm_state->nreaders)
+ pfree(gm_state->gm_tuple_buffers[i].tuple);
gm_state->gm_slots[i] = ExecClearTuple(gm_state->gm_slots[i]);
}
@@ -474,6 +479,8 @@ gather_merge_getnext(GatherMergeState *gm_state)
*/
i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
+ Assert(i < gm_state->nreaders + 1);
+
if (gather_merge_readnext(gm_state, i, false))
binaryheap_replace_first(gm_state->gm_heap, Int32GetDatum(i));
else
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 3/25/17 09:01, David Rowley wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
PFA a simple patch to fix the problem.
I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.
Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers can be already chewing on
some other query.
But it seems a bit futile to produce the parallel plan in the first
place, because with max_parallel_workers=0 we can't possibly get any
parallel workers ever. I wonder why compute_parallel_worker() only looks
at max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.
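As an illustration of what this looks like in practice (sketched EXPLAIN
output, assuming a table t like the one in the reproducer; exact plan details
will vary):

set max_parallel_workers_per_gather = 2;
set max_parallel_workers = 0;
explain (analyze, costs off) select * from t order by a;
--  Gather Merge (actual rows=... loops=1)
--    Workers Planned: 2
--    Workers Launched: 0    <-- the case that crashes without the fix
--    ->  Sort (actual ...)
--          ...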
Regarding handling this at the GUC level - I agree with Peter that
that's not a good idea. I suppose we could deal with checking the values
in the GUC check/assign hooks, but what we don't have is a way to undo
the changes in all the GUCs. That is, if I do
SET max_parallel_workers = 0;
SET max_parallel_workers = 16;
I expect to end up with just max_parallel_workers GUC changed and
nothing else.
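For example (an illustrative psql session): if setting max_parallel_workers
to 0 also silently zeroed max_parallel_workers_per_gather, the second SET
below could not restore the per-gather setting:

show max_parallel_workers_per_gather;   -- say, 2
set max_parallel_workers = 0;
set max_parallel_workers = 16;
show max_parallel_workers;              -- 16, as expected
show max_parallel_workers_per_gather;   -- should still be 2, not 0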
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/25/2017 02:01 PM, David Rowley wrote:
I wondered if there's anything we can do here to better test the case
where no workers can be launched, to ensure the parallel nodes work
correctly, but the more I think about it, the less I see wrong
with just using SET max_parallel_workers = 0;
It's demonstrably a valid way to disable parallel queries (i.e. there's
nothing wrong with it), because the docs say this:
Setting this value to 0 disables parallel query execution.
My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0.
If that's not going to fly, then unless we add something else to allow
us to reliably get no workers, we're coming close to closing the
door on being able to write automated tests to catch this sort of bug.
... thinks for a bit....
perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...
I agree it'd be very useful to have a mode where we generate parallel
plans but then prohibit starting any workers. That would detect this and
similar issues, I think.
I'm not sure we need to invent a new magic value, though. Can we simply
look at force_parallel_mode, and if it's 'regress' then treat 0 differently?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 27 March 2017 at 10:23, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I'm not sure we need to invent a new magic value, though. Can we simply look
at force_parallel_mode, and if it's 'regress' then treat 0 differently?
see standard_planner()
if (force_parallel_mode != FORCE_PARALLEL_OFF && best_path->parallel_safe)
{
Gather *gather = makeNode(Gather);
Probably force_parallel_mode is good for testing the tuple queue code,
and some code in Gather, but I'm not quite sure what else it's good
for. Certainly not GatherMerge.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 3/25/17 09:01, David Rowley wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it
does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust
against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
PFA a simple patch to fix the problem.
I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.
Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers can be already chewing on some
other query.
Thanks.
But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
I agree with you here. Producing the parallel plan when
max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
think that we should do something like:
/*
* In no case use more than max_parallel_workers_per_gather or
* max_parallel_workers.
*/
parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
max_parallel_workers_per_gather));
Perhaps this was discussed and is actually intentional, though.
Yes, I am not quite sure about this.
Regarding handling this at the GUC level - I agree with Peter that that's
not a good idea. I suppose we could deal with checking the values in the
GUC check/assign hooks, but what we don't have is a way to undo the changes
in all the GUCs. That is, if I do
SET max_parallel_workers = 0;
SET max_parallel_workers = 16;
I expect to end up with just max_parallel_workers GUC changed and nothing
else.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Rushabh Lathia
On Mon, Mar 27, 2017 at 10:59 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:
On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 3/25/17 09:01, David Rowley wrote:
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Also, another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather
to zero. That way we can avoid generating the gather path with
max_parallel_workers = 0.
I see that it was actually quite useful that it works the way it
does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust
against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
PFA a simple patch to fix the problem.
I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.
Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers can be already chewing on some
other query.
Thanks.
I was doing more testing with the patch and I found one more server
crash with the patch around the same area, when we force gather
merge for a scan having zero rows.
create table dept ( deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;
This is because we don't initialize the leader's slot in gm_slots. So
in the case where zero workers are launched and the table has zero rows, we
end up with a NULL slot in the gm_slots array.
Currently gather_merge_clear_slots() clears out the tuple table slots for
each gather merge input and returns a cleared slot. In the patch I modified
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when all the queues and the heap are exhausted.
But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
I agree with you here. Producing the parallel plan when
max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
think that we should do something like:
/*
 * In no case use more than max_parallel_workers_per_gather or
 * max_parallel_workers.
 */
parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
                                             max_parallel_workers_per_gather));
Perhaps this was discussed and is actually intentional, though.
Yes, I am not quite sure about this.
Regarding handling this at the GUC level - I agree with Peter that that's
not a good idea. I suppose we could deal with checking the values in the
GUC check/assign hooks, but what we don't have is a way to undo the changes
in all the GUCs. That is, if I do
SET max_parallel_workers = 0;
SET max_parallel_workers = 16;
I expect to end up with just max_parallel_workers GUC changed and nothing
else.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Rushabh Lathia
Regards,
Rushabh Lathia
www.EnterpriseDB.com
Attachments:
gm_nworkers_launched_zero_v2.patch
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 3f0c3ee..62c399e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -419,10 +419,9 @@ reread:
}
/*
- * Clear out the tuple table slots for each gather merge input,
- * and return a cleared slot.
+ * Clear out the tuple table slots for each gather merge input.
*/
-static TupleTableSlot *
+static void
gather_merge_clear_slots(GatherMergeState *gm_state)
{
int i;
@@ -437,9 +436,6 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
pfree(gm_state->gm_tuple_buffers);
/* Free the binaryheap, which was created for sort */
binaryheap_free(gm_state->gm_heap);
-
- /* return any clear slot */
- return gm_state->gm_slots[0];
}
/*
@@ -479,7 +475,8 @@ gather_merge_getnext(GatherMergeState *gm_state)
if (binaryheap_empty(gm_state->gm_heap))
{
/* All the queues are exhausted, and so is the heap */
- return gather_merge_clear_slots(gm_state);
+ gather_merge_clear_slots(gm_state);
+ return NULL;
}
else
{
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.
Yes, I am not quite sure about this.
It was intentional. See the last paragraph of
/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.
It was intentional. See the last paragraph of
/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com
Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.
regards, tom lane
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.
It was intentional. See the last paragraph of
/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com
Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.
Hey, imagine if the comments explained the logic behind the code!
Good idea. How about the attached?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
add-worker-comment.patch
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fa7a5f8..3e7c472 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -268,6 +268,12 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
* to assume that the same serialization level will be in effect at plan
* time and execution time, so don't generate a parallel plan if we're in
* serializable mode.
+ *
+ * Note that we don't verify here whether max_parallel_workers > 0. If
+ * you set max_parallel_workers_per_gather > max_parallel_workers, you
+ * might get some plans that will never succeed in obtaining the number of
+ * workers for which they planned. If that bothers you, don't do it. It's
+ * actually quite useful to do that for testing purposes, though.
*/
if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
IsUnderPostmaster &&
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)
When comparing Rushabh's patch to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but it seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free a slot that we didn't allocate in the first place.
I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)
When comparing Rushabh's patch to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but it seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free a slot that we didn't allocate in the first place.
I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.
Yeah, if those fields are always the same, then I think that they
should be merged. That seems undeniably a good idea. Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.
Good idea. How about the attached?
WFM ... but seems like there should be some flavor of this statement
in the user-facing docs too (ie, "max_parallel_workers_per_gather >
max_parallel_workers is a bad idea unless you're trying to test what
happens when a plan can't get all the workers it planned for"). The
existing text makes some vague allusions suggesting that the two
GUCs might be interrelated, but I think it could be improved.
regards, tom lane
On Mon, Mar 27, 2017 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.
Good idea. How about the attached?
WFM ... but seems like there should be some flavor of this statement
in the user-facing docs too (ie, "max_parallel_workers_per_gather >
max_parallel_workers is a bad idea unless you're trying to test what
happens when a plan can't get all the workers it planned for"). The
existing text makes some vague allusions suggesting that the two
GUCs might be interrelated, but I think it could be improved.
Do you have a more specific idea? I mean, this seems like a
degenerate case of what the documentation for
max_parallel_workers_per_gather says already. Even if
max_parallel_workers_per_gather <= Min(max_worker_processes,
max_parallel_workers), it's quite possible that you'll regularly be
generating plans that can't obtain the budgeted number of workers.
The only thing that is really special about the case where
max_parallel_workers_per_gather > Min(max_worker_processes,
max_parallel_workers) is that this can happen even on an
otherwise-idle system. I'm not quite sure how to emphasize that
without seeming to state the obvious.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert into gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)
When comparing Rushabh's patch to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but it seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free a slot that we didn't allocate in the first place.
I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.
Yeah, if those fields are always the same, then I think that they
should be merged. That seems undeniably a good idea.
Hmm, I agree that it's a good idea, and I will work on that as a separate patch.
Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.
So in my second version of the patch I changed gather_merge_clear_slots() to
just clear the slots for the workers, plus some other cleanup. It also returns
NULL from gather_merge_getnext() when all the queues and the heap are
exhausted - previously gather_merge_clear_slots() was returning a cleared
slot for that. This way we make sure that we don't end up freeing the slot for
the leader, and gather_merge_getnext() doesn't need to depend on that
cleared slot.
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
On Mon, Mar 27, 2017 at 12:36 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
Hmm, I agree that it's a good idea, and I will work on that as a separate patch.
Maybe you want to start with what David already posted?
Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.
So in my second version of the patch I changed gather_merge_clear_slots() to
just clear the slots for the workers, plus some other cleanup. It also returns
NULL from gather_merge_getnext() when all the queues and the heap are
exhausted - previously gather_merge_clear_slots() was returning a cleared
slot for that. This way we make sure that we don't end up freeing the slot for
the leader, and gather_merge_getnext() doesn't need to depend on that
cleared slot.
Ah, I missed that. That does seem cleaner. Anybody see a problem
with that approach?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 03/27/2017 05:51 PM, Robert Haas wrote:
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.
It was intentional. See the last paragraph of
/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com
Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.
Hey, imagine if the comments explained the logic behind the code!
Good idea. How about the attached?
Certainly an improvement. But perhaps we should also mention this at
compute_parallel_worker, i.e. that not looking at max_parallel_workers
is intentional.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
...
I was doing more testing with the patch and I found one more server
crash with the patch around the same area, when we force gather
merge for a scan having zero rows.
create table dept ( deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;
This is because we don't initialize the leader's slot in gm_slots. So
in the case where zero workers are launched and the table has zero rows, we
end up with a NULL slot in the gm_slots array.
Currently gather_merge_clear_slots() clears out the tuple table slots for
each gather merge input and returns a cleared slot. In the patch I modified
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when all the queues and the heap are exhausted.
Isn't that just another sign the code might be a bit too confusing? I
see two main issues in the code:
1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
way to cause off-by-one errors
2) mixing objects with different life spans (slots for readers vs. slot
for the leader) seems like a bad idea too
I wonder how much we gain by reusing the slot from the leader (I'd be
surprised if it was at all measurable). David posted a patch reworking
this, and significantly simplifying the GatherMerge node. Why not
accept that?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
...
I was doing more testing with the patch and I found one more server
crash with the patch around the same area, when we force gather
merge for a scan having zero rows.
create table dept ( deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;
This is because we don't initialize the leader's slot in gm_slots. So
in the case where zero workers are launched and the table has zero rows, we
end up with a NULL slot in the gm_slots array.
Currently gather_merge_clear_slots() clears out the tuple table slots for
each gather merge input and returns a cleared slot. In the patch I modified
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when all the queues and the heap are exhausted.
Isn't that just another sign the code might be a bit too confusing? I see
two main issues in the code:
1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
way to cause off-by-one errors
2) mixing objects with different life spans (slots for readers vs. slot
for the leader) seems like a bad idea too
I wonder how much we gain by reusing the slot from the leader (I'd be
surprised if it was at all measurable). David posted a patch reworking
this, and significantly simplifying the GatherMerge node. Why not accept
that?
I think we all agree that we should get rid of nreaders from
GatherMergeState and need to do some code refactoring. But if I understood
correctly, Robert's concern was to do that refactoring as a separate commit.
I picked up David's patch and started reviewing the changes. I applied that
patch on top of my v2 patch (which does the refactoring of
gather_merge_clear_slots()).
In David's patch, in gather_merge_init(), the loop where the tuple arrays
are allocated needs to run only up to nworkers_launched, because we don't
keep a tuple array for the leader. I changed that and made some other simple
changes based on my v2 patch. I also performed manual testing with
the changes.
Please find attached the refactoring patch, along with the v2 patch submitted
for the server crash fix. (Attaching both patches here again, for ease of access.)
Thanks,
--
Rushabh Lathia
www.EnterpriseDB.com
Attachments:
gather_merge_refactor.patch
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 62c399e..2d7eb71 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -195,9 +195,9 @@ ExecGatherMerge(GatherMergeState *node)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- node->nreaders = 0;
- node->reader = palloc(pcxt->nworkers_launched *
- sizeof(TupleQueueReader *));
+ node->reader = (TupleQueueReader **)
+ palloc(pcxt->nworkers_launched *
+ sizeof(TupleQueueReader *));
Assert(gm->numCols);
@@ -205,7 +205,7 @@ ExecGatherMerge(GatherMergeState *node)
{
shm_mq_set_handle(node->pei->tqueue[i],
pcxt->worker[i].bgwhandle);
- node->reader[node->nreaders++] =
+ node->reader[i] =
CreateTupleQueueReader(node->pei->tqueue[i],
node->tupDesc);
}
@@ -298,7 +298,7 @@ ExecShutdownGatherMergeWorkers(GatherMergeState *node)
{
int i;
- for (i = 0; i < node->nreaders; ++i)
+ for (i = 0; i < node->nworkers_launched; ++i)
if (node->reader[i])
DestroyTupleQueueReader(node->reader[i]);
@@ -344,28 +344,26 @@ ExecReScanGatherMerge(GatherMergeState *node)
static void
gather_merge_init(GatherMergeState *gm_state)
{
- int nreaders = gm_state->nreaders;
+ int nslots = gm_state->nworkers_launched + 1;
bool initialize = true;
int i;
/*
* Allocate gm_slots for the number of worker + one more slot for leader.
- * Last slot is always for leader. Leader always calls ExecProcNode() to
- * read the tuple which will return the TupleTableSlot. Later it will
- * directly get assigned to gm_slot. So just initialize leader gm_slot
- * with NULL. For other slots below code will call
- * ExecInitExtraTupleSlot() which will do the initialization of worker
- * slots.
+ * The final slot in the array is reserved for the leader process. This
+ * slot is always populated via ExecProcNode(). This can be set to NULL
+ * for now. The remaining slots we'll initialize with a call to
+ * ExecInitExtraTupleSlot().
*/
- gm_state->gm_slots =
- palloc((gm_state->nreaders + 1) * sizeof(TupleTableSlot *));
- gm_state->gm_slots[gm_state->nreaders] = NULL;
+ gm_state->gm_slots = (TupleTableSlot **)
+ palloc(nslots * sizeof(TupleTableSlot *));
+ gm_state->gm_slots[nslots - 1] = NULL; /* nullify leader's slot */
- /* Initialize the tuple slot and tuple array for each worker */
+ /* Initialize the tuple slot and tuple array for each reader */
gm_state->gm_tuple_buffers =
- (GMReaderTupleBuffer *) palloc0(sizeof(GMReaderTupleBuffer) *
- (gm_state->nreaders + 1));
- for (i = 0; i < gm_state->nreaders; i++)
+ (GMReaderTupleBuffer *) palloc0(sizeof(GMReaderTupleBuffer) * nslots);
+
+ for (i = 0; i < gm_state->nworkers_launched; i++)
{
/* Allocate the tuple array with MAX_TUPLE_STORE size */
gm_state->gm_tuple_buffers[i].tuple =
@@ -378,7 +376,7 @@ gather_merge_init(GatherMergeState *gm_state)
}
/* Allocate the resources for the merge */
- gm_state->gm_heap = binaryheap_allocate(gm_state->nreaders + 1,
+ gm_state->gm_heap = binaryheap_allocate(nslots,
heap_compare_slots,
gm_state);
@@ -388,10 +386,10 @@ gather_merge_init(GatherMergeState *gm_state)
* leader. After this, if all active workers are unable to produce a
* tuple, then re-read and this time use wait mode. For workers that were
* able to produce a tuple in the earlier loop and are still active, just
- * try to fill the tuple array if more tuples are avaiable.
+ * try to fill the tuple array if more tuples are available.
*/
reread:
- for (i = 0; i < nreaders + 1; i++)
+ for (i = 0; i < nslots; i++)
{
if (!gm_state->gm_tuple_buffers[i].done &&
(TupIsNull(gm_state->gm_slots[i]) ||
@@ -408,7 +406,7 @@ reread:
}
initialize = false;
- for (i = 0; i < nreaders; i++)
+ for (i = 0; i < nslots; i++)
if (!gm_state->gm_tuple_buffers[i].done &&
(TupIsNull(gm_state->gm_slots[i]) ||
gm_state->gm_slots[i]->tts_isempty))
@@ -419,14 +417,14 @@ reread:
}
/*
- * Clear out the tuple table slots for each gather merge input.
+ * Clear out the tuple table slots for each gather merge workers.
*/
static void
gather_merge_clear_slots(GatherMergeState *gm_state)
{
int i;
- for (i = 0; i < gm_state->nreaders; i++)
+ for (i = 0; i < gm_state->nworkers_launched; i++)
{
pfree(gm_state->gm_tuple_buffers[i].tuple);
gm_state->gm_slots[i] = ExecClearTuple(gm_state->gm_slots[i]);
@@ -492,13 +490,15 @@ gather_merge_getnext(GatherMergeState *gm_state)
static void
form_tuple_array(GatherMergeState *gm_state, int reader)
{
- GMReaderTupleBuffer *tuple_buffer = &gm_state->gm_tuple_buffers[reader];
+ GMReaderTupleBuffer *tuple_buffer;
int i;
/* Last slot is for leader and we don't build tuple array for leader */
- if (reader == gm_state->nreaders)
+ if (reader == gm_state->nworkers_launched)
return;
+ tuple_buffer = &gm_state->gm_tuple_buffers[reader];
+
/*
* We here because we already read all the tuples from the tuple array, so
* initialize the counter to zero.
@@ -537,7 +537,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
* If we're being asked to generate a tuple from the leader, then we
* just call ExecProcNode as normal to produce one.
*/
- if (gm_state->nreaders == reader)
+ if (gm_state->nworkers_launched == reader)
{
if (gm_state->need_to_scan_locally)
{
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 11a6850..e8c08c6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1864,18 +1864,20 @@ typedef struct GatherMergeState
PlanState ps; /* its first field is NodeTag */
bool initialized;
struct ParallelExecutorInfo *pei;
- int nreaders;
- int nworkers_launched;
- struct TupleQueueReader **reader;
+ int nworkers_launched; /* number of parallel workers launched */
+ struct TupleQueueReader **reader; /* array of readers, nworkers_launched
+ * long */
TupleDesc tupDesc;
- TupleTableSlot **gm_slots;
- struct binaryheap *gm_heap; /* binary heap of slot indices */
+ TupleTableSlot **gm_slots; /* array of Tuple slots, nworkers_launched + 1
+ * long */
+ struct binaryheap *gm_heap; /* binary heap of slot indices */
bool gm_initialized; /* gather merge initilized ? */
bool need_to_scan_locally;
int gm_nkeys;
SortSupport gm_sortkeys; /* array of length ms_nkeys */
- struct GMReaderTupleBuffer *gm_tuple_buffers; /* tuple buffer per
- * reader */
+ struct GMReaderTupleBuffer *gm_tuple_buffers; /* array of tuple buffers,
+ * nworkers_launched + 1
+ * long */
} GatherMergeState;
/* ----------------
gm_nworkers_launched_zero_v2.patchtext/x-patch; charset=US-ASCII; name=gm_nworkers_launched_zero_v2.patchDownload
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 3f0c3ee..62c399e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -419,10 +419,9 @@ reread:
}
/*
- * Clear out the tuple table slots for each gather merge input,
- * and return a cleared slot.
+ * Clear out the tuple table slots for each gather merge input.
*/
-static TupleTableSlot *
+static void
gather_merge_clear_slots(GatherMergeState *gm_state)
{
int i;
@@ -437,9 +436,6 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
pfree(gm_state->gm_tuple_buffers);
/* Free the binaryheap, which was created for sort */
binaryheap_free(gm_state->gm_heap);
-
- /* return any clear slot */
- return gm_state->gm_slots[0];
}
/*
@@ -479,7 +475,8 @@ gather_merge_getnext(GatherMergeState *gm_state)
if (binaryheap_empty(gm_state->gm_heap))
{
/* All the queues are exhausted, and so is the heap */
- return gather_merge_clear_slots(gm_state);
+ gather_merge_clear_slots(gm_state);
+ return NULL;
}
else
{
Hi,
On 03/28/2017 11:07 AM, Rushabh Lathia wrote:
...
I think we all agree that we should get rid of nreaders from
GatherMergeState and do some code refactoring. But if I understood
correctly, Robert's concern was to do that refactoring as a separate
commit.
Maybe. It depends on how valuable it is to keep Gather and GatherMerge
similar - having nreaders in one and not the other seems a bit weird.
But maybe the refactoring would remove it from both nodes?

Also, it does not really solve the issue that we're using 'nreaders' or
'nworkers_launched' to access an array with one extra element.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 29, 2017 at 8:38 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
On 03/28/2017 11:07 AM, Rushabh Lathia wrote:
...
I think we all agree that we should get rid of nreaders from
GatherMergeState and do some code refactoring. But if I understood
correctly, Robert's concern was to do that refactoring as a separate
commit.

Maybe. It depends on how valuable it is to keep Gather and GatherMerge
similar - having nreaders in one and not the other seems a bit weird.
But maybe the refactoring would remove it from both nodes?

Also, it does not really solve the issue that we're using 'nreaders' or
'nworkers_launched' to access an array with one extra element.
With the latest patches, gather_merge_init() has one loop where we
initialize the tuple array (which holds tuples that came from each
worker) and the slot (which holds each worker's current tuple).

We keep a tuple array per worker because, when we read tuples from
workers, it's a good idea to read several at once for efficiency when
possible: this minimizes context-switching overhead.
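As a rough illustration of that batching idea (again plain C with
invented names and sizes, not the executor code): when we visit a
worker's queue we opportunistically pull several ready tuples into a
small per-worker buffer, instead of switching away after every single
tuple:

#include <stdio.h>

#define TUPLE_BUFFER_SIZE 10    /* stand-in for a MAX_TUPLE_STORE-style limit */

/* Hypothetical per-worker buffer: a handful of tuples read ahead. */
typedef struct WorkerBuffer
{
    int     tuples[TUPLE_BUFFER_SIZE];
    int     nfilled;
    int     readpos;
} WorkerBuffer;

/* Pretend queue read: returns 0 when the worker has nothing ready. */
static int
queue_read_nowait(int worker, int *tuple)
{
    static int produced[3];

    if (produced[worker] >= 5)
        return 0;
    *tuple = ++produced[worker] * 100 + worker;
    return 1;
}

/* Fill the worker's buffer with as many tuples as are ready right now. */
static void
fill_buffer(WorkerBuffer *buf, int worker)
{
    buf->nfilled = 0;
    buf->readpos = 0;
    while (buf->nfilled < TUPLE_BUFFER_SIZE &&
           queue_read_nowait(worker, &buf->tuples[buf->nfilled]))
        buf->nfilled++;
}

int
main(void)
{
    WorkerBuffer buf;

    fill_buffer(&buf, 0);
    while (buf.readpos < buf.nfilled)
        printf("tuple %d\n", buf.tuples[buf.readpos++]);
    return 0;
}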
For the leader we don't initialize a slot, because for the leader we
directly call ExecProcNode(), which returns the slot. Also, there is no
point in holding a tuple array for the leader.

With the latest patches, gather_merge_clear_slots() will only clean up
the tuple table slots and the tuple array buffers for the workers
(nworkers_launched).

The only time we loop through "nworkers_launched + 1" is while reading a
tuple from each gather merge input. Personally I don't see any problem
with that, but I am very much open to suggestions.
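Spelled out as a standalone sketch (invented helpers, not the actual
executor code): indexes 0 .. nworkers_launched - 1 are read from the
worker tuple queues, and the final index is the leader, which produces
its tuple directly and keeps no tuple array:

#include <stdio.h>

#define NWORKERS_LAUNCHED 2     /* hypothetical worker count */

static const char *
read_from_worker(int reader)
{
    (void) reader;              /* a real reader would index its tuple queue */
    return "tuple from a worker queue";
}

static const char *
read_from_leader(void)
{
    return "tuple from the leader";     /* stand-in for ExecProcNode() */
}

int
main(void)
{
    /* The one place that walks all nworkers_launched + 1 inputs. */
    for (int i = 0; i < NWORKERS_LAUNCHED + 1; i++)
    {
        const char *tuple = (i == NWORKERS_LAUNCHED)
            ? read_from_leader()
            : read_from_worker(i);

        printf("input %d: %s\n", i, tuple);
    }
    return 0;
}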
--
Rushabh Lathia
On Tue, Mar 28, 2017 at 11:08 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Maybe. It depends on how valuable it is to keep Gather and GatherMerge
similar - having nreaders in one and not the other seems a bit weird. But
maybe the refactoring would remove it from both nodes?
Yeah, it appears to be the case that both Gather and GatherMerge
inevitably end up with nworkers_launched == nreaders, which seems
dumb. If we're going to clean this up, I think we should make them
both match.
Also, it does not really solve the issue that we're using 'nreaders' or
'nworkers_launched' to access an array with one extra element.
I'm not clear that there's any fundamental solution to that problem
other than trying to clarify it through comments.
Meantime, I think it's not good to leave this crashing, so I pushed
Rushabh's v2 patch for the actual crash for now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company