crashes due to setting max_parallel_workers=0

Started by Tomas Vondra, about 9 years ago, 30 messages, pgsql-hackers
#1 Tomas Vondra
tomas.vondra@2ndquadrant.com

Hi,

while working on a patch I ran into some crashes that seem to be caused
by inconsistent handling of max_parallel_workers - queries still seem to
be planned with parallel plans enabled, but then crash at execution.

The attached script reproduces the issue on a simple query, causing
crashes in GatherMerge, but I assume the issue is more general.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

crash-parallel.sql (application/sql)
#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#1)
Re: crashes due to setting max_parallel_workers=0

On Sat, Mar 25, 2017 at 7:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hi,

while working on a patch I ran into some crashes that seem to be caused by
inconsistent handling of max_parallel_workers - queries still seem to be
planned with parallel plans enabled, but then crash at execution.

The attached script reproduces the issue on a simple query, causing crashes
in GatherMerge, but I assume the issue is more general.

I could see other parallel plans like Gather also being picked at
max_parallel_workers=0. I suspect there are multiple issues here and
this needs some investigation. I have added this to the open items list.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#1)
Re: crashes due to setting max_parallel_workers=0

On 25 March 2017 at 13:10, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

while working on a patch I ran into some crashes that seem to be caused by
inconsistent handling of max_parallel_workers - queries still seem to be
planned with parallel plans enabled, but then crash at execution.

The attached script reproduces the issue on a simple query, causing crashes
in GatherMerge, but I assume the issue is more general.

I had a look at this and found a bunch of off-by-one errors in the code.

I've attached a couple of patches, one is the minimal fix, and one
more complete one.

In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
nreaders too, but I've not looked into whether it suffers from the
same weirdness of nreaders always matching nworkers_launched.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

gather_merge_minimal_fix.patch (application/octet-stream; +3/-3)
gather_merge_fix.patch (application/octet-stream; +37/-34)
#4 Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: David Rowley (#3)
Re: crashes due to setting max_parallel_workers=0

Thanks Tomas for reporting the issue, and thanks David for working
on this.

I can see the problem in GatherMerge, especially when nworkers_launched
is zero. I will look into this issue and will post a fix for the same.

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

Thanks,

On Sat, Mar 25, 2017 at 2:21 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 25 March 2017 at 13:10, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

while working on a patch I ran into some crashes that seem to be caused by
inconsistent handling of max_parallel_workers - queries still seem to be
planned with parallel plans enabled, but then crash at execution.

The attached script reproduces the issue on a simple query, causing crashes
in GatherMerge, but I assume the issue is more general.

I had a look at this and found a bunch of off-by-one errors in the code.

I've attached a couple of patches, one is the minimal fix, and one
more complete one.

In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
nreaders too, but I've not looked into whether it suffers from the
same weirdness of nreaders always matching nworkers_launched.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


--
Rushabh Lathia

#5 David Rowley
dgrowleyml@gmail.com
In reply to: Rushabh Lathia (#4)
Re: crashes due to setting max_parallel_workers=0

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

I wondered if there's anything we can do here to better test cases
where no workers are available, to try to ensure the parallel nodes
work correctly, but the more I think about it, the more I don't see
anything wrong with just using SET max_parallel_workers = 0;

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

If that's not going to fly, then unless we add something else to allow
us to reliably not get any workers, we're getting close to closing the
door on being able to write automatic tests to capture this sort of
bug.

... thinks for a bit....

perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#6 Amit Kapila
amit.kapila16@gmail.com
In reply to: David Rowley (#5)
Re: crashes due to setting max_parallel_workers=0

On Sat, Mar 25, 2017 at 6:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

I wondered if there's anything we can do here to better test cases
where no workers are available, to try to ensure the parallel nodes
work correctly, but the more I think about it, the more I don't see
anything wrong with just using SET max_parallel_workers = 0;

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

I think force_parallel_mode=regress should test the same thing.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#7 David Rowley
dgrowleyml@gmail.com
In reply to: Amit Kapila (#6)
Re: crashes due to setting max_parallel_workers=0

On 26 March 2017 at 00:17, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 25, 2017 at 6:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

I wondered if there's anything we can do here to better test cases
where no workers are available, to try to ensure the parallel nodes
work correctly, but the more I think about it, the more I don't see
anything wrong with just using SET max_parallel_workers = 0;

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

I think force_parallel_mode=regress should test the same thing.

Not really. That flag's days are surely numbered. It creates a Gather
node; the problem was with GatherMerge.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#8 Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#5)
Re: crashes due to setting max_parallel_workers=0

On 3/25/17 09:01, David Rowley wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#9 Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Peter Eisentraut (#8)
Re: crashes due to setting max_parallel_workers=0

On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 3/25/17 09:01, David Rowley wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.

Okay.

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

PFA simple patch to fix the problem.

Thanks,
Rushabh Lathia
www.Enterprisedb.com

Attachments:

gm_nworkers_launched_zero.patch (text/x-patch; +9/-2)
#10 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Rushabh Lathia (#9)
Re: crashes due to setting max_parallel_workers=0

On 03/25/2017 05:18 PM, Rushabh Lathia wrote:

On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 3/25/17 09:01, David Rowley wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.

Okay.

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

PFA simple patch to fix the problem.

I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.

Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers can be already chewing on
some other query.

But it seems a bit futile to produce the parallel plan in the first
place, because with max_parallel_workers=0 we can't possibly get any
parallel workers ever. I wonder why compute_parallel_worker() only looks
at max_parallel_workers_per_gather, i.e. why shouldn't it do:

parallel_workers = Min(parallel_workers, max_parallel_workers);

Perhaps this was discussed and is actually intentional, though.

Regarding handling this at the GUC level - I agree with Peter that
that's not a good idea. I suppose we could deal with checking the values
in the GUC check/assign hooks, but what we don't have is a way to undo
the changes in all the GUCs. That is, if I do

SET max_parallel_workers = 0;
SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and
nothing else.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#11 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#5)
Re: crashes due to setting max_parallel_workers=0

On 03/25/2017 02:01 PM, David Rowley wrote:

I wondered if there's anything we can do here to better test cases
where no workers are available, to try to ensure the parallel nodes
work correctly, but the more I think about it, the more I don't see
anything wrong with just using SET max_parallel_workers = 0;

It's demonstrably a valid way to disable parallel queries (i.e. there's
nothing wrong with it), because the docs say this:

Setting this value to 0 disables parallel query execution.

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

If that's not going to fly, then unless we add something else to allow
us to reliably not get any workers, we're getting close to closing the
door on being able to write automatic tests to capture this sort of
bug.

... thinks for a bit....

perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...

I agree it'd be very useful to have a mode where we generate parallel
plans but then prohibit starting any workers. That would detect this and
similar issues, I think.

I'm not sure we need to invent a new magic value, though. Can we simply
look at force_parallel_mode, and if it's 'regress' then treat 0 differently?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#12 David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#11)
Re: crashes due to setting max_parallel_workers=0

On 27 March 2017 at 10:23, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I'm not sure we need to invent a new magic value, though. Can we simply look
at force_parallel_mode, and if it's 'regress' then treat 0 differently?

see standard_planner()

if (force_parallel_mode != FORCE_PARALLEL_OFF && best_path->parallel_safe)
{
Gather *gather = makeNode(Gather);

Probably force_parallel_mode is good for testing the tuple queue code,
and some code in Gather, but I'm not quite sure what else it's good
for. Certainly not GatherMerge.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Tomas Vondra (#10)
Re: crashes due to setting max_parallel_workers=0

On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 03/25/2017 05:18 PM, Rushabh Lathia wrote:

On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 3/25/17 09:01, David Rowley wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.

Okay.

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

PFA simple patch to fix the problem.

I think there are two issues at play, here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.

Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers can be already chewing on some
other query.

Thanks.

But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:

parallel_workers = Min(parallel_workers, max_parallel_workers);

I agree with you here. Producing the parallel plan when
max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
think that we should do something like:

/*
 * In no case use more than max_parallel_workers_per_gather or
 * max_parallel_workers.
 */
parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
                                             max_parallel_workers_per_gather));

Perhaps this was discussed and is actually intentional, though.

Yes, I am not quite sure about this.

Regarding handling this at the GUC level - I agree with Peter that
that's not a good idea. I suppose we could deal with checking the
values in the GUC check/assign hooks, but what we don't have is a way
to undo the changes in all the GUCs. That is, if I do

SET max_parallel_workers = 0;
SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and nothing
else.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Rushabh Lathia

#14 Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Rushabh Lathia (#13)
Re: crashes due to setting max_parallel_workers=0

On Mon, Mar 27, 2017 at 10:59 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <
tomas.vondra@2ndquadrant.com> wrote:

On 03/25/2017 05:18 PM, Rushabh Lathia wrote:

On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 3/25/17 09:01, David Rowley wrote:

On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.

Okay.

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

PFA simple patch to fix the problem.

I think there are two issues at play, here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.

Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers can be already chewing on some
other query.

Thanks.

I was doing more testing with the patch and I found one more server
crash with the patch in the same area, when we force gather merge
for a scan having zero rows.

create table dept (deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;

This is because we don't initialize the leader's slot in gm_slots. So
in the case where zero workers are launched and the table has zero rows,
we end up with a NULL slot in the gm_slots array.

Currently gather_merge_clear_slots() clears out the tuple table slots
for each gather merge input and returns a cleared slot. In the patch I
modified gather_merge_clear_slots() to just clear out the tuple table
slots and always return NULL when all the queues and the heap are
exhausted.

But it seems a bit futile to produce the parallel plan in the first place,

because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:

parallel_workers = Min(parallel_workers, max_parallel_workers);

I agree with you here. Producing the parallel plan when
max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
think that we should do something like:

/*
 * In no case use more than max_parallel_workers_per_gather or
 * max_parallel_workers.
 */
parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
                                             max_parallel_workers_per_gather));

Perhaps this was discussed and is actually intentional, though.

Yes, I am not quite sure about this.

Regarding handling this at the GUC level - I agree with Peter that that's

not a good idea. I suppose we could deal with checking the values in the
GUC check/assign hooks, but what we don't have is a way to undo the changes
in all the GUCs. That is, if I do

SET max_parallel_workers = 0;
SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and nothing
else.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Rushabh Lathia

Regards,
Rushabh Lathia
www.EnterpriseDB.com

Attachments:

gm_nworkers_launched_zero_v2.patch (text/x-patch; +4/-7)
#15 Robert Haas
robertmhaas@gmail.com
In reply to: Rushabh Lathia (#13)
Re: crashes due to setting max_parallel_workers=0

On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:

parallel_workers = Min(parallel_workers, max_parallel_workers);

Perhaps this was discussed and is actually intentional, though.

Yes, I am not quite sure about this.

It was intentional. See the last paragraph of

/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: crashes due to setting max_parallel_workers=0

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.

It was intentional. See the last paragraph of
/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com

Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.

regards, tom lane


#17 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: crashes due to setting max_parallel_workers=0

On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.

It was intentional. See the last paragraph of
/messages/by-id/CA+TgmoaMSn6a1780VutfsarCu0LCr=CO2yi4vLUo-JQbn4YuRA@mail.gmail.com

Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.

Hey, imagine if the comments explained the logic behind the code!

Good idea. How about the attached?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

add-worker-comment.patch (application/octet-stream; +6/-0)
#18 Robert Haas
robertmhaas@gmail.com
In reply to: Rushabh Lathia (#9)
Re: crashes due to setting max_parallel_workers=0

On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#19 David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#18)
Re: crashes due to setting max_parallel_workers=0

On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)

Comparing Rushabh's patch to my minimal one, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly makes it do ExecInitExtraTupleSlot() on the leader's slot,
but it seems that's not required, since gather_merge_readnext() sets
the leader's slot to the output of ExecProcNode(outerPlan). I'd ignore
my minimal fix because of that mistake. Rushabh's patch sidesteps this
by adding a conditional pfree() so as not to free a slot that we
didn't allocate in the first place.
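The ownership distinction behind the conditional free can be sketched like this. This is hypothetical code, not the actual patch: the names (Slot, GMState, shutdown) are invented, and plain free() stands in for pfree(). The idea is that the per-worker slots were allocated by the node itself and may be freed, while the leader's slot is owned elsewhere (it comes from the outer plan) and must only be cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy slot type; illustrative only. */
typedef struct { bool empty; } Slot;

typedef struct {
    int    nreaders;
    Slot **reader_slots;  /* allocated by this node */
    Slot  *leader_slot;   /* owned by the outer plan: never freed here */
} GMState;

static void shutdown(GMState *gm)
{
    /* Free only the slots this node allocated itself... */
    for (int i = 0; i < gm->nreaders; i++)
        free(gm->reader_slots[i]);
    free(gm->reader_slots);

    /* ...but merely clear the leader's slot, which we did not
     * allocate, instead of freeing it. */
    if (gm->leader_slot)
        gm->leader_slot->empty = true;
}
```

Without the ownership check, freeing the leader's slot alongside the reader slots would free memory the node never allocated.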

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#19)
Re: crashes due to setting max_parallel_workers=0

On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (i.e. nworkers_launched
= 0). Because of that, the scan continued even after all the tuples
were exhausted, ending up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be. Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)

Comparing Rushabh's patch to my minimal one, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly makes it do ExecInitExtraTupleSlot() on the leader's slot,
but it seems that's not required, since gather_merge_readnext() sets
the leader's slot to the output of ExecProcNode(outerPlan). I'd ignore
my minimal fix because of that mistake. Rushabh's patch sidesteps this
by adding a conditional pfree() so as not to free a slot that we
didn't allocate in the first place.

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.

Yeah, if those fields are always the same, then I think that they
should be merged. That seems undeniably a good idea. Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.
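One way to make that "one more of one thing" pattern explicit is a single loop over nreaders + 1 participants, with a comment spelling out that the last index belongs to the leader. This is a sketch only, with invented names (Slot, GMState, clear_all_slots), not the actual executor code:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARTICIPANTS 8

typedef struct { bool empty; } Slot;

typedef struct {
    int  nreaders;                /* launched workers */
    Slot slots[MAX_PARTICIPANTS]; /* nreaders worker slots plus one
                                   * leader slot at index nreaders */
} GMState;

/* Clear every participant's slot.  There are nreaders + 1 of them:
 * indexes 0 .. nreaders-1 belong to the workers, and index nreaders
 * belongs to the leader.  Stating this in one place avoids the
 * surprise of looping over two collections of different sizes. */
static void clear_all_slots(GMState *gm)
{
    for (int i = 0; i <= gm->nreaders; i++)
        gm->slots[i].empty = true;
}
```

With the leader's slot folded into the same array, the nreaders == 0 case is handled for free: the loop still clears the single leader slot at index 0.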

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
#23Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Robert Haas (#20)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Rushabh Lathia (#23)
#25Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#17)
#26Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Rushabh Lathia (#14)
#27Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Tomas Vondra (#26)
#28Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Rushabh Lathia (#27)
#29Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Tomas Vondra (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#28)