Re: Parallel Append implementation

Started by Amit Khandekar over 8 years ago, 4 messages
#1Amit Khandekar
amitdkhan.pg@gmail.com

On 11 September 2017 at 18:55, Amit Kapila <amit.kapila16@gmail.com> wrote:

Do you think non-parallel-aware Append
will be better in any case when there is a parallel-aware append? I
mean to say let's try to create non-parallel-aware append only when
parallel-aware append is not possible.

By non-parallel-aware append, I assume you mean the partial
non-parallel-aware Append. Yes, if the parallel-aware Append path has
*all* partial subpaths chosen, then we do omit the partial non-parallel
Append path, as seen in this code in the patch:

/*
 * Consider non-parallel partial append path. But if the parallel append
 * path is made out of all partial subpaths, don't create another partial
 * path; we will keep only the parallel append path in that case.
 */
if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
    ......
}

But if the parallel-aware Append path has a mix of partial and
non-partial subpaths, then we can't really tell which of the two is
cheaper until we calculate the cost. The non-parallel-aware partial
Append may well turn out to be the cheaper one.

How? See, if you have four partial subpaths and two non-partial
subpaths, then for parallel-aware append it considers all six paths in
parallel path whereas for non-parallel-aware append it will consider
just four paths and that too with sub-optimal strategy. Can you
please try to give me some example so that it will be clear.

Suppose 4 appendrel children have costs for their cheapest partial (p)
and non-partial paths (np) as shown below :

p1=5000 np1=100
p2=200 np2=1000
p3=80 np3=2000
p4=3000 np4=50

Here, following two Append paths will be generated :

1. a parallel-aware Append path with subpaths :
np1, p2, p3, np4

2. Partial (i.e. non-parallel-aware) Append path with all partial subpaths:
p1,p2,p3,p4
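
A minimal sketch of how the subpaths of path #1 come out of the costs
above. It assumes the choice is made per child by comparing the total
costs of the cheapest partial and cheapest non-partial paths; the struct
and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one appendrel child: the total costs of its
 * cheapest partial (p) and cheapest non-partial (np) paths.
 */
typedef struct ChildCosts
{
    double partial_cost;
    double nonpartial_cost;
} ChildCosts;

/*
 * For each child, record in use_partial[] whether the parallel-aware
 * Append would take the partial path (true) or the non-partial one
 * (false). Ties go to the partial path; that is an assumption made
 * here for illustration.
 *
 * Returns true if every chosen subpath is partial, i.e. the case in
 * which a separate non-parallel-aware partial Append would be redundant.
 */
bool
choose_pa_subpaths(const ChildCosts *children, size_t nchildren,
                   bool *use_partial)
{
    bool all_partial = true;

    assert(children != NULL && use_partial != NULL);

    for (size_t i = 0; i < nchildren; i++)
    {
        use_partial[i] =
            children[i].partial_cost <= children[i].nonpartial_cost;
        if (!use_partial[i])
            all_partial = false;
    }
    return all_partial;
}
```

With the four children above, use_partial[] comes out as {false, true,
true, false}, i.e. np1, p2, p3, np4, and the function returns false, so
the all-partial Append (path #2) is still a distinct candidate.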

Now, one thing we can do above is : Make the path#2 parallel-aware as
well; so both Append paths would be parallel-aware. Are you suggesting
exactly this ?

So above, what I am saying is, we can't tell which of the paths #1 and
#2 is cheaper until we calculate the total cost. I didn't understand
what you meant by "non-parallel-aware append will consider only the
partial subpaths and that too with sub-optimal strategy" in the above
example. I guess you were considering a different scenario from the
one above.

Whereas, if one or more children of the Append do not have a partial
path in the first place, then a non-parallel-aware partial Append is out
of the question, on which we both agree.
And the other case where we skip the non-parallel-aware partial Append is
when all the cheapest subpaths of the parallel-aware Append path are
partial paths: we do not want parallel-aware and non-parallel-aware
Append paths both having exactly the same partial subpaths.

---------

I will be addressing your other comments separately.

Thanks
-Amit Khandekar

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Khandekar (#1)

On Thu, Sep 14, 2017 at 8:30 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On 11 September 2017 at 18:55, Amit Kapila <amit.kapila16@gmail.com> wrote:

How? See, if you have four partial subpaths and two non-partial
subpaths, then for parallel-aware append it considers all six paths in
parallel path whereas for non-parallel-aware append it will consider
just four paths and that too with sub-optimal strategy. Can you
please try to give me some example so that it will be clear.

Suppose 4 appendrel children have costs for their cheapest partial (p)
and non-partial paths (np) as shown below :

p1=5000 np1=100
p2=200 np2=1000
p3=80 np3=2000
p4=3000 np4=50

Here, following two Append paths will be generated :

1. a parallel-aware Append path with subpaths :
np1, p2, p3, np4

2. Partial (i.e. non-parallel-aware) Append path with all partial subpaths:
p1,p2,p3,p4

Now, one thing we can do above is : Make the path#2 parallel-aware as
well; so both Append paths would be parallel-aware.

Yes, we can do that, and I think that is probably better. So the
question remains: in which case will a non-parallel-aware partial
Append be required? Basically, it is not clear to me why we need the
non-parallel-aware version once we have a parallel-aware partial
Append. Are you keeping it for the sake of backward compatibility, or
for cases where someone has disabled parallel append with the help of
the new GUC in this patch?

So above, what I am saying is, we can't tell which of the paths #1 and
#2 is cheaper until we calculate the total cost. I didn't understand
what you meant by "non-parallel-aware append will consider only the
partial subpaths and that too with sub-optimal strategy" in the above
example. I guess you were considering a different scenario from the
one above.

Yes, something different, but I think you can ignore that as we can
discuss the guts of my point based on the example given by you above.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#3Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Kapila (#2)

On 16 September 2017 at 11:45, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Sep 14, 2017 at 8:30 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On 11 September 2017 at 18:55, Amit Kapila <amit.kapila16@gmail.com> wrote:

How? See, if you have four partial subpaths and two non-partial
subpaths, then for parallel-aware append it considers all six paths in
parallel path whereas for non-parallel-aware append it will consider
just four paths and that too with sub-optimal strategy. Can you
please try to give me some example so that it will be clear.

Suppose 4 appendrel children have costs for their cheapest partial (p)
and non-partial paths (np) as shown below :

p1=5000 np1=100
p2=200 np2=1000
p3=80 np3=2000
p4=3000 np4=50

Here, following two Append paths will be generated :

1. a parallel-aware Append path with subpaths :
np1, p2, p3, np4

2. Partial (i.e. non-parallel-aware) Append path with all partial subpaths:
p1,p2,p3,p4

Now, one thing we can do above is : Make the path#2 parallel-aware as
well; so both Append paths would be parallel-aware.

Yes, we can do that, and I think that is probably better. So the
question remains: in which case will a non-parallel-aware partial
Append be required? Basically, it is not clear to me why we need the
non-parallel-aware version once we have a parallel-aware partial
Append. Are you keeping it for the sake of backward compatibility, or
for cases where someone has disabled parallel append with the help of
the new GUC in this patch?

Yes, one case is the enable_parallelappend GUC case. If a user disables
it, we do want to add the usual non-parallel-aware partial Append
path.

About backward compatibility, the concern we discussed in [1] was that
we had better continue to have the usual non-parallel-aware partial
Append path, plus we should have an additional parallel-aware Append
path containing a mix of partial and non-partial subpaths.

But thinking again about the example above, Amit, I tend to agree
that we don't have to worry about the existing behaviour, and so we
can make path #2 parallel-aware as well.

Robert, what is your opinion on the paths that are chosen in the above
example?

[1]: /messages/by-id/CA+TgmoaLRtaWdJVHfhHej2s7w1spbr6gZiZXJrM5bsz1KQ54Rw@mail.gmail.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


#4Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#2)

On Sat, Sep 16, 2017 at 2:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yes, we can do that, and I think that is probably better. So the
question remains: in which case will a non-parallel-aware partial
Append be required? Basically, it is not clear to me why we need the
non-parallel-aware version once we have a parallel-aware partial
Append. Are you keeping it for the sake of backward compatibility, or
for cases where someone has disabled parallel append with the help of
the new GUC in this patch?

We can't use parallel append if there are pathkeys, because parallel
append will disturb the output ordering. Right now, that never
happens anyway, but that will change when
https://commitfest.postgresql.org/14/1093/ is committed.

Parallel append is also not safe for a parameterized path, and
set_append_rel_pathlist() already creates those. I guess it could be
safe if the parameter is passed down from above the Gather, if we
allowed that, but it's sure not safe in a case like this:

Gather
-> Nested Loop
   -> Parallel Seq Scan
   -> Append
      -> whatever

If it's not clear why that's a disaster, please ask for a more
detailed explanation...
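
The conditions raised in this thread can be collected into a single
check, sketched below. This is only a summary under stated assumptions:
the struct and function are hypothetical, not code from the patch, and
every parameterized path is conservatively treated as unsafe, including
the "parameter passed down from above the Gather" case that might be
safe if it were allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the properties discussed in this thread. */
typedef struct AppendPathTraits
{
    bool has_pathkeys;          /* output ordering must be preserved */
    bool is_parameterized;      /* depends on a value from an outer node */
    bool enable_parallelappend; /* the GUC added by the patch */
} AppendPathTraits;

/*
 * Returns true only if none of the objections above applies. When it
 * returns false, the usual non-parallel-aware Append is the fallback.
 */
bool
parallel_aware_append_usable(const AppendPathTraits *t)
{
    assert(t != NULL);

    if (!t->enable_parallelappend)
        return false;           /* user disabled parallel append */
    if (t->has_pathkeys)
        return false;           /* would disturb the output ordering */
    if (t->is_parameterized)
        return false;           /* conservatively treated as unsafe */
    return true;
}
```

Each of the three early returns is a case where the non-parallel-aware
version is still needed, which is the answer to the question quoted
above.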

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
