a raft of parallelism-related bug fixes
My recent commit of the Gather executor node has made it relatively
simple to write code that does an end-to-end test of all of the
parallelism-related commits which have thus far gone into the tree.
Specifically, what I've done is hacked the planner to push a
single-copy Gather node on top of every plan that is thought to be
parallel-safe, and then run 'make check'. This turned up bugs in
nearly every parallelism-related commit that has thus far gone into
the tree, which is a little depressing, especially because some of
them are what we've taken to calling brown paper bag bugs. The good
news is that, with one or two exceptions, these are pretty much just
trivial oversights which are simple to fix, rather than any sort of
deeper design issue. Attached are 14 patches. Patches #1-#4 are
essential for testing purposes but are not proposed for commit,
although some of the code they contain may eventually become part of
other patches which are proposed for commit. Patches #5-#12 are
largely boring patches fixing fairly uninteresting mistakes; I propose
to commit these on an expedited basis. Patches #13-14 are also
proposed for commit but seem to me to be more in need of review. With
all of these patches, I can now get a clean 'make check' run, although
I think there are a few bugs remaining to be fixed because some of my
colleagues still experience misbehavior even with all of these patches
applied. The patch stack is also posted here; the branch is subject
to rebasing:
http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/gathertest
Here follows an overview of each individual patch (see also commit
messages within).
== For Testing Only ==
0001-Test-code.patch is the basic test code. In addition to pushing a
Gather node on top of apparently-safe parallel plans, it also ignores
that Gather node when generating EXPLAIN output and suppresses the
parallel context in error messages, changes which are essential to
getting the regression tests to pass. I'm wondering if the parallel
error context ought to be GUC-controlled, defaulting to on but capable
of being disabled on request.
0002-contain_parallel_unsafe-check_parallel_safety.patch and
0003-Temporary-hack-to-reduce-testing-failures.patch arrange NOT to
put Gather nodes on top of plans that contain parallel-restricted
operations or refer to temporary tables. Although such things can
exist in a parallel plan, they must be above every Gather node, not
beneath it. Here, the Gather node is being placed (strictly for
testing purposes) at the very top, so we must not insert it at all if
these things are present.
0004-Partial-group-locking-implementation.patch is a partial
implementation of group locking. I found that without this, the
regression tests hang frequently, and a clean run is impossible. This
patch doesn't modify the deadlock detector, and it doesn't take any
account of locks that should be mutually exclusive even as between
members of a parallel group, but it's enough for a clean regression
test run. We will need a full solution to this problem soon enough,
but right now I am only using this to find such unrelated bugs as we
may have.
== Proposed For Commit ==
0005-Don-t-send-protocol-messages-to-a-shm_mq-that-no-lon.patch fixes
a problem in the parallel worker shutdown sequence: a background
worker can choose to redirect messages that would normally be sent to
a client to a shm_mq, and parallel workers always do this. But if the
worker generates a message after the DSM has been detached, it causes
a server crash.
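The shape of the fix is essentially a guard: if the queue has already gone away, drop the message rather than dereference a dangling pointer. Here is a minimal self-contained sketch of that pattern (the names are invented for illustration; the real fix works with the pqmq/shm_mq machinery, not this toy struct):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a shared-memory message queue handle; becomes NULL once
 * the DSM segment containing the real queue has been detached. */
typedef struct MsgQueue
{
    char last_msg[64];
    int  sent;
} MsgQueue;

static MsgQueue *active_queue = NULL;

static void
queue_attach(MsgQueue *q)
{
    active_queue = q;
}

static void
queue_detach(void)
{
    active_queue = NULL;
}

/* Returns true if the message was sent, false if it was dropped
 * because the queue no longer exists. */
static bool
send_message(const char *msg)
{
    if (active_queue == NULL)
        return false;           /* queue is gone: drop, don't crash */
    strncpy(active_queue->last_msg, msg, sizeof(active_queue->last_msg) - 1);
    active_queue->last_msg[sizeof(active_queue->last_msg) - 1] = '\0';
    active_queue->sent++;
    return true;
}
```

The point is only that a late message after detach becomes a silent no-op instead of a write through a stale pointer.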
0006-Transfer-current-command-counter-ID-to-parallel-work.patch fixes
a problem in the code used to set up a parallel worker's transaction
state. The command counter is presently not copied to the worker.
This is awfully embarrassing and should have been caught in the
testing of the parallel mode/contexts patch, but I got overly focused
on the stuff stored inside TransactionStateData. Don't shoot.
0007-Tighten-up-application-of-parallel-mode-checks.patch fixes
another problem with the parallel mode checks, which are intended to
catch people doing unsafe things and throw errors instead of letting
them crash the server. Investigation reveals that they don't have
this effect because parallel workers were running their pre-commit
sequence with the checks disabled. If they do something like try to
send notifications, it can lead to the worker getting an XID
assignment even though the master doesn't have one. That's really
bad, and crashes the server. That specific example should be
prohibited anyway (see patch #11) but even if we fix that I think this
is good tightening to prevent unpleasant surprises in the future.
0008-Invalidate-caches-after-cranking-up-a-parallel-worke.patch
invalidates system cache entries after cranking up a
parallel worker transaction. This is needed here for the same reason
that the logical decoding code needs to do it after time traveling:
otherwise, the worker might have leftover entries in its caches as a
result of the startup transaction that are now bogus given the changes
in what it can see.
0009-Fix-a-problem-with-parallel-workers-being-unable-to-.patch fixes
a problem with workers being unable to precisely recreate the
authorization state as it existed in the parallel leader. They need to
do that, or else it's a security vulnerability.
0010-Prohibit-parallel-query-when-the-isolation-level-is-.patch
prohibits parallel query at the serializable isolation level. This is
of course a restriction we'd rather not have, but it's a necessary one
for now because the predicate locking code doesn't understand the idea
of multiple processes with separate PGPROC structures being part of a
single transaction.
0011-Mark-more-functions-parallel-restricted-or-parallel-.patch marks
as parallel-restricted or parallel-unsafe some functions that should
have been so marked by the commit that introduced the new pg_proc
flag, but were not. This includes functions for sending notifications and a
few others.
0012-Rewrite-interaction-of-parallel-mode-with-parallel-e.patch
rejiggers the timing of enabling and disabling parallel mode when we
are attempting parallel execution. The old coding turned out to be
fragile in multiple ways. Since it's impractical to know at planning
time whether ExecutorRun will be called with a non-zero tuple count, this
patch instead observes whether or not this happens, and if it does
happen, the parallel plan is forced to run serially. In the old
coding, it instead just killed the parallel workers at the end of
ExecutorRun and therefore returned an incomplete result set. There
might be some further rejiggering that could be done here that would
be even better than this, but I'm fairly certain this is better than
what we've got in the tree right now.
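The run-time decision described above can be modeled with a toy flag (invented names; the real logic lives in the executor's parallel-mode handling, not in a standalone struct like this):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the decision described above: we cannot know at plan
 * time whether the executor will be asked for only a partial result
 * (a non-zero tuple count), so we observe it at run time.  Once a
 * partial fetch has been seen, the plan is forced to run serially,
 * so that workers are never killed mid-stream and no rows are lost. */
typedef struct QueryExec
{
    bool force_serial;
} QueryExec;

static void
exec_init(QueryExec *exec)
{
    exec->force_serial = false;
}

/* Returns true if this run may use parallel workers. */
static bool
exec_run(QueryExec *exec, long tuple_count)
{
    if (tuple_count != 0)
        exec->force_serial = true;  /* partial fetch observed */
    return !exec->force_serial;
}
```

Contrast this with the old coding, which let the parallel run proceed and then killed the workers at the end, truncating the result set.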
0013-Modify-tqueue-infrastructure-to-support-transient-re.patch
attempts to address a deficiency in the tqueue.c/tqueue.h machinery I
recently introduced: backends can have ephemeral record types for
which they use backend-local typmods that may not be the same between
the leader and the worker. This patch has the worker send metadata
about the tuple descriptor for each such type, and the leader
registers the same tuple descriptor and then remaps the typmods from
the worker's typmod space to its own. This seems to work, but I'm a
little concerned that there may be cases it doesn't cover. Also,
there's room to question the overall approach. The only other
alternative that springs readily to mind is to try to arrange things
during the planning phase so that we never try to pass records between
parallel backends in this way, but that seems like it would be hard to
code (and thus likely to have bugs) and also pretty limiting.
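The remapping scheme can be illustrated outside the executor with a toy lookup table (invented names; the real patch ships tuple descriptor metadata through the tuple queue and registers it in the leader's typcache rather than using a fixed array):

```c
#include <assert.h>
#include <stdint.h>

/* Toy remapper: a worker's backend-local record typmods are small
 * dense integers, so an array maps each worker typmod to the typmod
 * the leader assigned when it registered the same tuple descriptor.
 * -1 marks worker typmods the leader has not yet seen. */
#define MAX_TYPMODS 64

typedef struct RecordTypmodMap
{
    int32_t leader_typmod[MAX_TYPMODS];
} RecordTypmodMap;

static void
map_init(RecordTypmodMap *map)
{
    for (int i = 0; i < MAX_TYPMODS; i++)
        map->leader_typmod[i] = -1;
}

/* Called when the worker sends metadata for a transient record type:
 * the leader registers the descriptor locally and records the mapping. */
static void
map_register(RecordTypmodMap *map, int32_t worker_typmod,
             int32_t leader_typmod)
{
    assert(worker_typmod >= 0 && worker_typmod < MAX_TYPMODS);
    map->leader_typmod[worker_typmod] = leader_typmod;
}

/* Called for each record datum arriving from the worker, before the
 * leader stores it into its own tuple. */
static int32_t
map_lookup(const RecordTypmodMap *map, int32_t worker_typmod)
{
    assert(worker_typmod >= 0 && worker_typmod < MAX_TYPMODS);
    return map->leader_typmod[worker_typmod];
}
```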
0014-Fix-problems-with-ParamListInfo-serialization-mechan.patch, which
I just posted on the Parallel Seq Scan thread as a standalone patch,
fixes pretty much what the name of the file suggests. This actually
fixes two problems, one of which Noah spotted and commented on over on
that thread. By pure coincidence, the last 'make check' regression
failure I was still troubleshooting needed a fix for that issue plus a
fix to plpgsql_param_fetch. However, as I mentioned on the other
thread, I'm not quite sure which way to go with the change to
plpgsql_param_fetch so scrutiny of that point, in particular, would be
appreciated. See also
/messages/by-id/CA+TgmobN=wADVaUTwsH-xqvCdovkeRasuXw2k3R6vmpWig7raw@mail.gmail.com
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0004-Partial-group-locking-implementation.patch
0005-Don-t-send-protocol-messages-to-a-shm_mq-that-no-lon.patch
0006-Transfer-current-command-counter-ID-to-parallel-work.patch
0007-Tighten-up-application-of-parallel-mode-checks.patch
0008-Invalidate-caches-after-cranking-up-a-parallel-worke.patch
0009-Fix-a-problem-with-parallel-workers-being-unable-to-.patch
0010-Prohibit-parallel-query-when-the-isolation-level-is-.patch
0001-Test-code.patch
0002-contain_parallel_unsafe-check_parallel_safety.patch
0003-Temporary-hack-to-reduce-testing-failures.patch
0011-Mark-more-functions-parallel-restricted-or-parallel-.patch
0012-Rewrite-interaction-of-parallel-mode-with-parallel-e.patch
0013-Modify-tqueue-infrastructure-to-support-transient-re.patch
0014-Fix-problems-with-ParamListInfo-serialization-mechan.patch
On Mon, Oct 12, 2015 at 1:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Attached are 14 patches. Patches #1-#4 are
essential for testing purposes but are not proposed for commit,
although some of the code they contain may eventually become part of
other patches which are proposed for commit. Patches #5-#12 are
largely boring patches fixing fairly uninteresting mistakes; I propose
to commit these on an expedited basis. Patches #13-14 are also
proposed for commit but seem to me to be more in need of review.
Hearing no objections, I've now gone and committed #5-#12.
0013-Modify-tqueue-infrastructure-to-support-transient-re.patch
attempts to address a deficiency in the tqueue.c/tqueue.h machinery I
recently introduced: backends can have ephemeral record types for
which they use backend-local typmods that may not be the same between
the leader and the worker. This patch has the worker send metadata
about the tuple descriptor for each such type, and the leader
registers the same tuple descriptor and then remaps the typmods from
the worker's typmod space to its own. This seems to work, but I'm a
little concerned that there may be cases it doesn't cover. Also,
there's room to question the overall approach. The only other
alternative that springs readily to mind is to try to arrange things
during the planning phase so that we never try to pass records between
parallel backends in this way, but that seems like it would be hard to
code (and thus likely to have bugs) and also pretty limiting.
I am still hoping someone will step up to review this.
0014-Fix-problems-with-ParamListInfo-serialization-mechan.patch, which
I just posted on the Parallel Seq Scan thread as a standalone patch,
fixes pretty much what the name of the file suggests. This actually
fixes two problems, one of which Noah spotted and commented on over on
that thread. By pure coincidence, the last 'make check' regression
failure I was still troubleshooting needed a fix for that issue plus a
fix to plpgsql_param_fetch. However, as I mentioned on the other
thread, I'm not quite sure which way to go with the change to
plpgsql_param_fetch so scrutiny of that point, in particular, would be
appreciated. See also
/messages/by-id/CA+TgmobN=wADVaUTwsH-xqvCdovkeRasuXw2k3R6vmpWig7raw@mail.gmail.com
Noah's been helping with this issue on the other thread. I'll revise
this patch along the lines discussed there and resubmit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12 October 2015 at 18:04, Robert Haas <robertmhaas@gmail.com> wrote:
My recent commit of the Gather executor node has made it relatively
simple to write code that does an end-to-end test of all of the
parallelism-related commits which have thus far gone into the tree.
I've been wanting to help here for a while, but time remains limited for
the next month or so.
From reading this my understanding is that there isn't a test suite
included with this commit?
I've tried to review the Gather node commit and I note that the commit
message contains a longer description of the functionality in that patch
than any comments in the patch as a whole. No design comments, no README,
no file header comments. For such a major feature that isn't acceptable - I
would reject a patch from others on that basis alone (and have done so). We
must keep the level of comments high if we are to encourage wider
participation in the project.
So reviewing patch 13 isn't possible without prior knowledge.
Hoping we'll be able to find some time on this at PGConf.eu; thanks for
coming over.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 17, 2015 at 9:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
From reading this my understanding is that there isn't a test suite included
with this commit?
Right. The patches on the thread contain code that can be used for
testing, but the committed code does not itself include test coverage.
I welcome thoughts on how we could perform automated testing of this
code. I think at least part of the answer is that I need to press on
toward getting the rest of Amit's parallel sequential scan patch
committed, because then this becomes a user-visible feature and I
expect that to make it much easier to find whatever bugs remain. A
big part of the difficulty in testing this up until now is that I've
been building towards, hey, we have parallel query. Until we actually
do, you need to write C code to test this, which raises the bar
considerably.
Now, that does not mean we shouldn't test this in other ways, and of
course I have, and so have Amit and other people from the community -
of late, Noah Misch and Haribabu Kommi have found several bugs through
code inspection and testing, which included some of the same ones that
I was busy finding and fixing using the test code attached to this
thread. That's one of the reasons why I wanted to press forward with
getting the fixes for those bugs committed. It's just a waste of
everybody's time if we keep re-finding known bugs for which fixes
already exist.
But the question of how to test this in the buildfarm is a good one,
and I don't have a complete answer. Once the rest of this goes in,
which I hope will be soon, we can EXPLAIN or EXPLAIN ANALYZE or just
straight up run parallel queries in the regression test suite and see
that they behave as expected. But I don't expect that to provide
terribly good test coverage. One idea that I think would provide
*excellent* test coverage is to take the test code included on this
thread and run it on the buildfarm. The idea of the code is to
basically run the regression test suite with every parallel-eligible
query forced to unnecessarily use parallelism. Turning that on and
running 'make check' found, directly or indirectly, all of these bugs.
Doing that on the whole buildfarm would probably find more.
However, I'm pretty sure that we don't want to switch the *entire*
buildfarm to using lots of unnecessary parallelism. What we might be
able to do is have some critters that people spin up for this precise
purpose. Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
members, we could have GRATUITOUSLY_PARALLEL buildfarm members. If
Andrew is willing to add buildfarm support for that option and a few
people are willing to run critters in that mode, I will be happy -
more than happy, really - to put the test code into committable form,
guarded by a #define, and away we go.
Of course, other ideas for testing are also welcome.
I've tried to review the Gather node commit and I note that the commit
message contains a longer description of the functionality in that patch
than any comments in the patch as a whole. No design comments, no README, no
file header comments. For such a major feature that isn't acceptable - I
would reject a patch from others on that basis alone (and have done so). We
must keep the level of comments high if we are to encourage wider
participation in the project.
It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation. Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular. Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.
It's worth noting, though, that the executor files in general don't
contain great gobs of comments, and the executor README even has this
vintage 2001 comment:
XXX a great deal more documentation needs to be written here...
Well, yeah. It's taken me a long time to understand how the executor
actually works, and there are parts of it - particularly related to
EvalPlanQual - that I still don't fully understand. So some of the
lack of comments in, for example, nodeGather.c is because it copies
the style of other executor nodes, like nodeSeqscan.c. It's not
exactly clear to me what more to document there. You either
understand what a rescan node is, in which case the code for each
node's rescan method tends to be fairly self-evident, or you don't -
but that clearly shouldn't be re-explained in every file. So I guess
what I'm saying is I could use some advice on what kinds of things would
be most useful to document, and where to put that documentation.
Right now, the best explanation of how parallelism works is in
src/backend/access/transam/README.parallel -- but, as you rightly
point out, that doesn't cover the executor bits. Should we have SGML
documentation under "VII. Internals" that explains what's under the
hood in the same way that we have sections for "Database Physical
Storage" and "PostgreSQL Coding Conventions"? Should the stuff in the
existing README.parallel be moved there? Or I could just add some
words to src/backend/executor/README to cover the parallel executor
stuff, if that is preferred. Advice?
Also, regardless of how we document what's going on at the code level,
I think we probably should have a section *somewhere* in the main SGML
documentation that kind of explains the general concepts behind
parallel query from a user/DBA perspective. But I don't know where to
put it. Under "Server Administration"? Exactly what to explain there
needs some thought, too. I'm sort of wondering if we need two
chapters in the documentation on this, one that covers it from a
user/DBA perspective and the other of which covers it from a hacker
perspective. But then maybe the hacker stuff should just go in README
files. I'm not sure. I may have to try writing some of this and see
how it goes, but advice is definitely appreciated.
I am happy to definitively commit to writing whatever documentation
the community feels is necessary here, and I will do that certainly
before end of development for 9.6 and hopefully much sooner than that.
I will do that even if I don't get any specific feedback on what to
write and where to put it, but the more feedback I get, the better the
result will probably be. Some of the reason this hasn't been done
already is because we're still getting the infrastructure into place,
and we're fixing and adjusting things as we go along, so while the
overall picture isn't changing much, there are bits of the design that
are still in flux as we realize, oh, crap, that was a dumb idea. As
we get a clearer idea what will be in 9.6, it will get easier to
present the overall picture in a coherent way.
So reviewing patch 13 isn't possible without prior knowledge.
The basic question for patch 13 is whether ephemeral record types can
occur in executor tuples in any contexts that I haven't identified. I
know that a tuple table slot can have a column that is of type
record or record[], and those records can themselves contain
attributes of type record or record[], and so on as far down as you
like. I *think* that's the only case. For example, I don't believe
that a TupleTableSlot can contain a *named* record type that has an
anonymous record buried down inside of it somehow. But I'm not
positive I'm right about that.
Hoping we'll be able to find some time on this at PGConf.eu; thanks for
coming over.
Sure thing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Oct 17, 2015 at 06:17:37PM -0400, Robert Haas wrote:
One idea that I think would provide
*excellent* test coverage is to take the test code included on this
thread and run it on the buildfarm. The idea of the code is to
basically run the regression test suite with every parallel-eligible
query forced to unnecessarily use parallelism. Turning that on and
running 'make check' found, directly or indirectly, all of these bugs.
Doing that on the whole buildfarm would probably find more.
However, I'm pretty sure that we don't want to switch the *entire*
buildfarm to using lots of unnecessary parallelism. What we might be
able to do is have some critters that people spin up for this precise
purpose. Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
members, we could have GRATUITOUSLY_PARALLEL buildfarm members. If
Andrew is willing to add buildfarm support for that option and a few
What, if anything, would this mode require beyond adding a #define? If
nothing, it won't require specific support in the buildfarm script.
CLOBBER_CACHE_ALWAYS has no specific support.
people are willing to run critters in that mode, I will be happy -
more than happy, really - to put the test code into committable form,
guarded by a #define, and away we go.
I would make one such animal.
* Noah Misch (noah@leadboat.com) wrote:
On Sat, Oct 17, 2015 at 06:17:37PM -0400, Robert Haas wrote:
people are willing to run critters in that mode, I will be happy -
more than happy, really - to put the test code into committable form,
guarded by a #define, and away we go.
I would make one such animal.
We're also looking at what animals it makes sense to run as part of
pginfra and I expect we'd be able to include an animal for these tests
also (though Stefan is the one really driving that effort).
Thanks!
Stephen
On 10/17/2015 06:17 PM, Robert Haas wrote:
However, I'm pretty sure that we don't want to switch the *entire*
buildfarm to using lots of unnecessary parallelism. What we might be
able to do is have some critters that people spin up for this precise
purpose. Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
members, we could have GRATUITOUSLY_PARALLEL buildfarm members. If
Andrew is willing to add buildfarm support for that option and a few
people are willing to run critters in that mode, I will be happy -
more than happy, really - to put the test code into committable form,
guarded by a #define, and away we go.
If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
special buildfarm support is required - you would just add that to the
animal's config file, more or less like this:
config_env =>
{
CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
},
I try to make things easy :-)
cheers
andrew
On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
special buildfarm support is required - you would just add that to the
animal's config file, more or less like this:
config_env =>
{
CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
},
I try to make things easy :-)
Wow, that's great. So, I'll try to rework the test code I posted
previously into something less hacky, and eventually add a #define
like this so we can run it on the buildfarm. There's a few other
things that need to get done before that really makes sense - like
getting the rest of the bug fix patches committed - otherwise any
buildfarm critters we add will just be permanently red.
Thanks to Noah and Stephen for your replies also - it is good to hear
that if I spend the time to make this committable, somebody will use
it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation. Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular. Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.
Here is a patch to add a hopefully-useful file header comment to
execParallel.c. I included one for nodeGather.c as well, which seems
to be contrary to previous practice, but actually it seems like
previous practice is not the greatest: surely it's not self-evident
what all of the executor nodes do.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
parallel-exec-header-comments.patch
On 17 October 2015 at 18:17, Robert Haas <robertmhaas@gmail.com> wrote:
It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation. Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular. Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.
Not on your case in a big way, just noting the need for change there.
I'll help as well, but if you could start with enough basics to allow me
to ask questions, that will help. Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas <robertmhaas@gmail.com>
wrote:
It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation. Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular. Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.
Here is a patch to add a hopefully-useful file header comment to
execParallel.c. I included one for nodeGather.c as well, which seems
to be contrary to previous practice, but actually it seems like
previous practice is not the greatest: surely it's not self-evident
what all of the executor nodes do.
+ * any ParamListInfo associated witih the query, buffer usage info, and
+ * the actual plan to be passed down to the worker.
typo 'witih'.
+ * return the results. Therefore, a plan used with a single-copy Gather
+ * node not be parallel-aware.
"node not" seems to be incomplete.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wednesday, 21 October 2015, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation. Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular. Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.
Here is a patch to add a hopefully-useful file header comment to
execParallel.c. I included one for nodeGather.c as well, which seems
to be contrary to previous practice, but actually it seems like
previous practice is not the greatest: surely it's not self-evident
what all of the executor nodes do.
+ * any ParamListInfo associated witih the query, buffer usage info, and
+ * the actual plan to be passed down to the worker.
typo 'witih'.
+ * return the results. Therefore, a plan used with a single-copy Gather
+ * node not be parallel-aware.
"node not" seems to be incomplete.
... node *need* not be parallel aware?
Thanks,
Amit
On Wed, Oct 21, 2015 at 9:04 AM, Amit Langote <amitlangote09@gmail.com> wrote:
... node *need* not be parallel aware?
Yes, thanks. Committed that way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 20, 2015 at 6:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Not on your case in a big way, just noting the need for change there.
Yes, I appreciate your attitude. I think we are on the same wavelength.
I'll help as well, but if you could start with enough basics to allow me
to ask questions, that will help. Thanks.
Will try to keep pushing in that direction. May be easier once some
of the dust has settled.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
So reviewing patch 13 isn't possible without prior knowledge.
The basic question for patch 13 is whether ephemeral record types can
occur in executor tuples in any contexts that I haven't identified. I
know that a tuple table slot can contain a column that is of type
record or record[], and those records can themselves contain
attributes of type record or record[], and so on as far down as you
like. I *think* that's the only case. For example, I don't believe
that a TupleTableSlot can contain a *named* record type that has an
anonymous record buried down inside of it somehow. But I'm not
positive I'm right about that.
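For intuition, here is a standalone toy sketch in C (aside from RECORDOID, every name here is invented; this is not PostgreSQL's actual TupleDesc machinery) of why any such check has to recurse: an anonymous record can be buried arbitrarily deep inside the attributes of an outer composite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECORDOID 2249          /* pg_type OID of the "record" pseudo-type */

/* Toy type descriptor: composite types list their attribute types.
 * A transient (anonymous) record type is modeled here as RECORDOID
 * with a registered typmod >= 0; everything else has typmod -1. */
typedef struct TypeNode
{
    unsigned    oid;
    int         typmod;
    int         nattrs;
    const struct TypeNode **attrs;
} TypeNode;

/* True if this type, or anything nested inside it at any depth, is an
 * anonymous record whose typmod is meaningless in another process. */
static bool
contains_transient_record(const TypeNode *t)
{
    int         i;

    if (t->oid == RECORDOID && t->typmod >= 0)
        return true;
    for (i = 0; i < t->nattrs; i++)
        if (contains_transient_record(t->attrs[i]))
            return true;
    return false;
}
```

A named composite whose innermost attribute happens to be an anonymous record still reports true under this walk, which is why stopping the search at the first named type would be unsafe.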
I have done some more testing and investigation and determined that
this optimism was unwarranted. It turns out that the type information
for composite and record types gets stored in two different places.
First, the TupleTableSlot has a type OID, indicating the sort of the
value it expects to be stored for that slot attribute. Second, the
value itself contains a type OID and typmod. And these don't have to
match. For example, consider this query:
select row_to_json(i) from int8_tbl i(x,y);
Without i(x,y), the HeapTuple passed to row_to_json is labelled with
the pg_type OID of int8_tbl. But with the query as written, it's
labeled as an anonymous record type. If I jigger things by hacking
the code so that this is planned as Gather (single-copy) -> SeqScan,
with row_to_json evaluated at the Gather node, then the sequential
scan kicks out a tuple with a transient record type and stores it into
a slot whose type OID is still that of int8_tbl. My previous patch
failed to deal with that; the attached one does.
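The two storage locations can be pictured with a standalone toy sketch (invented structs; real TupleTableSlots and heap tuple headers are of course more involved): the slot declares one type, the value carries its own label, and code that ships the value elsewhere must read the value's header rather than trusting the slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Type identity stamped into the value itself (think HeapTuple header). */
typedef struct ValueType
{
    unsigned    oid;
    int         typmod;
} ValueType;

/* A slot declares the type it expects for the attribute, but the stored
 * value carries its own label, and the two need not agree. */
typedef struct ToySlot
{
    unsigned    declared_oid;
    ValueType   value;
} ToySlot;

static bool
labels_match(const ToySlot *s)
{
    return s->declared_oid == s->value.oid;
}

/* When deciding how to ship a value elsewhere, trust the value's own
 * header, not the slot's declaration. */
static ValueType
effective_type(const ToySlot *s)
{
    return s->value;
}
```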
The previous patch was also defective in a few other respects. The
most significant of those, maybe, is that it somehow thought it was OK
to assume that transient typmods from all workers could be treated
interchangeably rather than individually. To fix this, I've changed
the TupleQueueFunnel implemented by tqueue.c to be merely a
TupleQueueReader which handles reading from a single worker only.
nodeGather.c therefore creates one TupleQueueReader per worker instead
of a single TupleQueueFunnel for all workers; accordingly, the logic
for multiplexing multiple queues now lives in nodeGather.c. This is
probably how I should have done it originally - someone, I think Jeff
Davis - complained previously that tqueue.c had no business embedding
the round-robin policy decision, and he was right. So this addresses
that complaint as well.
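The reshaped design can be sketched roughly in standalone C (names and int-valued "tuples" are invented; this is not the actual tqueue.c or nodeGather.c API): one reader per worker, with the round-robin policy living in the Gather side rather than in the queue code.

```c
#include <assert.h>

#define MAX_WORKERS 8

/* Toy per-worker reader: yields ints, -1 once exhausted. */
typedef struct Reader
{
    const int  *data;
    int         pos;
    int         len;
} Reader;

static int
reader_next(Reader *r)
{
    return (r->pos < r->len) ? r->data[r->pos++] : -1;
}

/* Gather-side state: one reader per worker, plus the round-robin
 * position, which now lives here rather than in the queue code. */
typedef struct Gather
{
    Reader      readers[MAX_WORKERS];
    int         nreaders;
    int         next;
} Gather;

static int
gather_next(Gather *g)
{
    while (g->nreaders > 0)
    {
        int         v;

        if (g->next >= g->nreaders)
            g->next = 0;
        v = reader_next(&g->readers[g->next]);
        if (v != -1)
        {
            g->next++;          /* advance round-robin for next call */
            return v;
        }
        /* This worker's queue is exhausted; forget its reader. */
        g->readers[g->next] = g->readers[--g->nreaders];
    }
    return -1;                  /* every worker is done */
}
```

Keeping per-worker readers separate also gives each one a natural place to hang per-worker state such as a typmod translation table.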
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
tqueue-record-types-v2.patch (+980, -155)
On Wed, Oct 28, 2015 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
So reviewing patch 13 isn't possible without prior knowledge.
Here is an updated version. This is rebased over recent commits, and
I added a missing check for attisdropped.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
tqueue-record-types-v3.patch (+986, -154)
On Mon, Nov 2, 2015 at 9:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 28, 2015 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 19, 2015 at 12:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
special buildfarm support is required - you would just add that to the
animal's config file, more or less like this:

config_env =>
{
    CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
},

I try to make things easy :-)
Wow, that's great. So, I'll try to rework the test code I posted
previously into something less hacky, and eventually add a #define
like this so we can run it on the buildfarm. There's a few other
things that need to get done before that really makes sense - like
getting the rest of the bug fix patches committed - otherwise any
buildfarm critters we add will just be permanently red.
OK, so after a bit more delay than I would have liked, I now have a
working set of patches that we can use to ensure automated testing of
the parallel mode infrastructure. I ended up doing something that
does not require a #define, so I'll need some guidance on what to do
on the BF side given that context. Please find attached three
patches, two of them for commit.
group-locking-v1.patch is a vastly improved version of the group
locking patch that we discussed, uh, extensively last year. I realize
that there was a lot of doubt about this approach, but I still believe
it's the right approach, I have put a lot of work into making it work
correctly, I don't think anyone has come up with a really plausible
alternative approach (except one other approach I tried which turned
out to work but with significantly more restrictions), and I'm
committed to fixing it in whatever way is necessary if it turns out to
be broken, even if that amounts to a full rewrite. Review is welcome,
but I honestly believe it's a good idea to get this into the tree
sooner rather than later at this point, because automated regression
testing falls to pieces without these changes, and I believe that
automated regression testing is a really good idea to shake out
whatever bugs we may have in the parallel query stuff. The code in
this patch is all mine, but Amit Kapila deserves credit as co-author
for doing a lot of prototyping (that ended up getting tossed) and
testing. This patch includes comments and an addition to
src/backend/storage/lmgr/README which explain in more detail what this
patch does, how it does it, and why that's OK.
force-parallel-mode-v1.patch is what adds the actual infrastructure
for automated testing. You can set force_parallel_mode=on to force
queries to be run in a worker whenever possible; this can help test
whether your user-defined functions have been erroneously labeled as
PARALLEL SAFE. If they error out or misbehave with this setting
enabled, you should label them PARALLEL RESTRICTED or PARALLEL UNSAFE.
If you set force_parallel_mode=regress, then some additional changes
intended specifically for regression testing kick in; those changes
are intended to ensure that you get exactly the same output from
running the regression tests with the parallelism infrastructure
forcibly enabled that you would have gotten anyway. Most of this code
is mine, but there are also contributions from Amit Kapila and Rushabh
Lathia.
With both of these patches, you can create a file that says:
force_parallel_mode=regress
max_parallel_degree=2
Then you can run: make check-world TEMP_CONFIG=/path/to/aforementioned/file
If you do, you'll find that while the core regression tests pass
(whee!) the pg_upgrade regression tests fail (oops) because of a
pre-existing bug in the parallelism code introduced by neither of
these two patches. I'm not exactly sure how to fix that bug yet - I
have a couple of ideas - but I think the fact that this test code
promptly found a bug is good sign that it provides enough test
coverage to be useful. Sticking a Gather node on top of every query
where it looks safe just turns out to exercise a lot of things: the
code that decides whether it's safe to put that Gather node, the code
to launch and manage parallel workers, the code those workers
themselves run, etc. The point is just to force as much of the
parallel code to be used as possible even when it's not expected to
make anything faster.
test-group-locking-v1.patch is useful for testing possible deadlock
scenarios with the group locking patch. It's not otherwise safe to
use this, like, at all, and the patch is not proposed for commit.
This patch is entirely by Amit Kapila.
In addition to what's in these patches, I'd like to add a new chapter
to the documentation explaining which queries can be parallelized and
in what ways, what the restrictions are that keep parallel query from
getting used, and some high-level details of how parallelism "works"
in PostgreSQL from a user perspective. Things will obviously change
here as we get more capabilities, but I think we're at a point where
it makes sense to start putting this together. What I'm less clear
about is where exactly in the current SGML documentation such a new
chapter might fit; suggestions very welcome.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
group-locking-v1.patch is a vastly improved version of the group
locking patch that we discussed, uh, extensively last year. I realize
that there was a lot of doubt about this approach, but I still believe
it's the right approach, I have put a lot of work into making it work
correctly, I don't think anyone has come up with a really plausible
alternative approach (except one other approach I tried which turned
out to work but with significantly more restrictions), and I'm
committed to fixing it in whatever way is necessary if it turns out to
be broken, even if that amounts to a full rewrite. Review is welcome,
but I honestly believe it's a good idea to get this into the tree
sooner rather than later at this point, because automated regression
testing falls to pieces without these changes, and I believe that
automated regression testing is a really good idea to shake out
whatever bugs we may have in the parallel query stuff. The code in
this patch is all mine, but Amit Kapila deserves credit as co-author
for doing a lot of prototyping (that ended up getting tossed) and
testing. This patch includes comments and an addition to
src/backend/storage/lmgr/README which explain in more detail what this
patch does, how it does it, and why that's OK.
I see you pushed group locking support. I do wonder if somebody has
actually reviewed this? On a quick scrollthrough it seems fairly
invasive, touching some parts where bugs are really hard to find.
I realize that this stuff has all been brewing long, and that there's
still a lot to do. So you gotta keep moving. And I'm not sure that
there's anything wrong or if there's any actually better approach. But
pushing an unreviewed, complex patch, that originated in a thread
originally about different relatively small/mundane items, for a
contentious issue, a few days after the initial post. Hm. Not sure how
you'd react if you weren't the author.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
Probably not very well. Do you want me to revert it?
I mean, look. Without that patch, parallel query is definitely
broken. Just revert the patch and try running the regression tests
with force_parallel_mode=regress and max_parallel_degree>0. It hangs
all over the place. With the patch, every regression test suite we
have runs cleanly with those settings. Without the patch, it's
trivial to construct a test case where parallel query experiences an
undetected deadlock. With the patch, it appears to work reliably.
Could there be bugs someplace? Yes, there absolutely could. Do I really
think anybody was going to spend the time to understand deadlock.c
well enough to verify my changes? No, I don't. What I think would
have happened is that the patch would have sat around like an
albatross around my neck - totally ignored by everyone - until the end
of the last CF, and then the discussion would have gone one of three
ways:
1. Boy, this patch is complicated and I don't understand it. Let's
reject it, even though without it parallel query is trivially broken!
Uh, we'll just let parallel query be broken.
2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
3. Oh well, Robert says we need this, I guess we'd better let him commit it.
I don't find any of those options to be better than the status quo.
If the patch is broken, another two months of having it in the tree give
us a better chance of finding the bugs, especially because, combined
with the other patch which I also pushed, it enables *actual automated
regression testing* of the parallelism code, which I personally think
is a really good thing - and I'd like to see the buildfarm doing that
as soon as possible, so that we can find some of those bugs before
we're deep in beta. Not just bugs in group locking but all sorts of
parallelism bugs that might be revealed by end-to-end testing. The
*entire stack of patches* that began this thread was a response to
problems that were found by the automated testing that you can't do
without this patch. Those bug fixes resulted in a huge increase in
the robustness of parallel query, and that would not have happened
without this code. Every single one of those problems, some of them
in commits dating back years, was detected by the same method: run the
regression tests with parallel mode and parallel workers used for
every query for which that seems to be safe.
And, by the way, the patch, aside from the deadlock.c portion, was
posted back in October, admittedly without much fanfare, but nobody
reviewed that or any other patch on this thread. If I'd waited for
those reviews to come in, parallel query would not be committed now,
nor probably in 9.6, nor probably in 9.7 or 9.8 either. The whole
project would just be impossible on its face. It would be impossible
in the first instance if I did not have a commit bit, because there is
just not enough committer bandwidth - even reviewer bandwidth more
generally - to review the number of patches that I've submitted
related to parallelism, so in the end some, perhaps many, of those are
going to be committed mostly on the strength of my personal opinion
that committing them is better than not committing them. I am going
to have a heck of a lot of egg on my face if it turns out that I've
been too aggressive in pushing this stuff into the tree. But,
basically, the alternative is that we don't get the feature, and I
think the feature is important enough to justify taking some risk.
I think it's myopic to say "well, but this patch might have bugs".
Very true. But also, all the other parallelism patches that are
already committed or that are still under review but which can't be
properly tested without this patch might have bugs, too, so you've got
to weigh the risk that this patch might get better if I wait longer to
commit it against the possibility that not having committed it reduces
the chances of finding bugs elsewhere. I don't want it to seem like
I'm forcing this down the community's throat - I don't have a right to
do that, and I will certainly revert this patch if that is the
consensus. But that is not what I think best myself. What I think
would be better is to (1) make an effort to get the buildfarm testing
which this patch enables up and running as soon as possible and (2)
for somebody to read over the committed code and raise any issues that
they find. Or, for that matter, to read over the committed code for
any of the *other* parallelism patches and raise any issues that they
find with *that* code. There's certainly scads of code here and this
is far from the only bit that might have bugs.
Oh: another thing that I would like to do is commit the isolation
tests I wrote for the deadlock detector a while back, which nobody has
reviewed either, though Tom and Alvaro seemed reasonably positive
about the concept. Right now, the deadlock.c part of this patch isn't
tested at all by any of our regression test suites, because NOTHING in
deadlock.c is tested by any of our regression test suites. You can
blow it up with dynamite and the regression tests are perfectly happy,
and that's pretty scary.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers