group locking: incomplete patch, just for discussion
For parallelism, I think we need a concept of group locking. That is,
suppose we have a user backend and N worker backends collaborating to
execute some query. For the sake of argument, let's say it's a
parallel CLUSTER, requiring a full table lock. We need all of the
backends to be able to lock the table even though at least one of them
holds AccessExclusiveLock. This suggests that the backends should all
be members of a locking group, and that locks within the same locking
group should be regarded as mutually non-conflicting.
At least to me, that simple scenario is clear-cut[1], but what do we
do in more complicated situations? For example, suppose backends A
and B are members of the same locking group. A locks a relation with
AccessShareLock, an unrelated process X queues up waiting for an
AccessExclusiveLock, and then B also requests AccessShareLock. The
normal behavior here is that B should wait for X to go first, but here
that's a problem. If A is the user backend and B is a worker backend,
A will eventually wait for B, which is waiting for X, which is waiting
for A: deadlock. If it's the other way around, we might survive, but
at the very best it's undesirable to have "parallelism" where A and B
are actually completely serialized, with X taking its turn in the
middle. Now, in practical cases, we probably expect that if A and B
lock the same relation, they're going to do so in very quick
succession, so the actual behavior of the lock manager shouldn't
matter much in terms of fairness. But since we can't predict what
other processes may do in the window between those lock acquisitions,
a sound theoretical approach is needed to avoid deadlocks.
After some thought I've come to the conclusion that we should assume
that every process in a locking group will eventually wait for every
other process in that locking group. It follows that once any single
process in a locking group obtains any lock on an object, any future
lock request by another lock group member on that object should, if
non-conflicting, be granted immediately. It also follows that, when
processes in a locking group are waiting for a lock on an object, none
should be awoken until all lock requests have been satisfied. For
example, suppose process X holds AccessExclusiveLock on a relation.
First, process Y waits for AccessShareLock. Then, processes A and B,
members of the same locking group, queue up respectively for
AccessShareLock and AccessExclusiveLock, in that order. Next, process
X releases its lock. At this point, we should wake Y *but not A*; A
should continue to wait until we can grant locks to both A and B. The
reason is that a new process Q might come along and, for purposes of
deadlock avoidance, we might need to reorder the lock queue. Putting
Q ahead of both A and B is fine; but putting it between A and B is bad
for the reasons discussed in the previous paragraph.
Attached is an incomplete draft patch. It modifies
LockAcquireExtended() and ProcSleep() but not ProcWakeup() or the
deadlock detector. Aside from being incomplete, which is a problem in
and of itself, I don't have a very good idea how to comprehensively
test something like this. I can of course cobble together some test
code for particular scenarios, but it's not at all clear to me what a
comprehensive test plan would look like. Any ideas on that point
would be particularly appreciated, as would feedback on the overall
design. A lot more work is clearly needed here, but I've been known
to advocate soliciting feedback on proposed patches early, so here I
am taking my own advice.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1] You could argue that one process instead should take locks on
behalf of the group. But I think this is a terrible approach. First,
it means that whichever process is holding the locks, presumably the
user backend, had better not try to do any real computation of its
own, so that it can timely service lock requests, which seems like a
huge loss of efficiency; if the optimum degree of parallelism is 2,
we'll need 50% more processes and the whole result set will have to be
copied an extra time. Yuck. Second, it means that the lock-holding
process must refuse to die until all of the processes on behalf of
which it is holding locks don't need them any more. And I hate
backend processes that refuse to die with a fiery passion that cannot
be quenched.
Attachment: group-locking-v0.patch (text/x-patch; +497 -96)
Robert Haas <robertmhaas@gmail.com> writes:
For parallelism, I think we need a concept of group locking. That is,
suppose we have a user backend and N worker backends collaborating to
execute some query. For the sake of argument, let's say it's a
parallel CLUSTER, requiring a full table lock. We need all of the
backends to be able to lock the table even though at least one of them
holds AccessExclusiveLock. This suggests that the backends should all
be members of a locking group, and that locks within the same locking
group should be regarded as mutually non-conflicting.
In the background worker case, I imagined that the foreground process
would hold a lock and the background processes would just assume they
could access the table without holding locks of their own. Aren't
you building a mountain where a molehill would do?
[1] You could argue that one process instead should take locks on
behalf of the group. But I think this is a terrible approach. First,
it means that whichever process is holding the locks, presumably the
user backend, had better not try to do any real computation of its
own, so that it can timely service lock requests, which seems like a
huge loss of efficiency;
What is "timely service lock requests"? You got the lock before firing
off the background workers, you hold it till they're done.
This is a truly horrible excuse for the amount of complexity (and,
without doubt, bugs and performance penalties) you propose to load onto
the lock mechanism.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 15, 2014 at 12:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What is "timely service lock requests"? You got the lock before firing
off the background workers, you hold it till they're done.
If you want to run any non-trivial (read: useful) code in the
background workers, it rapidly gets very hard to predict which
relation locks they might need, and infeasible to hold them for the
lifetime of the computation. For example, suppose we restrict
background workers to running only operators found in btree opclasses.
That's a far more draconian restriction than we'd like to have, but
perhaps liveable in a pinch. But bttextcmp() uses
PG_GETARG_TEXT_PP(), which means it may (or may not) need to lock the
TOAST table; and it can indirectly call lookup_collation_cache() which
does SearchSysCache1(COLLOID, ...) which may result in scanning
pg_collation. And enum_cmp_internal() will do
SearchSysCache1(ENUMOID, ...) which may result in scanning pg_enum.
There's obviously a balance to be struck, here. It will never be safe
to run just anything in a background worker, and we'll go crazy if we
try to make that work. On the other hand, if there's essentially no
code that can run in a background worker without extensive
modification, we might as well not bother implementing parallelism in
the first place. I admit that getting group locking is far from
simple, but I also don't believe it's as complicated as previous
projects like SSI, logical decoding, or LATERAL.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15 October 2014 05:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
For parallelism, I think we need a concept of group locking. That is,
suppose we have a user backend and N worker backends collaborating to
execute some query. For the sake of argument, let's say it's a
parallel CLUSTER, requiring a full table lock. We need all of the
backends to be able to lock the table even though at least one of them
holds AccessExclusiveLock. This suggests that the backends should all
be members of a locking group, and that locks within the same locking
group should be regarded as mutually non-conflicting.

In the background worker case, I imagined that the foreground process
would hold a lock and the background processes would just assume they
could access the table without holding locks of their own. Aren't
you building a mountain where a molehill would do?
Yeh. Locks should be made in the name of the main transaction and
released by it.
When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.
So all lock requests held until EOX should be made in the name of the
top level process. Any child process wanting a lock should request it,
but on discovering it is already held at parent level should just
update the local lock table. Transient locks, like catalog locks can
be made and released locally; I think there is more detail there but
it shouldn't affect the generalisation.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 15, 2014 at 4:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 15 October 2014 05:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
For parallelism, I think we need a concept of group locking. That is,
suppose we have a user backend and N worker backends collaborating to
execute some query. For the sake of argument, let's say it's a
parallel CLUSTER, requiring a full table lock. We need all of the
backends to be able to lock the table even though at least one of them
holds AccessExclusiveLock. This suggests that the backends should all
be members of a locking group, and that locks within the same locking
group should be regarded as mutually non-conflicting.

In the background worker case, I imagined that the foreground process
would hold a lock and the background processes would just assume they
could access the table without holding locks of their own. Aren't
you building a mountain where a molehill would do?

Yeh. Locks should be made in the name of the main transaction and
released by it.

When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.

So all lock requests held until EOX should be made in the name of the
top level process. Any child process wanting a lock should request it,
but on discovering it is already held at parent level should just
update the local lock table. Transient locks, like catalog locks can
be made and released locally; I think there is more detail there but
it shouldn't affect the generalisation.
Hmm, interesting idea. Suppose, though, that the child process
requests a lock that can't immediately be granted, because the catalog
it's trying to access is locked in AccessExclusiveLock mode by an
unrelated transaction. The unrelated transaction, in turn, is blocked
trying to acquire some resource held by the top-level parallelism
process. Assuming the top level parallelism process is waiting for
the child (or will eventually wait), this is a deadlock, but without
some modification to the deadlock detector, it can't see one of the
edges.
Figuring out what to do about that is really the heart of this
project, I think, and there are a variety of designs possible. One of
the early ideas that I had was to have the parallel workers directly
twaddle the main process's PGPROC and lock table state. In other
words, instead of taking locks using their own PGPROCs, everybody uses
a single PGPROC. I made several attempts at getting designs along
these lines off the ground, but it got complicated and invasive: (1)
The processes need to coordinate to make sure that you don't have two
people twaddling the lock state at the same time; (2) The existing
data structures won't support more than one process waiting at a time,
but there's no reason why one parallel worker couldn't be trying to
lock one catalog while another one is trying to lock a different
catalog; (3) On a related note, when a lock wait ends, you can't just
wake up the process that owns the PGPROC, but rather the one that's
actually waiting; (4) the LWLockReleaseAll algorithm just falls apart
in this environment, as far as I can see.
The alternative design which I've been experimenting with is to have
each process use its own PGPROC and PROCLOCK structures, but to tag
each PROCLOCK with not only the owning PGPROC but also the group
leader's PGPROC. This has not been entirely smooth sailing, but it
seems to break much less code than trying to have everybody use one
PGPROC. Most of the changes that seem to be needed to make things
work are pretty well-isolated; rather than totally rearranging the
lock manager, you're just adding extra code that runs only in the
parallel case.
I'm definitely open to the idea that there's a better, simpler design
out there, but I haven't been able to think of one that doesn't break
deadlock detection.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15 October 2014 14:46, Robert Haas <robertmhaas@gmail.com> wrote:
When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.
Hmm, interesting idea. Suppose, though, that the child process
requests a lock that can't immediately be granted, because the catalog
it's trying to access is locked in AccessExclusiveLock mode by an
unrelated transaction. The unrelated transaction, in turn, is blocked
trying to acquire some resource held by the top-level parallelism
process. Assuming the top level parallelism process is waiting for
the child (or will eventually wait), this is a deadlock, but without
some modification to the deadlock detector, it can't see one of the
edges.
Family disputes are fairly easily resolved ;-)
The first and basic point is that in most cases the parent should
already hold the required locks. This can only happen for briefly held
locks and/or more complex stuff. In the first case, getting
parallelism to work without that complex stuff would be useful. I'd be
happy if the first version simply throws an error if a child can't
acquire a lock immediately. Don't overthink the first version. Knowing
you'll disagree, let's take a further step...
Second point, the relationship between parent and children is clear.
If we do a deadlock detection, we should be able to search for that as
a special case, since we will know that we are a child and that such a
situation might occur. So just add in an edge so the rest of the
deadlock code works fine.
If that doesn't work, use a heuristic. If parent is waiting when child
does deadlock test, assume it's a deadlock and abort the child
speculatively just in case. You can work out how to do that better in
the future, since it won't happen that often.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 15, 2014 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 15 October 2014 14:46, Robert Haas <robertmhaas@gmail.com> wrote:
When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.

Hmm, interesting idea. Suppose, though, that the child process
requests a lock that can't immediately be granted, because the catalog
it's trying to access is locked in AccessExclusiveLock mode by an
unrelated transaction. The unrelated transaction, in turn, is blocked
trying to acquire some resource held by the top-level parallelism
process. Assuming the top level parallelism process is waiting for
the child (or will eventually wait), this is a deadlock, but without
some modification to the deadlock detector, it can't see one of the
edges.

Family disputes are fairly easily resolved ;-)
The first and basic point is that in most cases the parent should
already hold the required locks. This can only happen for briefly held
locks and/or more complex stuff. In the first case, getting
parallelism to work without that complex stuff would be useful. I'd be
happy if the first version simply throws an error if a child can't
acquire a lock immediately. Don't overthink the first version. Knowing
you'll disagree, let's take a further step...
Well, I'm fervently in agreement with you on one point: the first
version of all this needs to be as simple as possible, or the time to
get to the first version will be longer than we can afford to wait. I
think what we're discussing here is which things are important enough
that it makes sense to have them in the first version, and which
things can wait. I also think we are in agreement that at least SOME
thought about lock management is needed; the question we're trying to
hash out is whether what I'm proposing to try to do here is
significantly more ambitious than what's really necessary for V1.
Which is a good question.
Second point, the relationship between parent and children is clear.
If we do a deadlock detection, we should be able to search for that as
a special case, since we will know that we are a child and that such a
situation might occur. So just add in an edge so the rest of the
deadlock code works fine.

If that doesn't work, use a heuristic. If parent is waiting when child
does deadlock test, assume it's a deadlock and abort the child
speculatively just in case. You can work out how to do that better in
the future, since it won't happen that often.
Well, the deadlock checker needs to work not only when one of the
parallel processes invokes it, but also when somebody else invokes it.
For example, suppose we have parallel processes A and B. A holds lock
1 and awaits lock 2, while B awaits lock 3. No problem! Now
process X, which is holding lock 2, tries to grab lock 1. X must be
able to detect the deadlock. Now, that's not necessarily a problem
with what you said: it just means that the parent-child relationship
has to be clear from the contents of shared memory.
Which is pretty much the direction I'm aiming with the incomplete
patch I posted. For the sake of simplicity, I want to assume that
every process in a locking group waits for every other process in the
same locking group, even though that might not be technically true in
every case. If the parent is waiting for a lock and the guy who has
that lock is waiting for a parallel worker in the same group, my plan
(which I think matches your suggestion) is to call that a deadlock.
There are a few sticky points here: sometimes, the deadlock checker
doesn't actually abort transactions, but just rearranges the wait
queue, so something at least somewhat sensible needs to happen in that
case.
The thing that I'm aiming to do in the patch which I think you are
suggesting might not be necessary is to make it possible for the child
to go ahead and request AccessShareLock on the scan relation even though
the parent might already hold some other lock (perhaps even
AccessExclusiveLock). I want to make the lock manager smart enough
to recognize that those locks are mutually non-conflicting because of
the fact that the two processes are in close cooperation. Clearly, we
could get by without that, but it adds complexity in other places: the
parent has to never release its lock (even if killed) until the child
is done with the relation; and the scan code itself needs to be
conditional, passing NoLock from children and some other mode in the
parent. That's all manageable, but it looks to me like doing the
necessary surgery on the lock manager isn't actually going to be that
hard; most of the necessary logic is present in draft form in the
patch already, and it's not that complex.
In any design, the hard part, at least as far as I can see, is making
the deadlock detector work reliably. I agree with you that it's OK
if, in some unlikely situation, it occasionally diagnoses a deadlock
between parallel processes where perhaps there isn't one. But it's
not OK, at least in my opinion, if it ever fails to detect a real
deadlock.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15 October 2014 17:03, Robert Haas <robertmhaas@gmail.com> wrote:
Well, I'm fervently in agreement with you on one point: the first
version of all this needs to be as simple as possible, or the time to
get to the first version will be longer than we can afford to wait. I
think what we're discussing here is which things are important enough
that it makes sense to have them in the first version, and which
things can wait.
I'm guessing we might differ slightly on what constitutes "as simple as possible".
Something usable, with severe restrictions, is actually better than we
have now. I understand the journey this work represents, so don't be
embarrassed by submitting things with heuristics and good-enoughs in
it. Our mentor, Mr. Lane, achieved much by spreading work over many
releases, leaving others to join in the task.
Might I gently enquire what the "something usable" we are going to see
in this release? I'm not up on current plans.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 15, 2014 at 2:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 15 October 2014 17:03, Robert Haas <robertmhaas@gmail.com> wrote:
Well, I'm fervently in agreement with you on one point: the first
version of all this needs to be as simple as possible, or the time to
get to the first version will be longer than we can afford to wait. I
think what we're discussing here is which things are important enough
that it makes sense to have them in the first version, and which
things can wait.

I'm guessing we might differ slightly on what constitutes "as simple as possible".
Yes, I believe there have been occasions in the past when that has
happened, so definitely possible. :-)
Something usable, with severe restrictions, is actually better than we
have now. I understand the journey this work represents, so don't be
embarrassed by submitting things with heuristics and good-enoughs in
it. Our mentor, Mr. Lane, achieved much by spreading work over many
releases, leaving others to join in the task.

Might I gently enquire what the "something usable" we are going to see
in this release? I'm not up on current plans.
I don't know how far I'm going to get for this release yet. I think
pg_background is a pretty good milestone, and useful in its own right.
I would like to get something that's truly parallel working sooner
rather than later, but this group locking issue is one of 2 or 3
significant hurdles that I need to climb over first.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 16 October 2014 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
Might I gently enquire what the "something usable" we are going to see
in this release? I'm not up on current plans.

I don't know how far I'm going to get for this release yet. I think
pg_background is a pretty good milestone, and useful in its own right.
I would like to get something that's truly parallel working sooner
rather than later, but this group locking issue is one of 2 or 3
significant hurdles that I need to climb over first.
pg_background is very cute, but really it's not a step forward,
or at least not very far. It's sounding like you've already decided that
is as far as we're going to get this release, which I'm disappointed
about.
Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.
You asked for my help, but I'd like to see some concrete steps towards
an interim feature so I can see some benefit in a clear direction.
Can we please have the first step we discussed? Parallel CREATE INDEX?
(Note the please)
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/28/14, 3:48 PM, Simon Riggs wrote:
Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.
What do you see as being missing for autonomous transactions?
BTW, what I think would make this feature VERY useful is if it provided the ability to fire something up in another backend and leave it running in the background. I think you can do that with FDW, but then you have the authentication PITA to deal with (and perhaps pg_background is a more efficient way to move data between backends than FDW, but that's just a guess...)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Oct 28, 2014 at 4:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 16 October 2014 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
Might I gently enquire what the "something usable" we are going to see
in this release? I'm not up on current plans.

I don't know how far I'm going to get for this release yet. I think
pg_background is a pretty good milestone, and useful in its own right.
I would like to get something that's truly parallel working sooner
rather than later, but this group locking issue is one of 2 or 3
significant hurdles that I need to climb over first.

pg_background is very cute, but really it's not a step forward,
or at least not very far. It's sounding like you've already decided that
is as far as we're going to get this release, which I'm disappointed
about.

Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.

You asked for my help, but I'd like to see some concrete steps towards
an interim feature so I can see some benefit in a clear direction.

Can we please have the first step we discussed? Parallel CREATE INDEX?
(Note the please)
What I've been thinking about trying to work towards is parallel
sequential scan. I think that it would actually be pretty easy to
code up a mostly-working version using the existing infrastructure,
but the patch would be rejected with a bazooka, because the
non-working parts would include things like:
1. The cooperating backends might not all be using the same snapshot,
because that requires sharing the snapshot, combo CID hash, and
transaction state.
2. The quals that got pushed down to the workers might not return the
same answers that they would have produced with a single backend,
because we have no mechanism for assessing pushdown-safety.
3. Deadlock detection would be to some greater or lesser degree
broken, the details depending on the implementation choices you made.
There is a bit of a chicken-and-egg problem here. If I submit a patch
for parallel sequential scan, it'll (justifiably) get rejected because
it doesn't solve those problems. So I'm trying to solve those
above-enumerated problems first, with working and at least
somewhat-useful examples that show how the incremental bits of
infrastructure can be used to do stuff. But that leads to your
(understandable) complaint that this isn't very real yet.
Why am I now thinking about parallel sequential scan instead of
parallel CREATE INDEX? You may remember that I posted a patch for a
new memory allocator some time ago, and it came in for a fair amount
of criticism and not much approbation. Some of that criticism was
certainly justified, and perhaps I was as hard on myself as anyone
else was. However you want to look at it, I see the trade-off between
parallel sort and parallel seq-scan this way: parallel seq-scan
requires dealing with the planner (ouch!) but parallel sort requires
dealing with memory allocation in dynamic shared memory segments
(ouch!). Both of them require solving the three problems listed
above.
And maybe a few others, but I think those are the big ones - and I
think proper deadlock detection is the hardest of them. A colleague
of mine has drafted patches for sharing snapshots and combo CIDs
between processes, and as you might expect that's pretty easy.
Sharing the transaction state (so we can test whether a transaction ID
is "our" transaction ID inside the worker) is a bit trickier, but I
think not too hard. Assessing pushdown-safety will probably boil down
to adding some equivalent of proisparallel. Maybe not the most
elegant, but defensible, and if you're looking for the shortest path
to something usable, that's probably it. But deadlock detection ...
well, I don't see any simpler solution than what I'm trying to build
here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 28, 2014 at 7:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 10/28/14, 3:48 PM, Simon Riggs wrote:
Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.

What do you see as being missing for autonomous transactions?

Personally, I don't see this patch set as having much to do with real
autonomous transactions.
BTW, what I think would make this feature VERY useful is if it provided the
ability to fire something up in another backend and leave it running in the
background.
You *can* do that. I mean, long-running transactions will have their
usual problems, but if you want to kick off a long-running (or a
short-running query) in the background and forget about it, this patch
lets you do that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 28, 2014 at 7:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Oct 28, 2014 at 7:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 10/28/14, 3:48 PM, Simon Riggs wrote:
Given your description of pg_background it looks an awful lot like
infrastructure to make Autonomous Transactions work, but it doesn't
even do that. I guess it could do in a very small additional patch, so
maybe it is useful for something.

What do you see as being missing for autonomous transactions?

Personally, I don't see this patch set as having much to do with real
autonomous transactions.

BTW, what I think would make this feature VERY useful is if it provided the
ability to fire something up in another backend and leave it running in the
background.

You *can* do that. I mean, long-running transactions will have their
usual problems, but if you want to kick off a long-running (or a
short-running query) in the background and forget about it, this patch
lets you do that.
Err, sorry. pg_background lets you do that, not this patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28 October 2014 23:24, Robert Haas <robertmhaas@gmail.com> wrote:
You asked for my help, but I'd like to see some concrete steps towards
an interim feature so I can see some benefit in a clear direction.

Can we please have the first step we discussed? Parallel CREATE INDEX?
(Note the please)

What I've been thinking about trying to work towards is parallel
sequential scan. I think that it would actually be pretty easy to
code up a mostly-working version using the existing infrastructure,
but the patch would be rejected with a bazooka, because the
non-working parts would include things like:

1. The cooperating backends might not all be using the same snapshot,
because that requires sharing the snapshot, combo CID hash, and
transaction state.
2. The quals that got pushed down to the workers might not return the
same answers that they would have produced with a single backend,
because we have no mechanism for assessing pushdown-safety.
3. Deadlock detection would be to some greater or lesser degree
broken, the details depending on the implementation choices you made.

There is a bit of a chicken-and-egg problem here. If I submit a patch
for parallel sequential scan, it'll (justifiably) get rejected because
it doesn't solve those problems. So I'm trying to solve those
above-enumerated problems first, with working and at least
somewhat-useful examples that show how the incremental bits of
infrastructure can be used to do stuff. But that leads to your
(understandable) complaint that this isn't very real yet.

Why am I now thinking about parallel sequential scan instead of
parallel CREATE INDEX? You may remember that I posted a patch for a
new memory allocator some time ago, and it came in for a fair amount
of criticism and not much approbation. Some of that criticism was
certainly justified, and perhaps I was as hard on myself as anyone
else was. However you want to look at it, I see the trade-off between
parallel sort and parallel seq-scan this way: parallel seq-scan
requires dealing with the planner (ouch!) but parallel sort requires
dealing with memory allocation in dynamic shared memory segments
(ouch!). Both of them require solving the three problems listed
above.

And maybe a few others, but I think those are the big ones - and I
think proper deadlock detection is the hardest of them. A colleague
of mine has drafted patches for sharing snapshots and combo CIDs
between processes, and as you might expect that's pretty easy.
Sharing the transaction state (so we can test whether a transaction ID
is "our" transaction ID inside the worker) is a bit trickier, but I
think not too hard. Assessing pushdown-safety will probably boil down
to adding some equivalent of proisparallel. Maybe not the most
elegant, but defensible, and if you're looking for the shortest path
to something usable, that's probably it. But deadlock detection ...
well, I don't see any simpler solution than what I'm trying to build
here.
OK, I see where you are headed, and perhaps why.
I said I would help and I mean to do so. Here's my thoughts:
Solving your 3 problems requires significant research, coding and
agreement that you're unlikely to get in 9.5 because there are no
solutions happening at the end of it; it's all theoretical design,
which becomes more arguable. And even if you do, Parallel (||) Seq
Scan isn't interesting except for marketing purposes, since its use in
real queries is limited because most queries need to do something else
after the SeqScan, which opens up a fuller discussion of parallel
query which we've never had. My feeling is that releasing only || Seq
Scan would backfire since we don't yet do enough other stuff to be a
full solution.
If you do wish to pursue || Seq Scan, then a working prototype would
help. It allows us to see that there is an open source solution we are
working to solve the problems for. People can benchmark it, understand
the benefits and issues it raises and that would help focus attention
on the problems you are trying to solve in infrastructure. People may
have suggestions on how to solve or avoid those that you hadn't
thought of.
Personally, IMHO, it's quite late in the cycle to be releasing such a
prototype since we have only 6-7 weeks until the major patch
submission deadline. Hence my request: can we get *something* good for
users into 9.5, since with respect, infrastructure isn't useful. (If
you take such comments negatively, we might discuss similar issues
with BDR - but we are working to make simple features available in
each release on that project also).
My proposal is we do a parallel index build scan... just as we
discussed earlier, so you would be following the direction set by Dev
Meeting, not just a proposal of mine.
As I mentioned previously when you started discussing shared memory
segments, parallel sort does NOT require shared memory. The only thing
you need to share are files. Split the problem into N pieces, sort
them to produce N files and then merge the files using existing code.
That only applies to large sorts, but then those are the ones you
cared about doing in parallel anyway.
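The split-sort-merge scheme described above can be sketched in a few lines. This is an illustrative Python sketch of the general technique only, not PostgreSQL code; in-memory lists stand in for the per-worker run files:

```python
import heapq

def file_based_parallel_sort(items, num_pieces):
    # Split the problem into N pieces of roughly equal size.
    chunk = (len(items) + num_pieces - 1) // num_pieces
    pieces = [items[i:i + chunk] for i in range(0, len(items), chunk)]
    # Each piece is sorted independently; in the proposal each worker
    # would write its sorted run to a file rather than keep it in memory.
    runs = [sorted(piece) for piece in pieces]
    # Merge the sorted runs using existing k-way merge logic
    # (heapq.merge plays the role of the existing merge code).
    return list(heapq.merge(*runs))
```

Only the final k-way merge is serial here; the N independent sorts are the part that would run in worker backends.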
CREATE INDEX has a lock on the table. Worker tasks can be simply
banned from acquiring new locks and doing many other tasks. We can
prevent transaction chains, so both normal and concurrent versions
have no complex combocid or transaction state issues. So your 3
problems come down to nothing that needs to be solved immediately.
Pushdown can be assumed for the functions we use for existing index
types: we can verify this from our internal code, so we are not at
risk there.
So my proposal is we do a parallel index build scan, a limited subset
of parallel seq scan, we put the data from that into a private sort
node, then output a series of files. The plan for this is the
following...
Parallel Sort Merge
  Local Sort
    Local IndexBuildScanSegment
Local IndexBuildScanSegment scans a portion of the table being
indexed, a "segment" - this is the heart of the division of labour
within the parallel scan.
Local Sort is just a normal sort node except it outputs a file, not a
stream of sorted tuples.
Parallel Sort Merge waits for all Local Sorts to finish and then
merges the inputs using existing sort merge logic.
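That plan shape can be sketched as follows. This is a hedged illustration in Python, with threads standing in for the separate worker backends and the node names above used only as labels; it is not actual executor code:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def index_build_scan_segment(table, lo, hi):
    # Local IndexBuildScanSegment: scan one contiguous segment of the
    # table -- the heart of the division of labour in the parallel scan.
    segment = table[lo:hi]
    # Local Sort: a normal sort, except the result is "output to a file"
    # (modelled here as a returned list) instead of streamed upward.
    return sorted(segment)

def parallel_sort_merge(table, num_workers):
    # Divide the table into one segment per worker.
    step = (len(table) + num_workers - 1) // num_workers
    bounds = [(i, min(i + step, len(table)))
              for i in range(0, len(table), step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(index_build_scan_segment, table, lo, hi)
                   for lo, hi in bounds]
        # Parallel Sort Merge: wait for all Local Sorts to finish...
        runs = [f.result() for f in futures]
    # ...then merge the inputs using existing sort-merge logic.
    return list(heapq.merge(*runs))
```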
Now I'm sure you can see there are ways to improve that plan using
shmem. But the above is a workable solution for 9.5 within the time
available. We'll learn much making this happen and it will fuel
everyone for more work in the following release.
For the next release, I suggest that || hash join is a more useful goal
than || seq scan since it allows typical queries on big tables to be
optimized, but we can discuss that in more detail another time.
Happy to have a more detailed phone call or meeting to discuss.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
My proposal is we do a parallel index build scan... just as we
discussed earlier, so you would be following the direction set by Dev
Meeting, not just a proposal of mine.

As I mentioned previously when you started discussing shared memory
segments, parallel sort does NOT require shared memory. The only thing
you need to share are files. Split the problem into N pieces, sort
them to produce N files and then merge the files using existing code.
That only applies to large sorts, but then those are the ones you
cared about doing in parallel anyway.

CREATE INDEX has a lock on the table. Worker tasks can be simply
banned from acquiring new locks and doing many other tasks.
Banning worker tasks from taking locks is only possible if the
worker backend doesn't need to acquire any lock which is not
already acquired by the main backend, but can we safely assume
the same? One example given by Robert upthread, about
bttextcmp(), can break that assumption.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 29, 2014 at 4:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
If you do wish to pursue || Seq Scan, then a working prototype would
help. It allows us to see that there is an open source solution we are
working to solve the problems for. People can benchmark it, understand
the benefits and issues it raises and that would help focus attention
on the problems you are trying to solve in infrastructure. People may
have suggestions on how to solve or avoid those that you hadn't
thought of.
I've mulled that over a bit and it might be worth pursuing further.
Of course there's always the trade-off: doing that means not doing
something else.
As I mentioned previously when you started discussing shared memory
segments, parallel sort does NOT require shared memory. The only thing
you need to share are files. Split the problem into N pieces, sort
them to produce N files and then merge the files using existing code.
That only applies to large sorts, but then those are the ones you
cared about doing in parallel anyway.
A simple implementation of this would work only for simple
pass-by-value types, like integers. Pass-by-reference types require
the comparator to de-TOAST, and some other types require catalog
lookups. I don't think that's very useful: Noah previously did some
analysis of this problem and concluded (with apologies if I'm remembering
the details incorrectly here) that the comparator for strings was
something like 1000x as expensive as the comparator for integers, and
that you basically couldn't get the latter to take enough time to be
worth parallelizing.
I care much more about getting the general infrastructure in place to
make parallel programming feasible in PostgreSQL than I do about
getting one particular case working. And more than feasible: I want
it to be relatively straightforward. That's not simple, but the
potential rewards are great. Let's face it: there are people here who
are much better than I am at hacking on the planner and especially the
executor. Why haven't any of those people implemented
parallel anything? I think it's because, right now, it's just too
darn hard. I'm trying to reduce that to something approaching the
difficulty of writing normal PostgreSQL backend code, and I think I'm
6-12 patches away from that. This is one of them and, yeah, it's not
done, and, yeah, we might not get to parallel anything this release
and, yeah, things would be going faster if I could work on parallelism
full time. But I think that the progress we are making is meaningful
and the goal is within sight.
I appreciate that you'd probably attack this problem from a different
direction than I'm attacking it from, but I still think that what I'm
trying to do is a legitimate direction of attack which, by the way,
does not preclude anybody else from attacking it from a different
direction and, indeed, such a development would be most welcome.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 29 October 2014 12:08, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
My proposal is we do a parallel index build scan... just as we
discussed earlier, so you would be following the direction set by Dev
Meeting, not just a proposal of mine.

As I mentioned previously when you started discussing shared memory
segments, parallel sort does NOT require shared memory. The only thing
you need to share are files. Split the problem into N pieces, sort
them to produce N files and then merge the files using existing code.
That only applies to large sorts, but then those are the ones you
cared about doing in parallel anyway.

CREATE INDEX has a lock on the table. Worker tasks can be simply
banned from acquiring new locks and doing many other tasks.

Banning worker tasks from taking locks is only possible if the
worker backend doesn't need to acquire any lock which is not
already acquired by the main backend, but can we safely assume
the same? One example given by Robert upthread, about
bttextcmp(), can break that assumption.
I think it's reasonable to imagine some prep work will be required in
the main task before workers begin.
Locking the toast table of any main tables we access seems easily
done. Though perhaps we should make weak locking of the toast table
presumed. Do we have cases where the toast table can be accessed when
the main table is not also strong locked first?
As for locking the enums table or collation table, that's simple stuff
also. They're just AccessShareLocks.
I can't imagine we'll be able to presume that all functions of every
kind are parallel-safe.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 29 October 2014 12:28, Robert Haas <robertmhaas@gmail.com> wrote:
I care much more about getting the general infrastructure in place to
make parallel programming feasible in PostgreSQL than I do about
getting one particular case working. And more than feasible: I want
it to be relatively straightforward. That's not simple, but the
potential rewards are great. Let's face it: there are people here who
are much better than I am at hacking on the planner and especially the
executor. Why haven't any of those people implemented
parallel anything? I think it's because, right now, it's just too
darn hard. I'm trying to reduce that to something approaching the
difficulty of writing normal PostgreSQL backend code, and I think I'm
6-12 patches away from that. This is one of them and, yeah, it's not
done, and, yeah, we might not get to parallel anything this release
and, yeah, things would be going faster if I could work on parallelism
full time. But I think that the progress we are making is meaningful
and the goal is within sight.
The role of an immediate working solution is to prove the
infrastructure so far is genuinely useful, helping also to build
interest and understanding.
There is a real danger that your "ta-dah" moment sometime in the
future contains flaws which need to be addressed, but we now have
piles of questionable infrastructure lying around. If you have
similar doubts about anything I'm doing, please just ask.
If we could see the 6-12 patch descriptions and understand where you
are going it would help.
I appreciate that you'd probably attack this problem from a different
direction than I'm attacking it from, but I still think that what I'm
trying to do is a legitimate direction of attack which, by the way,
does not preclude anybody else from attacking it from a different
direction and, indeed, such a development would be most welcome.
Well, I don't have time; been a little busy these last 10 years and
for at least one year more yet before projects open up again.
If I did, I don't think it would be that useful to interfere.
Cooperation seems better use of my time, if possible.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 29, 2014 at 10:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
There is a real danger that your "ta-dah" moment sometime in the
future contains flaws which need to be addressed, but we now have
piles of questionable infrastructure lying around. If you have
similar doubts about anything I'm doing, please just ask.
I think the implication that the infrastructure created to date is of
questionable utility for parallelism is false and unjustified. Every
infrastructure patch thus far committed, and every patch proposed
other than this one, has been accompanied by code that demonstrates
what it does and how that's useful. Most of them have had enough
interest by extension authors to generate bug reports from other
people - most notably Petr, who has obviously been playing with all of
the existing modules enough to notice bugs and come to conclusions
about the shortcomings of the existing technology similar to my own.
Interest in dynamic background workers seems to go considerably
further, to the point where it made 9.4's list of major features.
If we could see the 6-12 patch descriptions and understand where you
are going it would help.
OK, here's what I think we need (roughly):
1. The remaining pg_background patches. The main things in there are
(a) the ability to propagate errors from a background worker back to
the user backend and (b) the ability to synchronize the background
worker's GUC state with the user backend.
2. Group locking.
3. Patches to synchronize the (a) snapshot, (b) combo CID hash, and
(c) transaction state from the user backend to a background worker.
Draft patches for (a) and (b) are already complete, but not
published because I didn't think anyone would be very excited about
them just yet.
4. A patch to add "parallel mode" in which a whole bunch of things are
prohibited in the user backend and even more things are prohibited in
the background worker - specifically, things that won't be safe in
those contexts. Noah and Amit Kapila worked on this and we have draft
patches for this as well, also not published and for the same reasons.
5. A patch to add a "proisparallel" flag or similar so that we can
label parallel-safe functions.
6. A patch to add a way for a background worker that hits an error to
kick the user backend, probably via the ProcSignal mechanism, so that
the user doesn't have to continuously poll the shm_mq structures it's
using to communicate with those backends for errors (or alternatively
sit idle while those background workers do all the work).
This might be something that can be skipped for early versions, but
eventually I think we'll need it.
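The kick-versus-poll distinction can be illustrated with a small sketch. Here Python threads stand in for background workers, an Event plays the role of the latch a ProcSignal would set, and the queue plays the role of a shm_mq; none of this is actual PostgreSQL code:

```python
import queue
import threading

def worker(mq, kicked):
    # A worker that hits an error reports it through its message queue
    # and then "kicks" the leader, rather than relying on the leader to
    # poll the queue continuously.
    try:
        raise RuntimeError("worker hit an error")
    except RuntimeError as e:
        mq.put(("error", str(e)))
        kicked.set()

def run_leader():
    mq, kicked = queue.Queue(), threading.Event()
    t = threading.Thread(target=worker, args=(mq, kicked))
    t.start()
    kicked.wait()   # sleep until kicked -- no busy polling loop needed
    t.join()
    return mq.get_nowait()
```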
7. Eventually, a dynamic shared memory allocator. A lot of things can
be done without this, but some can't.
I think that's most of it. No doubt we'll discover a few other needs
along the way, but if we had that infrastructure, then I think it
would be quite feasible for an experienced hacker to parallelize most
of what we want to parallelize without having to invent too much
that's really fundamental.
If I did, I don't think it would be that useful to interfere.
Cooperation seems better use of my time, if possible.
I am not asking anyone to interfere, in the sense of getting in the
way, but I certainly think it would be useful for people to contribute
code and/or reviews.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company