SPGist "triple parity" concept doesn't work

Started by Tom Laneover 12 years ago11 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I've been looking into the problem reported at
/messages/by-id/519A5917.40704@qunar.com
and what I find is that we have spgist insertion operations deadlocking
against each other because one is descending from page A to page B while
the other descends from page B to page A. According to the README file,
the "triple parity" page selection algorithm is supposed to prevent
that:

: While descending the tree, the insertion algorithm holds exclusive lock on
: two tree levels at a time, ie both parent and child pages (parent and child
: pages can be the same, see notes above). There is a possibility of deadlock
: between two insertions if there are cross-referenced pages in different
: branches. That is, if inner tuple on page M has a child on page N while
: an inner tuple from another branch is on page N and has a child on page M,
: then two insertions descending the two branches could deadlock. To prevent
: deadlocks we introduce a concept of "triple parity" of pages: if inner tuple
: is on page with BlockNumber N, then its child tuples should be placed on the
: same page, or else on a page with BlockNumber M where (N+1) mod 3 == M mod 3.
: This rule guarantees that tuples on page M will have no children on page N,
: since (M+1) mod 3 != N mod 3.

That would work fine as long as the invariant is maintained accurately.
However, there are at least two cases where the existing code fails to
maintain the invariant:

1. In spgAddNodeAction, if the enlarged inner tuple doesn't fit on the
current page anymore, we do this:

/*
* obtain new buffer with the same parity as current, since it will be
* a child of same parent tuple
*/
current->buffer = SpGistGetBuffer(index,
GBUF_INNER_PARITY(current->blkno),
...

That's fine as long as the parent tuple wasn't also on the current page.
If it was on the current page, we end up re-downlinking the parent to a
page having the same parity it has, not one more as it should be.

I tried to fix this like so:

/*
* get a new buffer that has the right parity to store a child of
* the current tuple's parent
*/
current->buffer = SpGistGetBuffer(index,
GBUF_INNER_PARITY(parent->blkno + 1),
...

but that just moves the problem somewhere else: the link from the parent
to the new inner tuple is now guaranteed to follow the parity rules, but
the downlinks leading out of it don't follow them anymore.

2. In spgSplitNodeAction, we split an inner tuple into a "prefix" tuple
that replaces that inner tuple, and a "postfix" tuple that contains the
same downlinks the original tuple did. That's fine as long as we can
fit the postfix tuple on the same page. If we can't, we assign it to a
page that's one parity level below the current page, and then its
outgoing links violate the parity rules. (Keeping the postfix tuple
on the current page wouldn't make things better, since we'd still
violate the parity rules with respect to either the incoming or outgoing
links of the prefix tuple, if it has to go to another page.)

With a few repetitions of either of these cases, and some bad luck
about placement of the new tuples, you get into situations where two
pages each contain downlinks leading to the other; and then a deadlock
is just a matter of time.

I don't immediately see any good way to fix this. I think the "triple
parity" rule as it stands is hopelessly broken, but I don't know what
to replace it with, even granting that we don't need to maintain on-disk
compatibility. (We'd have to tell people to reindex SPGist indexes
anyway, because of the risk that they already contain circular links;
so switching to a new layout rule doesn't seem to add any more pain.)
Or we could try to modify the insertion algorithm so it doesn't lock
two levels of the tree at once, but that seems pretty risky.

Thoughts?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Greg Stark
stark@mit.edu
In reply to: Tom Lane (#1)
Re: SPGist "triple parity" concept doesn't work

On Thu, Jun 6, 2013 at 10:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To prevent
: deadlocks we introduce a concept of "triple parity" of pages: if inner tuple
: is on page with BlockNumber N, then its child tuples should be placed on the
: same page, or else on a page with BlockNumber M where (N+1) mod 3 == M mod 3.
: This rule guarantees that tuples on page M will have no children on page N,
: since (M+1) mod 3 != N mod 3.

Even if the invariant was maintained why doesn't that just mean you
need three concurrent inserts to create a deadlock?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#2)
Re: SPGist "triple parity" concept doesn't work

Greg Stark <stark@mit.edu> writes:

On Thu, Jun 6, 2013 at 10:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

: This rule guarantees that tuples on page M will have no children on page N,
: since (M+1) mod 3 != N mod 3.

Even if the invariant was maintained why doesn't that just mean you
need three concurrent inserts to create a deadlock?

Hm, good point. That reinforces my feeling that the page-number-based
approach isn't workable as a guarantee; though we might want to keep
that layout rule as a heuristic that would help reduce contention.

I thought a little bit about whether we could drop the requirement of
locking two tree levels during insertion descent, or at least recover
from deadlock if it did occur. One simple fix would be to do a
ConditionalLockBuffer on the child level, and if it fails, just abandon
the insertion attempt and start over. However that could lead to a lot
of wasted work when there's contention, so it's not terribly attractive
as-is. Another line of thought is to lock just the single parent tuple,
not its whole page, when descending --- then we can't deadlock unless
there are actual circularities in the index. We'd probably have to use
a heavyweight lock for that, which might be problematic for performance,
and I'm not exactly sure about timing of when to take that lock relative
to acquiring the page's buffer lock (which we'd still need). There are
probably some other ways to attack this.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#1)
Re: SPGist "triple parity" concept doesn't work

Tom Lane wrote:

I don't immediately see any good way to fix this. I think the "triple
parity" rule as it stands is hopelessly broken, but I don't know what
to replace it with, even granting that we don't need to maintain on-disk
compatibility. (We'd have to tell people to reindex SPGist indexes
anyway, because of the risk that they already contain circular links;
so switching to a new layout rule doesn't seem to add any more pain.)
Or we could try to modify the insertion algorithm so it doesn't lock
two levels of the tree at once, but that seems pretty risky.

Is this the chance to add a metapage?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: SPGist "triple parity" concept doesn't work

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Is this the chance to add a metapage?

SPGist indexes do have a metapage already --- you're confusing them with
Gist.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#1)
Re: SPGist "triple parity" concept doesn't work

That would work fine as long as the invariant is maintained accurately.
However, there are at least two cases where the existing code fails to
maintain the invariant:

Hmm. Didn't catch that during development.

Thoughts?

Give me some time to play around it. Will think.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Will Crawford
billcrawford1970@gmail.com
In reply to: Tom Lane (#3)
Re: SPGist "triple parity" concept doesn't work

On 7 June 2013 02:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm, good point. That reinforces my feeling that the page-number-based
approach isn't workable as a guarantee; though we might want to keep
that layout rule as a heuristic that would help reduce contention.

Can the locks just be taken in, say, numeric order of the pages involved?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#6)
Re: SPGist "triple parity" concept doesn't work

Teodor Sigaev <teodor@sigaev.ru> writes:

Thoughts?

Give me some time to play around it. Will think.

I experimented a bit with the idea of taking a heavyweight lock to
represent the right to alter an inner tuple. The results were pretty
grim: an spgist index build example went from 225 ms to 380 ms, and
a test case involving concurrent insertions (basically the complainant's
original example pgbench-ified) went from 5400 tps to 4300 tps.
I hadn't realized our heavyweight lock code was quite that expensive :-(

Anyway I now think that we might be better off with the other idea of
abandoning an insertion and retrying if we get a lock conflict. That
would at least not create any performance penalty for non-concurrent
scenarios; and even in concurrent cases, I suspect you'd have to be
rather unlucky to get penalties as bad as the heavyweight-lock solution
is showing.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#8)
Re: SPGist "triple parity" concept doesn't work

Anyway I now think that we might be better off with the other idea of
abandoning an insertion and retrying if we get a lock conflict. That
would at least not create any performance penalty for non-concurrent
scenarios; and even in concurrent cases, I suspect you'd have to be
rather unlucky to get penalties as bad as the heavyweight-lock solution
is showing.

Agree, it would be a better workaround for now. I will be able to do this at
this friday.

I considered the idea to forbid placement of child on the same page as parent,
but this implementation a) could significantly increase size of index, b)
doesn't solve Greg's point.

We definetly need new idea of locking protocol and I'll return to this problem
at autumn (sorry, I havn't time in summer to do this research).
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#8)
1 attachment(s)
Re: SPGist "triple parity" concept doesn't work

Anyway I now think that we might be better off with the other idea of
abandoning an insertion and retrying if we get a lock conflict.

done, look at the patch.

I was faced with the fact that my mail is considered spam by postgresql.org, so
I repeat some hthoughts from previous mail:

I considered the idea to forbid placement of child on the same page as parent,
but this implementation a) could significantly increase size of index, b)
doesn't solve Greg's point.

We definetly need new idea of locking protocol and I'll return to this problem
at autumn (sorry, I havn't time in summer to do this research).

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

spgist_deadlock-1.patch.gzapplication/x-tar; name=spgist_deadlock-1.patch.gzDownload
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#10)
Re: SPGist "triple parity" concept doesn't work

Teodor Sigaev <teodor@sigaev.ru> writes:

Anyway I now think that we might be better off with the other idea of
abandoning an insertion and retrying if we get a lock conflict.

done, look at the patch.

Looks good, committed with some cosmetic adjustments.

We definetly need new idea of locking protocol and I'll return to this
problem at autumn (sorry, I havn't time in summer to do this
research).

OK. I think the performance of this way will be okay, actually, in most
cases anyhow. It'll do till we have a better idea.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers