Re: Debugging deadlocks

Started by Guy Rouillier on March 31, 2005 · 12 messages
#1 Guy Rouillier
guyr@masergy.com

Alvaro Herrera wrote:

> Now this can't be applied right away because it's easy to run "out of
> memory" (shared memory for the lock table). Say, a delete or update
> that touches 10000 tuples does not work. I'm currently working on a
> proposal to allow the lock table to spill to disk ...

While not always true, in many cases the cardinality of the referenced
(parent) table is small compared to that of the referencing (child)
table. Does locking require a separate lock record for each tuple in
the child table, or just one for each tuple in the parent table with a
reference count? For example, the scenario I started this thread with
had two child tables referencing rows in a common parent table. For a
given parent tuple, a single "prevent write" lock with a reference count
would seem sufficient.

--
Guy Rouillier

#2 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Guy Rouillier (#1)

On Thu, Mar 31, 2005 at 06:54:31PM -0600, Guy Rouillier wrote:

> Alvaro Herrera wrote:
>
> > Now this can't be applied right away because it's easy to run "out of
> > memory" (shared memory for the lock table). Say, a delete or update
> > that touches 10000 tuples does not work. I'm currently working on a
> > proposal to allow the lock table to spill to disk ...
>
> While not always true, in many cases the cardinality of the referenced
> (parent) table is small compared to that of the referencing (child)
> table. Does locking require a separate lock record for each tuple in
> the child table, or just one for each tuple in the parent table with a
> reference count?

Just one. (LOCALLOCK, which is private to each backend, stores how many
times we hold a lock.)
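
In sketch form, that per-backend bookkeeping looks roughly like this
(illustrative only; the real LOCALLOCK in src/include/storage/lock.h
carries more fields):

/* Illustrative sketch of a per-backend lock table entry; the real
 * LOCALLOCK struct has additional bookkeeping (owners, hash code, etc). */
typedef struct
{
    LOCKTAG tag;     /* identifies the locked object, e.g. one tuple */
    int     nLocks;  /* how many times this backend holds the lock;
                      * re-acquiring the same lock only bumps this count,
                      * it does not consume another shared-memory entry */
} LocalLockSketch;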

I just realized we need to be able to spill not only the LOCK structs to
disk, but also the PROCLOCKs ... am I right?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The web brings people together because no matter what kind of sexual
mutant you are, you have millions of potential partners. Type in 'find
people who have sex with burning deer' and the computer will say
'specify the type of deer'" (Jason Alexander)

#3 Paul Tillotson
pntil@shentel.net
In reply to: Alvaro Herrera (#2)

Alvaro,

I suppose there must be reasons not to do this, but have you considered
using the "slack" space (empty space) in an ordinary table "heap" page
to store share-locks on the tuples in that page? (If not enough space
is available then you would still need to use the spilled-to-disk btree
structure.)

Maybe there could also be something that you put in the disk page that
means "transaction X has all the tuples in this page share-locked,"
since I imagine it will usually be the case that if very many of them
are locked, then they are all locked.

With this method, checking for a lock would be slightly more complicated
since you would need to check the disk page in which the tuple resides
first, and the spill-to-disk structure afterwards.
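
In rough C-like pseudocode the check could look like this (all of the
helper names here are hypothetical, just to show the order of the two
checks):

/* Hypothetical sketch of the two-step lock check described above;
 * none of these helper functions exist in PostgreSQL. */
bool
tuple_is_share_locked(Page page, OffsetNumber offnum, ItemPointer tid)
{
    /* 1. Cheap case: a lock (or an "every tuple on this page is
     *    locked" marker) recorded in the tuple's own heap page. */
    if (page_all_tuples_locked(page) ||
        page_tuple_locked(page, offnum))
        return true;

    /* 2. Only if the page holds no entry, consult the
     *    spilled-to-disk btree structure. */
    return spill_btree_has_lock(tid);
}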

It seems that if this would work, you would be able to share-lock every
tuple in a table (or a large fraction of them) without a lot of extra I/O
provided that the pages were not 100% full. (Which is almost never the
case in PostgreSQL anyway.)

Paul Tillotson


#4 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Paul Tillotson (#3)

On Fri, Apr 01, 2005 at 07:41:54PM -0500, Paul Tillotson wrote:

Hi,

> I suppose there must be reasons not to do this, but have you considered
> using the "slack" space (empty space) in an ordinary table "heap" page
> to store share-locks on the tuples in that page? (If not enough space
> is available then you would still need to use the spilled-to-disk btree
> structure.)

No, I hadn't considered it. However this seems hard to do, because the
"slack space" does not have a start point; on each page, elements
(tuples) grow from the back to the front, while element pointers grow
in the reverse direction. So I don't see how this would be done.

I've been discussing this with Bruce who has an idea completely
different from mine (basically to extend the current tuple locking
system instead of extending the lock manager). I'm still unsure about
the whole thing.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Ni aun el genio muy grande llegar�a muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)

#5 Paul Tillotson
pntil@shentel.net
In reply to: Alvaro Herrera (#4)

> > I suppose there must be reasons not to do this, but have you considered
> > using the "slack" space (empty space) in an ordinary table "heap" page
> > to store share-locks on the tuples in that page? (If not enough space
> > is available then you would still need to use the spilled-to-disk btree
> > structure.)
>
> No, I hadn't considered it. However this seems hard to do, because the
> "slack space" does not have a start point; on each page, elements
> (tuples) grow from the back to the front, while element pointers grow
> in the reverse direction. So I don't see how this would be done.

Would it work for an updater, who finds that the locks list (currently
located in the middle of the empty space) is "in the way" of a new tuple
that he wants to insert, to take some kind of lock, move the whole list
up or down (spilling some of these locks to the disk if no more space is
available), and release it again?

Could the lock(s) on a particular tuple be tacked onto the end of that
tuple? (I think tuples are stored variable-width anyway, aren't they?)
I suppose this is conceptually equivalent to adding a variable width
system column which gets TOASTED when it gets too wide. I suppose this
would make taking a lock as expensive as doing an update, though, right?

Paul Tillotson

#6 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Paul Tillotson (#5)

On Fri, Apr 01, 2005 at 10:14:07PM -0500, Paul Tillotson wrote:

> Would it work for an updater, who finds that the locks list (currently
> located in the middle of the empty space) is "in the way" of a new tuple
> that he wants to insert, to take some kind of lock, move the whole list
> up or down (spilling some of these locks to the disk if no more space is
> available), and release it again?

Well, at that point you need to take a lock in order to be able to
manage locks. Managing not to step on your own feet in that scenario
is complex, to say the least, if not downright impossible.

Another problem with this approach is that it would be practically
impossible for a process to release all its locks when it finishes.

No, we have to build this infrastructure purely using LWLocks, which
means no normal relations are involved.
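
For context, LWLocks live in shared memory and follow a plain
acquire/release discipline, along these lines (LockMgrLock is the LWLock
that guards the shared lock table):

/* LWLocks protect shared-memory structures directly; no relations,
 * no regular lock-table entries are involved. */
LWLockAcquire(LockMgrLock, LW_EXCLUSIVE);  /* guard the shared lock table */
/* ... examine or modify the shared lock structures ... */
LWLockRelease(LockMgrLock);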

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The Postgresql hackers have what I call a "NASA space shot" mentality.
Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> On Fri, Apr 01, 2005 at 10:14:07PM -0500, Paul Tillotson wrote:
>
> > ...
>
> Well, at that point you need to take a lock in order to be able to
> manage locks. Managing not to step on your own feet in that scenario
> is complex, to say the least, if not downright impossible.

I looked at Paul's first message and thought "nah, that won't work
because ... because ... hmm ... hmmm ..."

We currently store tuple locks on the same page as the tuples (ie, in
the tuple headers) and need no extra locks to do so. Certainly it
still has to have a spill mechanism, but the thought that is attractive
to me is that until you are forced to spill, you do not have to take any
system-wide lock, only a page-level lock. So it could have very good
average performance.
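
(Concretely, "in the tuple headers" means roughly the following; the
field and flag names are simplified from the real HeapTupleHeader and
infomask bits such as HEAP_MARKED_FOR_UPDATE:)

/* Simplified sketch of how a tuple-level lock is recorded today:
 * the locker's transaction id goes into the tuple's own header,
 * so no shared lock-table entry is needed per locked tuple. */
tuple->t_xmax = locker_xid;               /* who holds the row lock */
tuple->t_infomask |= MARKED_FOR_UPDATE;   /* says xmax is a lock,
                                           * not a deletion */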

> Another problem with this approach is that it would be practically
> impossible for a process to release all its locks when it finishes.

There is no explicit release of tuple-level locks in the current setup.
Need it be different in Paul's idea?

regards, tom lane

#8 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#7)

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>
> > On Fri, Apr 01, 2005 at 10:14:07PM -0500, Paul Tillotson wrote:
> >
> > > ...
> >
> > Well, at that point you need to take a lock in order to be able to
> > manage locks. Managing not to step on your own feet in that scenario
> > is complex, to say the least, if not downright impossible.
>
> I looked at Paul's first message and thought "nah, that won't work
> because ... because ... hmm ... hmmm ..."

For what it's worth, this would be very similar to how Oracle handles such
locks. In that case there's a fixed amount of space per page reserved in
advance for storing locks.

Actually I think it's a bit more complicated. But that's the idea.

--
greg

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#8)

Greg Stark <gsstark@mit.edu> writes:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > I looked at Paul's first message and thought "nah, that won't work
> > because ... because ... hmm ... hmmm ..."
>
> For what it's worth, this would be very similar to how Oracle handles such
> locks.

[ slightly alarmed ] Do they have a patent on the way they do it?

> In that case there's a fixed amount of space per page reserved in
> advance for storing locks.

Paul's idea involved using a variable amount of space (ie, whatever's
free) ... which might be sufficient to escape whatever hypothetical
patent Oracle or someone else might hypothetically have on this idea.
Or not. Pardon me for feeling a bit skittish at the moment.

regards, tom lane

#10 Paul Tillotson
pntil@shentel.net
In reply to: Greg Stark (#8)

Greg Stark wrote:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> >
> > > On Fri, Apr 01, 2005 at 10:14:07PM -0500, Paul Tillotson wrote:
> > >
> > > > ...
> > >
> > > Well, at that point you need to take a lock in order to be able to
> > > manage locks. Managing not to step on your own feet in that scenario
> > > is complex, to say the least, if not downright impossible.
> >
> > I looked at Paul's first message and thought "nah, that won't work
> > because ... because ... hmm ... hmmm ..."
>
> For what it's worth, this would be very similar to how Oracle handles such
> locks. In that case there's a fixed amount of space per page reserved in
> advance for storing locks.
>
> Actually I think it's a bit more complicated. But that's the idea.

While we're at it, does anyone know how DB2 and MS SQL Server handle
row-level locks?

Paul

#11 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#7)

On Fri, Apr 01, 2005 at 11:02:36PM -0500, Tom Lane wrote:

[Cc: to -hackers]

> We currently store tuple locks on the same page as the tuples (ie, in
> the tuple headers) and need no extra locks to do so. Certainly it
> still has to have a spill mechanism, but the thought that is attractive
> to me is that until you are forced to spill, you do not have to take any
> system-wide lock, only a page-level lock. So it could have very good
> average performance.

If we go this way maybe we should abandon the idea of using the standard
lock manager to lock tuples, which is what can be found in the patch I
posted. Or maybe not, and just have the lock manager store the locks on
the page itself -- but it will have to know about the buffer, so it
will in some sense break the opacity of the API between the two.

One possible way to do it would be having an OffsetNumber stored in the
page header; if it's not InvalidOffsetNumber then it points to a
"tuple" that holds

struct
{
    OffsetNumber nextLock;   /* offset of the next lock "tuple" on this
                              * page, or InvalidOffsetNumber at the end
                              * of the chain */
    LOCK         lock;       /* the lock data itself */
};

So a locker would check the chain of locks and stop when it sees
InvalidOffsetNumber.
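
In sketch form, the walk might look like this (hypothetical names
throughout; PageLockEntry stands for the struct just shown):

/* Hypothetical sketch of walking the on-page lock chain. */
static bool
page_has_conflicting_lock(Page page, LOCKMODE requested_mode)
{
    OffsetNumber offnum = PageGetFirstLockOffset(page); /* from page header */

    while (offnum != InvalidOffsetNumber)
    {
        PageLockEntry *entry = PageGetLockEntry(page, offnum);

        if (LockConflicts(&entry->lock, requested_mode))
            return true;          /* found a conflicting lock */
        offnum = entry->nextLock; /* follow the chain */
    }
    return false;                 /* reached end of chain: no conflict */
}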

If there is no free space on the page, what should we do? Store the
lock in the main hash table?

Another problem would be the XLog. On heap operations, do we record
exactly where (position in the page) a tuple was stored, or just the
fact that it was stored? If the latter, then there's no problem. If
the former, then on the next REDO the records wouldn't match (==> PANIC)
-- unless we logged the lock operations as well.

Reading the code I see we do log the offset numbers, so that's a problem
:-( ... maybe we could work around that by moving the pd_lower without
using line pointers; not sure if that's doable.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Porque Kim no hacia nada, pero, eso s�,
con extraordinario �xito" ("Kim", Kipling)

#12 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#9)

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
>
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >
> > > I looked at Paul's first message and thought "nah, that won't work
> > > because ... because ... hmm ... hmmm ..."
> >
> > For what it's worth, this would be very similar to how Oracle handles such
> > locks.
>
> [ slightly alarmed ] Do they have a patent on the way they do it?

Do we really want to know?...

A few minutes of searching on a patent search site doesn't turn up anything
relevant, though. It might have been part of Oracle's design from such an
early stage that they never thought it was patentable. It's not clear to me
that it is, for that matter. The general idea of storing locks with the
objects being locked isn't really anything novel.

--
greg