this is in plain text (row level locks)

Started by Jenny -over 22 years ago24 messageshackers
Jump to latest
#1Jenny -
nat_lazy@hotmail.com

Iam trying to acquire rowlevel locks in postgresql. I try doing this:
'select * from students where name='Larry' for update;
But by looking at the holding array of proclock , I've noticed that by doing
this only
AccessShareLock gets acquired which is a table level lock.
How do I acquire rowlevelock and what fields of Lock or Proclock
datastructures indicate it.
Thanks
Jenny

_________________________________________________________________
Protect your PC - get McAfee.com VirusScan Online
http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jenny - (#1)
Re: this is in plain text (row level locks)

"Jenny -" <nat_lazy@hotmail.com> writes:

Iam trying to acquire rowlevel locks in postgresql. I try doing this:
'select * from students where name='Larry' for update;
But by looking at the holding array of proclock , I've noticed that by doing
this only
AccessShareLock gets acquired which is a table level lock.

Row-level locks are not recorded in proclock --- they are implemented by
marking the individual tuple on-disk. If we tried to record them in
shared memory, it'd be very easy to run out of shared memory, because
you could be holding row locks on a large number of tuples.

regards, tom lane

#3Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#2)
Re: this is in plain text (row level locks)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> "Jenny -" <nat_lazy@hotmail.com> writes:

Iam trying to acquire rowlevel locks in postgresql. I try doing
this: 'select * from students where name='Larry' for update;
But by looking at the holding array of proclock , I've noticed
that by doing this only AccessShareLock gets acquired which is
a table level lock.

Tom> Row-level locks are not recorded in proclock --- they are
Tom> implemented by marking the individual tuple on-disk. If we
Tom> tried to record them in shared memory, it'd be very easy to
Tom> run out of shared memory, because you could be holding row
Tom> locks on a large number of tuples.

Of course, other database systems do this without too much hassle
.. including relying on lock escalation (move up to page/table level
locks) when the number of locks grow too large.

Does pgsql only record X locks on the individual tuples on-disk or
does it do so for S locks as well ?

Not that I dislike the idea - Toby Lehman suggested this in his
Ph.D. thesis in the mid-eighties for main-memory databases (where you
don't take the write penalty).

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#4Bruce Momjian
bruce@momjian.us
In reply to: Sailesh Krishnamurthy (#3)
Re: this is in plain text (row level locks)

Sailesh Krishnamurthy wrote:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> "Jenny -" <nat_lazy@hotmail.com> writes:

Iam trying to acquire rowlevel locks in postgresql. I try doing
this: 'select * from students where name='Larry' for update;
But by looking at the holding array of proclock , I've noticed
that by doing this only AccessShareLock gets acquired which is
a table level lock.

Tom> Row-level locks are not recorded in proclock --- they are
Tom> implemented by marking the individual tuple on-disk. If we
Tom> tried to record them in shared memory, it'd be very easy to
Tom> run out of shared memory, because you could be holding row
Tom> locks on a large number of tuples.

Of course, other database systems do this without too much hassle
.. including relying on lock escalation (move up to page/table level
locks) when the number of locks grow too large.

I wouldn't say they do it with little hassle --- it is quite a pain, in
fact, at least for users.

Does pgsql only record X locks on the individual tuples on-disk or
does it do so for S locks as well ?

We don't need to shared lock individual rows because of MVCC --- well,
we sort of do by recording our xid in our proc structure, so folks don't
change things underneath us. We prevent expired rows from disappearing
from the disk by others looking at our proc start xid.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: this is in plain text (row level locks)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Sailesh Krishnamurthy wrote:

Does pgsql only record X locks on the individual tuples on-disk or
does it do so for S locks as well ?

We don't need to shared lock individual rows because of MVCC --- well,
we sort of do by recording our xid in our proc structure, so folks don't
change things underneath us. We prevent expired rows from disappearing
from the disk by others looking at our proc start xid.

This is actually an issue though. Row-level shared locks would be
really nice to have for foreign-key handling. Right now we have to
use X locks for those, and that leads to deadlocking problems for
applications.

regards, tom lane

#6Mike Mascari
mascarm@mascari.com
In reply to: Tom Lane (#5)
Re: this is in plain text (row level locks)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Sailesh Krishnamurthy wrote:

Does pgsql only record X locks on the individual tuples on-disk or
does it do so for S locks as well ?

We don't need to shared lock individual rows because of MVCC --- well,
we sort of do by recording our xid in our proc structure, so folks don't
change things underneath us. We prevent expired rows from disappearing
from the disk by others looking at our proc start xid.

This is actually an issue though. Row-level shared locks would be
really nice to have for foreign-key handling. Right now we have to
use X locks for those, and that leads to deadlocking problems for
applications.

Yes! Yes! It's the last big hurdle for an otherwise excellent RI
implementation...

Just wanted "Joe-user's" enthusiasm for row-level S locks registered
somewhere... :-)

Mike Mascari
mascarm@mascari.com

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: this is in plain text (row level locks)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Sailesh Krishnamurthy wrote:

Does pgsql only record X locks on the individual tuples on-disk or
does it do so for S locks as well ?

We don't need to shared lock individual rows because of MVCC --- well,
we sort of do by recording our xid in our proc structure, so folks don't
change things underneath us. We prevent expired rows from disappearing
from the disk by others looking at our proc start xid.

This is actually an issue though. Row-level shared locks would be
really nice to have for foreign-key handling. Right now we have to
use X locks for those, and that leads to deadlocking problems for
applications.

Is the plan to allow one backend to shared lock the row while others can
read it but not modify it, or is the idea to actually allow multiple
backends to record their shared status on the row?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: this is in plain text (row level locks)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

This is actually an issue though. Row-level shared locks would be
really nice to have for foreign-key handling. Right now we have to
use X locks for those, and that leads to deadlocking problems for
applications.

Is the plan to allow one backend to shared lock the row while others can
read it but not modify it, or is the idea to actually allow multiple
backends to record their shared status on the row?

Plan? We have no plan to fix this :-(. But clearly there has to be
some way to tell which backends hold read locks on a shared-locked row,
else you can't tell if they've all dropped the lock or not.

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: this is in plain text (row level locks)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

This is actually an issue though. Row-level shared locks would be
really nice to have for foreign-key handling. Right now we have to
use X locks for those, and that leads to deadlocking problems for
applications.

Is the plan to allow one backend to shared lock the row while others can
read it but not modify it, or is the idea to actually allow multiple
backends to record their shared status on the row?

Plan? We have no plan to fix this :-(. But clearly there has to be
some way to tell which backends hold read locks on a shared-locked row,
else you can't tell if they've all dropped the lock or not.

I suppose we could allow one backend to mark the page with a shared lock
for primary key purposes while others read it. Does that buy us
anything?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: this is in plain text (row level locks)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I suppose we could allow one backend to mark the page with a shared lock
for primary key purposes while others read it. Does that buy us
anything?

That doesn't work, unless you insist that the first backend can't exit
its transaction until all the other ones are done. Which introduces its
own possibilities for deadlock --- but even worse, how does the first
backend *know* that the other ones are done? You're right back where
you started: it has to be possible to tell which backends have
share-locked a particular row.

regards, tom lane

#11Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#10)
Re: this is in plain text (row level locks)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> That doesn't work, unless you insist that the first backend
Tom> can't exit its transaction until all the other ones are done.
Tom> Which introduces its own possibilities for deadlock --- but
Tom> even worse, how does the first backend *know* that the other
Tom> ones are done? You're right back where you started: it has
Tom> to be possible to tell which backends have share-locked a
Tom> particular row.

Is a count a solution ?

The first backend gets the S lock on the row - I'm assuming you plan
to do it by recording it on the tuple and not in a shared memory lock
table, which means that you might have to unnecessarily write an
unmodified page if its buffer pool frame is stolen.

The problem is that on commit time, you must carefully decrement the
count value of shared locks on any tuple that you own. This can be
accomplished by having each backend keep track of the list of files
and TIDs for any rows for which it acquired S locks. Is this the same
way that pgsql releases the X locks ?

Bruce, I don't disagree that MVCC has the very nice property that
writers don't block readers. However, I don't buy that 2-phase
locking, with lock escalation is either unworkable because of too many
locks, or causes any extra pain for the user application (apart from
the fact that writers not blocking readers gives you more concurrency
at some very minor overhead of not being strictly serializable).

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#12Rod Taylor
rbt@rbt.ca
In reply to: Sailesh Krishnamurthy (#11)
Re: this is in plain text (row level locks)

On Thu, 2003-07-24 at 02:45, Sailesh Krishnamurthy wrote:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> That doesn't work, unless you insist that the first backend
Tom> can't exit its transaction until all the other ones are done.
Tom> Which introduces its own possibilities for deadlock --- but
Tom> even worse, how does the first backend *know* that the other
Tom> ones are done? You're right back where you started: it has
Tom> to be possible to tell which backends have share-locked a
Tom> particular row.

Is a count a solution ?

Almost. Problem with a count is the difficulty decrementing the count
after a crash. You would also need to store a log of some kind so you
know what to reset while starting up in error recovery mode which would
add a large amount of overhead.

It may be best to have a locking manager run as a separate process.
That way it could store locks in ram or spill over to disk. Now the
problem is communication overhead. It probably isn't any worse than
writing them to disk for long queries (actual disk activity resulting),
but short queries are probably going to notice.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rod Taylor (#12)
Re: this is in plain text (row level locks)

Rod Taylor <rbt@rbt.ca> writes:

It may be best to have a locking manager run as a separate process.
That way it could store locks in ram or spill over to disk.

Hmm, that might be workable. We could imagine that in place of the
HEAP_MARKED_FOR_UPDATE status bit, we have a "this row is possibly
locked" hint bit. Only if you see the bit set do you need to query
the lock manager. If the answer comes back that no lock is held,
you can clear the bit --- so no need for any painful "undo" stuff
after a crash, and no communication overhead in the normal case.

regards, tom lane

#14Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#13)
Re: this is in plain text (row level locks)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Rod Taylor <rbt@rbt.ca> writes:

It may be best to have a locking manager run as a separate
process. That way it could store locks in ram or spill over to
disk.

Tom> Hmm, that might be workable. We could imagine that in place
Tom> of the HEAP_MARKED_FOR_UPDATE status bit, we have a "this row
Tom> is possibly locked" hint bit. Only if you see the bit set do
Tom> you need to query the lock manager. If the answer comes back

Why do you want to query the lock manager as a separate process ?

Why not have the traditional approach of a lock table in shared
memory, growing and shrinking as appropriate, and have each individual
process update it (need to protect it with a latch of course).

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sailesh Krishnamurthy (#14)
Re: this is in plain text (row level locks)

Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:

Why not have the traditional approach of a lock table in shared
memory, growing and shrinking as appropriate,

Because we can't grow shared memory. Whatever size we get at startup is
what we're stuck with. (I suppose we could try asking the kernel for
additional segments, but there's every likelihood that that will fail.)

regards, tom lane

#16Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#15)
Re: this is in plain text (row level locks)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:

Why not have the traditional approach of a lock table in shared
memory, growing and shrinking as appropriate,

Tom> Because we can't grow shared memory. Whatever size we get at
Tom> startup is what we're stuck with. (I suppose we could try
Tom> asking the kernel for additional segments, but there's every
Tom> likelihood that that will fail.)

We implemented a Shared Memory MemoryContext using OSSP libmm (used in
Apache) for TelegraphCQ.

If you think it's useful I can submit it as a patch.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#17Bruce Momjian
bruce@momjian.us
In reply to: Sailesh Krishnamurthy (#16)
Re: this is in plain text (row level locks)

Sailesh Krishnamurthy wrote:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:

Why not have the traditional approach of a lock table in shared
memory, growing and shrinking as appropriate,

Tom> Because we can't grow shared memory. Whatever size we get at
Tom> startup is what we're stuck with. (I suppose we could try
Tom> asking the kernel for additional segments, but there's every
Tom> likelihood that that will fail.)

We implemented a Shared Memory MemoryContext using OSSP libmm (used in
Apache) for TelegraphCQ.

If you think it's useful I can submit it as a patch.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

^^^^^^^^^^^^^^^
Watch out, that code from Berkeley usually is a mess. :-)

Seriously, though, it would be good to see what you guys are up to.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sailesh Krishnamurthy (#16)
Re: this is in plain text (row level locks)

Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:

We implemented a Shared Memory MemoryContext using OSSP libmm (used in
Apache) for TelegraphCQ.

How portable is that? What kind of real-world performance do you get?

regards, tom lane

#19Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#18)
Re: this is in plain text (row level locks)

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:

We implemented a Shared Memory MemoryContext using OSSP libmm
(used in Apache) for TelegraphCQ.

Tom> How portable is that? What kind of real-world performance do
Tom> you get?

As portable as libmm I guess. We've only tried it on Linux and MacOS
X. I understand that the Apache guys use it - so it should be fairly
portable.

http://www.ossp.org/pkg/lib/mm/

Actually I've also tried it successfully on SPARC Solaris.

I believe that a user of our first TelegraphCQ release (that we did
not announce) tried it on FreeBSD. Sean Chittenden has tried it on
FreeBSD but I'm not sure if he built it and got it to work properly.

As for our SHMemoryContext code itself, it's pretty much a straight
copy of the code in aset.c and mctx.c, with appropriate changes where
we call the mm_alloc functions and also use mm_lock and mm_unlock for
protection.

As for performance, I'm in the midst of a performance study, but it's
not a micro-benchmark of the shmem allocator. After I get some numbers
I'll start profiling our code.

One problem is that on some platforms the locking implementation uses
the file-system. This is not in our control and we just trust mm to do
the best possible.

Given that we rely on shared query execution (we run multiple queries
in a single adaptive query plan) we don't have much choice - we must
rely on shared memory.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#20Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Bruce Momjian (#17)
Re: this is in plain text (row level locks)

"Bruce" == Bruce Momjian <pgman@candle.pha.pa.us> writes:

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh

Bruce> ^^^^^^^^^^^^^^^ Watch out, that code from
Bruce> Berkeley usually is a mess. :-)

LOL !

That's why we release the code - in the hope of getting real hackers
playing with it :-)

Bruce> Seriously, though, it would be good to see what you guys
Bruce> are up to.

Our code is based on 7.3.2 pgsql and available at:

http://telegraph.cs.berkeley.edu/telegraphcq

If the specific shm stuff is interesting, I can easily take it out and
submit it. If you prefer to just look at the code in the tarball
that's fine too.

Or you can browse our cvs repository at:

http://triplerock.cs.berkeley.edu:8080/viewcvs/viewcvs.cgi/

Look for shmctx.c and shmset.c in src/backend/utils/mmgr

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#13)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
#24Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#22)