pgsql: Implement sharable row-level locks, and use them for foreign key

Started by Nonameover 20 years ago3 messages
#1Noname
tgl@svr1.postgresql.org

Log Message:
-----------
Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets. When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX. This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared. Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.

Modified Files:
--------------
pgsql/doc/src/sgml:
mvcc.sgml (r2.49 -> r2.50)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/mvcc.sgml.diff?r1=2.49&r2=2.50)
sql.sgml (r1.35 -> r1.36)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/sql.sgml.diff?r1=1.35&r2=1.36)
storage.sgml (r1.5 -> r1.6)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/storage.sgml.diff?r1=1.5&r2=1.6)
pgsql/doc/src/sgml/ref:
grant.sgml (r1.45 -> r1.46)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/grant.sgml.diff?r1=1.45&r2=1.46)
lock.sgml (r1.46 -> r1.47)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/lock.sgml.diff?r1=1.46&r2=1.47)
pg_resetxlog.sgml (r1.9 -> r1.10)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/pg_resetxlog.sgml.diff?r1=1.9&r2=1.10)
select.sgml (r1.85 -> r1.86)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/select.sgml.diff?r1=1.85&r2=1.86)
select_into.sgml (r1.34 -> r1.35)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/select_into.sgml.diff?r1=1.34&r2=1.35)
pgsql/src/backend/access/heap:
heapam.c (r1.187 -> r1.188)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/heap/heapam.c.diff?r1=1.187&r2=1.188)
pgsql/src/backend/access/transam:
Makefile (r1.19 -> r1.20)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/Makefile.diff?r1=1.19&r2=1.20)
xact.c (r1.199 -> r1.200)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xact.c.diff?r1=1.199&r2=1.200)
xlog.c (r1.188 -> r1.189)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c.diff?r1=1.188&r2=1.189)
pgsql/src/backend/catalog:
index.c (r1.252 -> r1.253)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/index.c.diff?r1=1.252&r2=1.253)
pgsql/src/backend/commands:
portalcmds.c (r1.40 -> r1.41)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/portalcmds.c.diff?r1=1.40&r2=1.41)
trigger.c (r1.186 -> r1.187)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/trigger.c.diff?r1=1.186&r2=1.187)
vacuum.c (r1.306 -> r1.307)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/vacuum.c.diff?r1=1.306&r2=1.307)
pgsql/src/backend/executor:
README (r1.4 -> r1.5)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/README.diff?r1=1.4&r2=1.5)
execMain.c (r1.246 -> r1.247)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/execMain.c.diff?r1=1.246&r2=1.247)
execUtils.c (r1.122 -> r1.123)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/execUtils.c.diff?r1=1.122&r2=1.123)
pgsql/src/backend/nodes:
copyfuncs.c (r1.303 -> r1.304)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/nodes/copyfuncs.c.diff?r1=1.303&r2=1.304)
equalfuncs.c (r1.240 -> r1.241)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/nodes/equalfuncs.c.diff?r1=1.240&r2=1.241)
outfuncs.c (r1.250 -> r1.251)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/nodes/outfuncs.c.diff?r1=1.250&r2=1.251)
readfuncs.c (r1.176 -> r1.177)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/nodes/readfuncs.c.diff?r1=1.176&r2=1.177)
pgsql/src/backend/optimizer/path:
allpaths.c (r1.128 -> r1.129)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/allpaths.c.diff?r1=1.128&r2=1.129)
pgsql/src/backend/optimizer/plan:
initsplan.c (r1.104 -> r1.105)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/initsplan.c.diff?r1=1.104&r2=1.105)
planner.c (r1.184 -> r1.185)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planner.c.diff?r1=1.184&r2=1.185)
pgsql/src/backend/optimizer/prep:
prepjointree.c (r1.26 -> r1.27)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/prep/prepjointree.c.diff?r1=1.26&r2=1.27)
preptlist.c (r1.74 -> r1.75)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/prep/preptlist.c.diff?r1=1.74&r2=1.75)
pgsql/src/backend/parser:
analyze.c (r1.320 -> r1.321)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/analyze.c.diff?r1=1.320&r2=1.321)
gram.y (r2.488 -> r2.489)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/gram.y.diff?r1=2.488&r2=2.489)
parse_relation.c (r1.106 -> r1.107)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/parse_relation.c.diff?r1=1.106&r2=1.107)
parse_type.c (r1.73 -> r1.74)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/parse_type.c.diff?r1=1.73&r2=1.74)
pgsql/src/backend/rewrite:
rewriteHandler.c (r1.150 -> r1.151)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/rewrite/rewriteHandler.c.diff?r1=1.150&r2=1.151)
pgsql/src/backend/storage/ipc:
ipci.c (r1.74 -> r1.75)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/ipc/ipci.c.diff?r1=1.74&r2=1.75)
pgsql/src/backend/storage/lmgr:
lwlock.c (r1.27 -> r1.28)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/lmgr/lwlock.c.diff?r1=1.27&r2=1.28)
pgsql/src/backend/tcop:
utility.c (r1.235 -> r1.236)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tcop/utility.c.diff?r1=1.235&r2=1.236)
pgsql/src/backend/utils/adt:
ri_triggers.c (r1.76 -> r1.77)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/ri_triggers.c.diff?r1=1.76&r2=1.77)
pgsql/src/backend/utils/time:
tqual.c (r1.86 -> r1.87)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/time/tqual.c.diff?r1=1.86&r2=1.87)
pgsql/src/bin/initdb:
initdb.c (r1.81 -> r1.82)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/initdb/initdb.c.diff?r1=1.81&r2=1.82)
pgsql/src/bin/pg_controldata:
pg_controldata.c (r1.22 -> r1.23)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_controldata/pg_controldata.c.diff?r1=1.22&r2=1.23)
pgsql/src/bin/pg_resetxlog:
pg_resetxlog.c (r1.31 -> r1.32)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_resetxlog/pg_resetxlog.c.diff?r1=1.31&r2=1.32)
pgsql/src/include/access:
heapam.h (r1.99 -> r1.100)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/access/heapam.h.diff?r1=1.99&r2=1.100)
htup.h (r1.73 -> r1.74)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/access/htup.h.diff?r1=1.73&r2=1.74)
xlog.h (r1.59 -> r1.60)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h.diff?r1=1.59&r2=1.60)
pgsql/src/include:
c.h (r1.181 -> r1.182)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/c.h.diff?r1=1.181&r2=1.182)
pgsql/src/include/catalog:
pg_control.h (r1.20 -> r1.21)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_control.h.diff?r1=1.20&r2=1.21)
pgsql/src/include/nodes:
execnodes.h (r1.129 -> r1.130)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/execnodes.h.diff?r1=1.129&r2=1.130)
parsenodes.h (r1.277 -> r1.278)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/parsenodes.h.diff?r1=1.277&r2=1.278)
pgsql/src/include/parser:
analyze.h (r1.29 -> r1.30)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/parser/analyze.h.diff?r1=1.29&r2=1.30)
parse_node.h (r1.42 -> r1.43)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/parser/parse_node.h.diff?r1=1.42&r2=1.43)
pgsql/src/include/storage:
bufpage.h (r1.64 -> r1.65)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/bufpage.h.diff?r1=1.64&r2=1.65)
lmgr.h (r1.45 -> r1.46)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/lmgr.h.diff?r1=1.45&r2=1.46)
lwlock.h (r1.17 -> r1.18)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/lwlock.h.diff?r1=1.17&r2=1.18)

Added Files:
-----------
pgsql/src/backend/access/transam:
multixact.c (r1.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/multixact.c?rev=1.1&content-type=text/x-cvsweb-markup)
pgsql/src/include/access:
multixact.h (r1.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/access/multixact.h?rev=1.1&content-type=text/x-cvsweb-markup)

#2Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Implement sharable row-level locks, and use them for foreign key

On Thu, Apr 28, 2005 at 06:47:18PM -0300, Tom Lane wrote:

Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.

One point I didn't quite understand was the business about XLogging
heap_lock_tuple. I had to reread your mail to -hackers on this issue
several times to get it (as you can see I don't fully grok the WAL
rules). Now, I believe that heap_mark4update was wrong on this, no?
Only it didn't matter because after a crash nobody cared about the
stored Xmax.

One nice side effect of this is that the 2PC patch now has this problem
solved. The bad part is that locking a tuple emits an (non-XLogFlushed)
WAL record and it may have a performance impact. (We should have better
performance overall I think, because transactions are no longer locked
on foreign key checking.)

Anyway: many thanks for updating the patch to an usable state. I'm
sorry to have inflicted all those bugs upon you.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La soledad es compa��a"

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: [COMMITTERS] pgsql: Implement sharable row-level locks, and use them for foreign key

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

One point I didn't quite understand was the business about XLogging
heap_lock_tuple. I had to reread your mail to -hackers on this issue
several times to get it (as you can see I don't fully grok the WAL
rules). Now, I believe that heap_mark4update was wrong on this, no?
Only it didn't matter because after a crash nobody cared about the
stored Xmax.

Well, actually the reason I decided to put in xlogging there was that
I realized it was already broken before. In the existing code it was
possible to have this scenario:
* transaction N selects-for-update some tuple, so N goes into
the tuple's XMAX.
* transaction N ends without doing anything else. Since it's
not produced any XLOG entries, xact.c thinks it doesn't need
to emit either a COMMIT or ABORT xlog record.
* therefore, there is no record whatsoever of XID N in XLOG.
* bgwriter pushes the dirty data page to disk.
* database crashes.
* on restart, WAL replay sets the XID counter to N or less,
because there is no evidence in the XLOG for N.
* now there will be a "new" transaction N that is mistakenly
considered to own an update lock on the tuple.

While the negative impact of this situation is probably not high,
it's clearly The Wrong Thing.

The MultiXactId patch introduces a second way to have the same
problem, ie a MultiXactId on disk for which there is no evidence
in XLOG, so the MXID might get re-used after restart.

In view of the fact that we want to do 2PC sometime soon, and that
absolutely requires xlogging every lock, I thought that continuing to
try to avoid emitting an xlog record for heap_lock_tuple was just silly.

regards, tom lane