pgsql-server: Rearrange pg_subtrans handling as per recent discussion.
Log Message:
-----------
Rearrange pg_subtrans handling as per recent discussion. pg_subtrans
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again. We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is
refactored a little bit to separate out the necessary decisions.
Modified Files:
--------------
pgsql-server/src/backend/access/transam:
clog.c (r1.22 -> r1.23)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/clog.c.diff?r1=1.22&r2=1.23)
rmgr.c (r1.14 -> r1.15)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/rmgr.c.diff?r1=1.14&r2=1.15)
slru.c (r1.18 -> r1.19)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/slru.c.diff?r1=1.18&r2=1.19)
subtrans.c (r1.2 -> r1.3)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/subtrans.c.diff?r1=1.2&r2=1.3)
xlog.c (r1.162 -> r1.163)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/xlog.c.diff?r1=1.162&r2=1.163)
pgsql-server/src/backend/commands:
vacuum.c (r1.286 -> r1.287)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/vacuum.c.diff?r1=1.286&r2=1.287)
pgsql-server/src/backend/storage/ipc:
sinval.c (r1.69 -> r1.70)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/storage/ipc/sinval.c.diff?r1=1.69&r2=1.70)
pgsql-server/src/backend/storage/lmgr:
lwlock.c (r1.21 -> r1.22)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/storage/lmgr/lwlock.c.diff?r1=1.21&r2=1.22)
pgsql-server/src/include/access:
clog.h (r1.9 -> r1.10)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/access/clog.h.diff?r1=1.9&r2=1.10)
rmgr.h (r1.11 -> r1.12)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/access/rmgr.h.diff?r1=1.11&r2=1.12)
slru.h (r1.7 -> r1.8)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/access/slru.h.diff?r1=1.7&r2=1.8)
subtrans.h (r1.2 -> r1.3)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/access/subtrans.h.diff?r1=1.2&r2=1.3)
pgsql-server/src/include/storage:
lwlock.h (r1.13 -> r1.14)
(http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/storage/lwlock.h.diff?r1=1.13&r2=1.14)
Just random speculation, but could we use a pg_subtrans-like setup to do
row share locks?
ie. Store them in a sort of table to avoid the problems with limited
shared memory size?
Chris
Tom Lane wrote:
Log Message:
-----------
Rearrange pg_subtrans handling as per recent discussion. pg_subtrans
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again. We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is
refactored a little bit to separate out the necessary decisions.
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
Just random speculation, but could we use a pg_subtrans-like setup to do
row share locks?
pg_subtrans is trivial to index --- it's a linear array subscripted by
TransactionId. I'm not sure what we'd do to handle row locks, which
would need a key like (DBoid, RELoid, BlockNum, LineNum) and would be
extremely non-dense in that space.
'Tis something to think about though...
regards, tom lane
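Tom's point that pg_subtrans is trivially indexable — a linear array subscripted by TransactionId — can be sketched in a few lines of C. The constants and function names below are hypothetical illustrations of the addressing arithmetic, not the actual slru.c/subtrans.c definitions:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical: an 8 kB SLRU page holding 2048 4-byte parent-XID entries. */
#define SUBTRANS_PAGE_SIZE 8192
#define ENTRIES_PER_PAGE   (SUBTRANS_PAGE_SIZE / sizeof(TransactionId))

/* Which pg_subtrans page holds the parent pointer for this xid. */
static int
subtrans_page_for(TransactionId xid)
{
    return (int) (xid / ENTRIES_PER_PAGE);
}

/* Offset of this xid's entry within that page. */
static int
subtrans_entry_for(TransactionId xid)
{
    return (int) (xid % ENTRIES_PER_PAGE);
}
```

Because the key is a dense integer, the page and offset fall out of plain division; the row-lock key space discussed below has no such property.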
On Tue, Aug 24, 2004 at 09:17:59AM +0800, Christopher Kings-Lynne wrote:
Just random speculation, but could we use a pg_subtrans-like setup to do
row share locks?
ie. Store them in a sort of table to avoid the problems with limited
shared memory size?
Hmm ... how would you map the row number to an array index? I think it
could work if you answer the above question. The problem is doing it
for any possible table/row combination, while making it so that at any given
moment only a small part of the array is in use. (If you don't do that,
there will be a lot of thrashing.)
What's needed is a four-dimensional integer array, keyed by:
pg_database oid
pg_class oid
BlockNumber
OffsetNumber
The value would keep how many share-lockers there are.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"How we put our fingers in each other's clay. That is friendship: playing
potter and seeing what shapes can be drawn out of the other" (C. Halloway in
Something Wicked This Way Comes, R. Bradbury)
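Alvaro's four-part key plus share-locker count could look roughly like the toy table below. All names are hypothetical; a real version would hash the key into shared memory and, as discussed later in this thread, spill to disk rather than use a tiny fixed array:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical key: (database, relation, block, line) identifies one row. */
typedef struct RowLockTag
{
    Oid      dboid;
    Oid      reloid;
    uint32_t blocknum;
    uint16_t linenum;
} RowLockTag;

/* Toy fixed-size table; the value is the number of share-lockers. */
#define MAX_ROW_LOCKS 64
static RowLockTag lock_tags[MAX_ROW_LOCKS];
static int        lock_counts[MAX_ROW_LOCKS];

/* Compare field by field (memcmp would be unsafe due to struct padding). */
static int
tag_eq(const RowLockTag *a, const RowLockTag *b)
{
    return a->dboid == b->dboid && a->reloid == b->reloid &&
           a->blocknum == b->blocknum && a->linenum == b->linenum;
}

static int
find_slot(const RowLockTag *tag)
{
    for (int i = 0; i < MAX_ROW_LOCKS; i++)
        if (lock_counts[i] > 0 && tag_eq(&lock_tags[i], tag))
            return i;
    return -1;
}

/* Take one share lock on a row; returns the new count, or -1 if full. */
static int
share_lock_row(RowLockTag tag)
{
    int slot = find_slot(&tag);

    if (slot < 0)
    {
        for (slot = 0; slot < MAX_ROW_LOCKS; slot++)
            if (lock_counts[slot] == 0)
                break;
        if (slot == MAX_ROW_LOCKS)
            return -1;          /* table full: this is where spill would go */
        lock_tags[slot] = tag;
    }
    return ++lock_counts[slot];
}

/* How many share-lockers does this row currently have? */
static int
share_lock_count(RowLockTag tag)
{
    int slot = find_slot(&tag);
    return slot < 0 ? 0 : lock_counts[slot];
}
```

The sketch also makes Alvaro's density concern concrete: the key space is enormous, so only a sparse, searchable structure (not a pg_subtrans-style dense array) can hold it.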
pg_subtrans is trivial to index --- it's a linear array subscripted by
TransactionId. I'm not sure what we'd do to handle row locks, which
would need a key like (DBoid, RELoid, BlockNum, LineNum) and would be
extremely non-dense in that space.
'Tis something to think about though...
I've been thinking about it, and I am wondering: why do we need to record
every transaction that has a row share lock?
ie. why can't we just record the number of locks each backend has, sort
of, and use a reference counting sort of method. Per-backend in case
the backend dies and we need to discount those locks..?
Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
ie. why can't we just record the number of locks each backend has, sort
of, and use a reference counting sort of method. Per-backend in case
the backend dies and we need to discount those locks..?
Uh ... the interesting question is usually not "does this backend hold
any row locks", it is "is this row locked by any backends". If the
latter question is not *exceedingly* cheap to answer, at least in the
normal case where the answer is no, you don't have a workable solution,
because you'll be adding nontrivial overhead to every row update.
regards, tom lane
Uh ... the interesting question is usually not "does this backend hold
any row locks", it is "is this row locked by any backends". If the
latter question is not *exceedingly* cheap to answer, at least in the
normal case where the answer is no, you don't have a workable solution,
because you'll be adding nontrivial overhead to every row update.
OK, what I mean is to know if a row is locked by any backend, why can't
we just put a reference count of the number of locks on that row,
instead of recording each backend separately? Wouldn't that require a
fixed amount of shared mem?
Chris
On Tue, 24 Aug 2004, Christopher Kings-Lynne wrote:
Uh ... the interesting question is usually not "does this backend hold
any row locks", it is "is this row locked by any backends". If the
latter question is not *exceedingly* cheap to answer, at least in the
normal case where the answer is no, you don't have a workable solution,
because you'll be adding nontrivial overhead to every row update.
OK, what I mean is to know if a row is locked by any backend, why can't
we just put a reference count of the number of locks on that row,
instead of recording each backend separately? Wouldn't that require a
fixed amount of shared mem?
AFAICT you have to do something on top of that to allow deadlock
detection. If transaction X has a shared row lock on A and is waiting on
a lock for me and I want to get an exclusive row lock on A, how do I
detect that it's a deadlock?
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
On Tue, 24 Aug 2004, Christopher Kings-Lynne wrote:
OK, what I mean is to know if a row is locked by any backend, why can't
we just put a reference count of the number of locks on that row,
instead of recording each backend separately? Wouldn't that require a
fixed amount of shared mem?
AFAICT you have to do something on top of that to allow deadlock
detection. If transaction X has a shared row lock on A and is waiting on
a lock for me and I want to get an exclusive row lock on A, how do I
detect that it's a deadlock?
I think the speed complaint I was just raising could possibly be
answered by setting an infomask bit indicating that the row might
be present in a separate table of active row locks. (I'm not sure
how the bit would get cleared without race conditions, but let's
suppose that can be done.) A little hashing, a little spill-to-disk
logic, and it might be done. But that's just handwaving... anyone
want to try to fill in the details?
[ But to answer Chris' question: no, I don't see any way that this
could be a fixed-size table. You will need that spill-to-disk bit. ]
regards, tom lane
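Tom's infomask-bit idea in miniature: a cheap bit test answers the common "not locked" case without ever touching the separate lock table. The bit value, struct, and function names below are invented for illustration, not actual heapam definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical infomask bit: "row MAY appear in the shared-lock table". */
#define HEAP_MAYBE_ROW_LOCKED 0x0800

typedef struct ToyTupleHeader
{
    uint16_t infomask;
} ToyTupleHeader;

/* Stand-in for the expensive lookup in a separate table of row locks. */
static int
slow_lock_table_lookup(const ToyTupleHeader *tup)
{
    (void) tup;
    return 1;                   /* pretend the table says "locked" */
}

/*
 * Fast-path check: in the normal case the bit is clear and we answer
 * "not locked" without consulting the shared lock table at all.
 */
static int
row_may_be_locked(const ToyTupleHeader *tup)
{
    if ((tup->infomask & HEAP_MAYBE_ROW_LOCKED) == 0)
        return 0;               /* cheap common case */
    return slow_lock_table_lookup(tup);    /* rare, possibly spilled case */
}
```

Note the bit only means "maybe locked" — it can stay set after the lock is gone, which is exactly the race-free-clearing problem Tom flags above.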
I think the speed complaint I was just raising could possibly be
answered by setting an infomask bit indicating that the row might
be present in a separate table of active row locks. (I'm not sure
how the bit would get cleared without race conditions, but let's
suppose that can be done.) A little hashing, a little spill-to-disk
logic, and it might be done. But that's just handwaving... anyone
want to try to fill in the details?
I vote Alvaro :) This stuff is way out of my league - I'm just the
ideas man :D
Either way - Bruce, did you want to add a summary of these ideas to the
TODO?
Chris
Christopher Kings-Lynne wrote:
Uh ... the interesting question is usually not "does this backend hold
any row locks", it is "is this row locked by any backends". If the
latter question is not *exceedingly* cheap to answer, at least in the
normal case where the answer is no, you don't have a workable solution,
because you'll be adding nontrivial overhead to every row update.
OK, what I mean is to know if a row is locked by any backend, why can't
we just put a reference count of the number of locks on that row,
instead of recording each backend separately? Wouldn't that require a
fixed amount of shared mem?
Don't forget having to deal with a backend dying without being able to
decrement the count (not my idea, Bruce (iirc) mentioned it last time
this was discussed). I think at the least you'd need a
max-trans-id-with-lock number stored next to the count so that in the
event of backend crashes the lock will eventually be released.
--
Richard Huxton
Archonet Ltd
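Richard's max-trans-id-with-lock suggestion can be sketched as follows (hypothetical names and a simplistic xid comparison that ignores wraparound): if every xid up to and including the recorded maximum has completed, a nonzero count can only have been left behind by a crashed backend, so the lock can be cleared.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical entry: the count plus the newest xid that ever took the lock. */
typedef struct RowLockEntry
{
    int           count;
    TransactionId max_locker_xid;
} RowLockEntry;

/*
 * If the oldest still-running xid is past max_locker_xid, every locker has
 * finished; a remaining count belongs to a crashed backend, so release it.
 * Returns 1 if the stale lock was cleared, 0 otherwise.
 */
static int
maybe_release_stale(RowLockEntry *e, TransactionId oldest_running_xid)
{
    if (e->count > 0 && e->max_locker_xid < oldest_running_xid)
    {
        e->count = 0;
        return 1;
    }
    return 0;
}
```

Tom's reply below points out a simpler recovery story: since a backend crash forces a system-wide restart, the whole table can just be flushed.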
Richard Huxton wrote:
Christopher Kings-Lynne wrote:
Uh ... the interesting question is usually not "does this backend hold
any row locks", it is "is this row locked by any backends". If the
latter question is not *exceedingly* cheap to answer, at least in the
normal case where the answer is no, you don't have a workable solution,
because you'll be adding nontrivial overhead to every row update.
OK, what I mean is to know if a row is locked by any backend, why can't
we just put a reference count of the number of locks on that row,
instead of recording each backend separately? Wouldn't that require a
fixed amount of shared mem?
Don't forget having to deal with a backend dying without being able to
decrement the count (not my idea, Bruce (iirc) mentioned it last time
this was discussed). I think at the least you'd need a
max-trans-id-with-lock number stored next to the count so that in the
event of backend crashes the lock will eventually be released.
Even more significantly, how does the backend know where to go to
decrement its row counts on commit? I think the best we could do would
be to store the xids on the row for each backend that has a shared lock.
The list could be cleared out automatically by looking in the PROC
structure and removing completed xids. However, that is a lot of
storage for each row to have and a centralized table is probably best
where backends can clean up on their own on commit.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Christopher Kings-Lynne wrote:
I think the speed complaint I was just raising could possibly be
answered by setting an infomask bit indicating that the row might
be present in a separate table of active row locks. (I'm not sure
how the bit would get cleared without race conditions, but let's
suppose that can be done.) A little hashing, a little spill-to-disk
logic, and it might be done. But that's just handwaving... anyone
want to try to fill in the details?
I vote Alvaro :) This stuff is way out of my league - I'm just the
ideas man :D
Either way - Bruce, did you want to add a summary of these ideas to the
TODO?
OK, TODO updated:
* Implement dirty reads or shared row locks and use them in RI triggers
Adding shared locks requires recording the table/rows numbers in a
shared area, and this could potentially be a large amount of data.
One idea is to store the table/row numbers in a separate table and set
a bit on the row indicating that this new table must be consulted to
find any shared row locks.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Richard Huxton <dev@archonet.com> writes:
... Don't forget having to deal with a backend dying without being able to
decrement the count (not my idea, Bruce (iirc) mentioned it last time
this was discussed). I think at the least you'd need a
max-trans-id-with-lock number stored next to the count so that in the
event of backend crashes the lock will eventually be released.
No, because the whole table would simply be flushed upon backend crash.
regards, tom lane
OK, TODO updated:
* Implement dirty reads or shared row locks and use them in RI triggers
Can someone explain to me what a dirty read is and how it relates to RI
triggers?
Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
OK, TODO updated:
* Implement dirty reads or shared row locks and use them in RI triggers
Can someone explain to me what a dirty read is and how it relates to RI
triggers?
A dirty read is a read that includes data that hasn't been committed yet. Or
as the SQL 92 standard puts it:
1) P1 ("Dirty read"): SQL-transaction T1 modifies a row. SQL-
transaction T2 then reads that row before T1 performs a COMMIT.
If T1 then performs a ROLLBACK, T2 will have read a row that was
never committed and that may thus be considered to have never
existed.
It's only allowed when the transaction is in READ UNCOMMITTED isolation level.
Something Postgres doesn't currently support. In fact I'm not aware of any SQL
database that supports it, though I'm sure there's one somewhere.
You wouldn't normally want to use such a thing, but it could be useful for,
for example, seeing what progress a transaction has made for a UI progress
meter.
It could also be useful for referential integrity checks since, for example,
it would let you see if someone has deleted the referenced record but not
committed the delete yet.
But that alone wouldn't let you avoid locking the record. TODO items are
mostly just pointers to old threads on the mailing lists. They don't contain
the complete story. You could maybe find more information searching the
pgsql-hackers archive on the web site.
--
greg
Greg Stark wrote:
It's only allowed when the transaction is in READ UNCOMMITTED
isolation level.
Something Postgres doesn't currently support. In fact I'm not aware of any SQL
database that supports it, though I'm sure there's one somewhere.
FYI - DB2 supports this isolation level, I don't know of any others (tho
Informix is a possibility).
regards
Mark
On Wed, 25 Aug 2004 03:54 pm, Mark Kirkwood wrote:
Greg Stark wrote:
It's only allowed when the transaction is in READ UNCOMMITTED
isolation level.
Something Postgres doesn't currently support. In fact I'm not aware of any
SQL database that supports it, though I'm sure there's one somewhere.
FYI - DB2 supports this isolation level, I don't know of any others (tho
Informix is a possibility).
Sybase ASE (and by derivation MS SQL Server) also have dirty reads.
Regards, Philip.
On Wed, 25 Aug 2004, Mark Kirkwood wrote:
Greg Stark wrote:
It's only allowed when the transaction is in READ UNCOMMITTED
isolation level.
Something Postgres doesn't currently support. In fact I'm not aware of any SQL
database that supports it, though I'm sure there's one somewhere.
FYI - DB2 supports this isolation level, I don't know of any others (tho
Informix is a possibility).
I wasn't aware of DB2 supporting it, but SQL Server and Informix do.
Oracle only supports COMMITTED and SERIALIZABLE according to my 10g
manual.
Gavin
On 8/25/2004 1:32 AM, Greg Stark wrote:
A dirty read is a read that includes data that hasn't been committed yet. Or
as the SQL 92 standard puts it:
[...]
It could also be useful for referential integrity checks since, for example,
it would let you see if someone has deleted the referenced record but not
committed the delete yet.
But that alone wouldn't let you avoid locking the record. TODO items are
mostly just pointers to old threads on the mailing lists. They don't contain
the complete story. You could maybe find more information searching the
pgsql-hackers archive on the web site.
Plus ... wouldn't doing the ON DELETE lookup as a dirty read make
referencing rows that have been deleted, but could still come back
through a rollback, disappear? What you want to see are the new tuples of
uncommitted inserts/updates as well as the old tuples of uncommitted
deletes/updates. I don't think there is any term in the standard for that
read mode, so we should call it "dusty reads", because they see everything
vacuum is interested in.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #