SIREAD lock versus ACCESS EXCLUSIVE lock

Started by Kevin Grittner · over 14 years ago · 47 messages
#1 Kevin Grittner
Kevin.Grittner@wicourts.gov

Composing my rather long-winded response to Simon got me thinking --
which just led me to realize there is probably a need to fix another
thing related to SSI.

For correct serializable behavior in the face of concurrent DDL
execution, I think that a request for a heavyweight ACCESS EXCLUSIVE
lock might need to block until all SIREAD locks on the relation have
been released. Picture, for example, what might happen if one
transaction acquires some predicate locks, then commits (releasing
its heavyweight lock on the table), and before concurrent READ WRITE
transactions complete there is a CLUSTER on the table. Or a DROP
INDEX. :-(

Both require an ACCESS EXCLUSIVE lock. Since an active transaction
would already have an ACCESS SHARE lock when acquiring the SIREAD
locks, this couldn't block in the other direction or with an active
transaction. That means that it couldn't cause any deadlocks if we
added blocking to the acquisition of an ACCESS EXCLUSIVE based on
this.

If we don't do this I don't think that there is a more serious
impact than inaccurate conflict detection for serializable
transactions which are active when these operations are performed.
Well, that and the possibility of seeing SIRead locks in the
pg_locks view for indexes or tables which no longer exist. So far I
don't see any crash modes or effects on non-serializable
transactions. If this change is too destabilizing for this point in
the release we could document it as a limitation and fix it in 9.2.
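
For anyone who wants to look for such leftovers, a minimal query
sketch against the standard pg_locks view (SIREAD locks show up there
with the mode name SIReadLock; for a dropped table or index the
relation column would just show the stale OID):

  SELECT locktype, relation::regclass AS relation, page, tuple, mode
    FROM pg_locks
   WHERE mode = 'SIReadLock';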

We're pretty aggressive about cleaning up SIREAD locks as soon as
allowable after transaction completion, so this probably wouldn't
happen very often.

-Kevin

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#1)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 27.04.2011 22:59, Kevin Grittner wrote:

For correct serializable behavior in the face of concurrent DDL
execution, I think that a request for a heavyweight ACCESS EXCLUSIVE
lock might need to block until all SIREAD locks on the relation have
been released. Picture, for example, what might happen if one
transaction acquires some predicate locks, then commits (releasing
its heavyweight lock on the table), and before concurrent READ WRITE
transactions complete there is a CLUSTER on the table. Or a DROP
INDEX. :-(

Hmm, could we upgrade all predicate locks to relation-level predicate
locks instead?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#2)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 27.04.2011 22:59, Kevin Grittner wrote:

For correct serializable behavior in the face of concurrent DDL
execution, I think that a request for a heavyweight ACCESS
EXCLUSIVE lock might need to block until all SIREAD locks on the
relation have been released. Picture, for example, what might
happen if one transaction acquires some predicate locks, then
commits (releasing its heavyweight lock on the table), and before
concurrent READ WRITE transactions complete there is a CLUSTER on
the table. Or a DROP INDEX. :-(

Hmm, could we upgrade all predicate locks to relation-level
predicate locks instead?

Tied to what backend? This only comes into play with transactions
which have committed and are not yet able to release their predicate
locks because of overlapping READ WRITE serializable transactions
which are still active. Until that point the ACCESS SHARE lock (or
stronger) is still there protecting against this.

One way we could push this entirely into the heavyweight locking
system would be for a committing transaction with predicate locks to
somehow cause all overlapping read write serializable transactions
which are still active to acquire an ACCESS SHARE lock on each
relation for which the committing transaction has any predicate
lock(s). (Unless of course they already hold some lock on a
relevant relation.) This would need to be done before the
committing transaction released its lock on the relation (which it
must have acquired to read something which would cause it to have a
predicate lock). Is that feasible? It seems like a lot of work,
and I'm not sure how one transaction would convince another
process's transaction to acquire the locks.

As an alternative, perhaps we could find a way to leave the relation
locks for a serializable transaction until it's safe to clean up the
predicate locks? They could be downgraded to ACCESS SHARE if they
are stronger. They would need to survive beyond not only the commit
of the transaction, but also the termination of the connection.
They would need to be immune to being chosen as deadlock victims.

Any other ideas?

-Kevin

#4 Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#3)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On Wed, Apr 27, 2011 at 02:59:19PM -0500, Kevin Grittner wrote:

For correct serializable behavior in the face of concurrent DDL
execution, I think that a request for a heavyweight ACCESS EXCLUSIVE
lock might need to block until all SIREAD locks on the relation have
been released. Picture, for example, what might happen if one

I think you're correct about this, but also agree that it would be
reasonable to document the limitation for now and punt on a fix until
9.2.

On Wed, Apr 27, 2011 at 04:09:38PM -0500, Kevin Grittner wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Hmm, could we upgrade all predicate locks to relation-level
predicate locks instead?

Tied to what backend? This only comes into play with transactions
which have committed and are not yet able to release their predicate
locks because of overlapping READ WRITE serializable transactions
which are still active. Until that point the ACCESS SHARE lock (or
stronger) is still there protecting against this.

I think Heikki was suggesting to upgrade to relation-level SIREAD
locks. This seems like it would work, but might cause a lot of aborts
immediately afterwards. I do wonder if it might be necessary to upgrade
index locks to relation-level locks on the heap relation, in cases of
dropping the index.

One way we could push this entirely into the heavyweight locking
system would be for a committing transaction with predicate locks to
somehow cause all overlapping read write serializable transactions
which are still active to acquire an ACCESS SHARE lock on each
relation for which the committing transaction has any predicate
lock(s).

I'm not entirely clear on how this would work, but it sounds like it
could also have a significant performance cost.

As an alternative, perhaps we could find a way to leave the relation
locks for a serializable transaction until it's safe to clean up the
predicate locks? They could be downgraded to ACCESS SHARE if they
are stronger. They would need to survive beyond not only the commit
of the transaction, but also the termination of the connection.
They would need to be immune to being chosen as deadlock victims.

This sounds like it would require major changes to the heavyweight lock
manager. There's already a notion of keeping locks past backend
termination for 2PC prepared transactions, but it would be hard to
apply here.

The most practical solutions I see here are to, when acquiring an
AccessExclusive lock, either wait for any associated SIREAD locks to go
away, or to promote them to relation level.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

#5 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#4)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Dan Ports <drkp@csail.mit.edu> wrote:

On Wed, Apr 27, 2011 at 04:09:38PM -0500, Kevin Grittner wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Hmm, could we upgrade all predicate locks to relation-level
predicate locks instead?

Tied to what backend?

I think Heikki was suggesting to upgrade to relation-level SIREAD
locks.

Oh, yeah. That *is* what he said, isn't it. That's a simpler
solution which works just fine. Please forget my over-excited
flights of fancy on the topic -- I was so focused on what I thought
was the solution I misread Heikki's much better idea as some
variation on my theme. :-(

-Kevin

#6 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#1)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On Wed, Apr 27, 2011 at 8:59 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

For correct serializable behavior in the face of concurrent DDL
execution, I think that a request for a heavyweight ACCESS EXCLUSIVE
lock might need to block until all SIREAD locks on the relation have
been released.  Picture, for example, what might happen if one
transaction acquires some predicate locks, then commits (releasing
its heavyweight lock on the table), and before concurrent READ WRITE
transactions complete there is a CLUSTER on the table. Or a DROP
INDEX.  :-(

Sorry, I can't picture it. What will happen?

Both require an ACCESS EXCLUSIVE lock.  Since an active transaction
would already have an ACCESS SHARE lock when acquiring the SIREAD
locks, this couldn't block in the other direction or with an active
transaction.  That means that it couldn't cause any deadlocks if we
added blocking to the acquisition of an ACCESS EXCLUSIVE based on
this.

If we don't do this I don't think that there is a more serious
impact than inaccurate conflict detection for serializable
transactions which are active when these operations are performed.
Well, that and the possibility of seeing SIRead locks in the
pg_locks view for indexes or tables which no longer exist.  So far I
don't see any crash modes or effects on non-serializable
transactions.  If this change is too destabilizing for this point in
the release we could document it as a limitation and fix it in 9.2.

I don't think this should wait for 9.2

It either works, or it doesn't. Putting caveats in there will just
detract from people's belief in it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#7 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#6)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Simon Riggs <simon@2ndQuadrant.com> wrote:

On Wed, Apr 27, 2011 at 8:59 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

For correct serializable behavior in the face of concurrent DDL
execution, I think that a request for a heavyweight ACCESS
EXCLUSIVE lock might need to block until all SIREAD locks on the
relation have been released. Picture, for example, what might
happen if one transaction acquires some predicate locks, then
commits (releasing its heavyweight lock on the table), and before
concurrent READ WRITE transactions complete there is a CLUSTER on
the table. Or a DROP INDEX. :-(

Sorry, I can't picture it. What will happen?

Rather than get into a complex generalized discussion, I'll provide
the simplest example I can picture.

Let's say we have two concurrent transactions, T0 and T1. Up to
this point T0 has read from table x and written to table y based on
what was read from x. T1 has read from y -- but since the
transactions are concurrent, it doesn't see T0's write. Let's
assume each read was of a single tuple accessed through a btree
index, so each transaction has one tuple lock on the heap and one
page lock on the index. Now T0 commits. T0 must hold its SIREAD
locks because of concurrent transaction T1. Everything is fine so
far. Now a DBA runs CLUSTER against table x. The SIREAD locks held
by T0 are probably now wrong, because the tuple and its index entry
are likely to have moved. Now T1 writes to table x based on what it
read from y. It could incorrectly detect a conflict if it happens
to write to a tuple at the locked block and tuple number when it's
not the same row. Worse, it could miss detecting a conflict if it's
really updating the same row that T0 read, and that's not detected
because it's not at the locked location any more.
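
To put that in concrete terms, here is one possible schedule; the
tables x(id, val) and y(id, val), the index name x_pkey, and the
session names are all hypothetical:

  -- Session T0
  BEGIN ISOLATION LEVEL SERIALIZABLE;                  -- (1)
  SELECT val FROM x WHERE id = 1;                      -- (2) SIREAD locks: heap tuple + index page
  UPDATE y SET val = 'derived from x' WHERE id = 1;    -- (5)
  COMMIT;                                              -- (6) SIREAD locks kept; T1 still overlaps

  -- Session T1
  BEGIN ISOLATION LEVEL SERIALIZABLE;                  -- (3)
  SELECT val FROM y WHERE id = 1;                      -- (4) doesn't see T0's update
  UPDATE x SET val = 'derived from y' WHERE id = 1;    -- (8)
  COMMIT;                                              -- (9)

  -- Session DBA
  CLUSTER x USING x_pkey;                              -- (7) tuples and index entries move

After step (7) the SIREAD locks T0 still holds on x point at the old
block and tuple numbers, so the rw-conflict from T0's read of x to
T1's write at step (8) can be missed (letting the T0 -> T1 -> T0
cycle escape detection), or a spurious conflict can be reported if an
unrelated row lands at the locked location.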

If this change is too destabilizing for this point in the release
we could document it as a limitation and fix it in 9.2.

I don't think this should wait for 9.2

It either works, or it doesn't. Putting caveats in there will just
detract from people's belief in it.

I see your point. And this clearly is a bug. We failed to consider
this category of problem and cover it.

Heikki's suggestion is clearly the best plan. In the example above,
when the CLUSTER was run it would make a call to the predicate
locking module telling it to promote all SIREAD locks for table x or
any of its indexes into a relation level lock on table x. The
CLUSTER would cause us to lose the finer granularity of the locks on
the table, and in this example if T1 wrote to table x it be rolled
back with a serialization failure. This could be a false positive,
but we expect to have some of those -- the transaction is retried
and then succeeds. You can't have a false negative, so integrity is
preserved.

I'll try to work up a detailed plan of which commands need what
actions. For example, DROP INDEX needs to promote SIREAD locks on
the dropped index to relation locks on the related table. TRUNCATE
TABLE is a little confusing -- I think that if it's run in a
serializable transaction we generate a rw-conflict out from that
transaction to every transaction holding any SIREAD lock on that
table or any of its indexes, and then clear those SIREAD locks.
This'll take some study.

-Kevin

#8 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#7)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

This'll take some study.

I've gone through the list of commands in the development docs with
an eye toward exposing anything else we might have missed in dealing
with the SSI predicate locking. Some of this needs further
research, but I'm posting what I have so far so that people have a
chance to argue about what I think needs doing or to point out
anything else they can think of which I'm still missing. And, of
course, so people know where we are with it.

I tested file_fdw at the serializable isolation level and found that
no predicate locks were taken and no problems showed up. I think
that's all correct -- I don't see how we can expect to include FDW
data in serializability.

I haven't dug into ALTER INDEX enough to know whether it can ever
cause an index to be rebuilt. If so, we need to treat it like DROP
INDEX and REINDEX -- which should change all predicate locks of any
granularity on the index into relation locks on the associated
table.

CLUSTER or an ALTER TABLE which causes a rewrite should change all
predicate locks on the table and all indexes into relation locks on
the associated table. (Obviously, an existing relation lock on the
table doesn't require any action.)

TRUNCATE TABLE and DROP TABLE should generate a rw-conflict *in* to
the enclosing transaction (if it is serializable) from all
transactions holding predicate locks on the table or its indexes.
Note that this could cause a transaction which is running one of
these statements to roll back with a serialization error. This seems
correct to me, since these operations essentially delete all rows.
If you don't want the potential rollback, these operations should be
run at another isolation level. The difference between these two
statements is that I think that TRUNCATE TABLE should also move the
existing predicate locks to relation locks on the table while DROP
TABLE (for obvious reasons) should just delete the predicate locks.
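
As a sketch of how that would look from SQL under this proposal (the
table r is hypothetical):

  -- Session A
  BEGIN ISOLATION LEVEL SERIALIZABLE;      -- (1)
  SELECT count(*) FROM r;                  -- (2) takes SIREAD lock(s) on r
  COMMIT;                                  -- (4) SIREAD locks retained; B overlaps A

  -- Session B
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT 1;                                -- (3) takes B's snapshot, so B overlaps A
  TRUNCATE r;                              -- (5) records a rw-conflict in to B from A
  COMMIT;                                  -- (6)

As with an ordinary write, the conflict by itself doesn't force a
rollback; B would only fail with a serialization error if this
completes a dangerous structure with other conflicts.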

DROP DATABASE should quietly clean up any predicate locks from
committed transactions which haven't yet hit their cleanup point
because of overlapping transactions in other databases.

DROP OWNED or DROP SCHEMA could CASCADE to some of the above, in
which case the above rules would apply.

I need to do some testing with Large Objects to see what state those
are in with SSI, unless someone else has already looked at that.

SAVEPOINTs have been considered and there is a lot of coding to try
to handle them correctly, but they probably merit a bit more
attention and testing. On some quick testing everything seemed to
be in line with previous discussions and with what seems logical to
me, but our SSI regression tests in src/test/isolation lack any
coverage for them and I don't know how much others have beat up on
them. At a minimum we should add a couple tests.
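
For example, one scenario such a test might cover (table t and the
values are hypothetical) is whether SIREAD locks taken inside a
subtransaction remain in effect after that subtransaction is rolled
back (they should, since releasing them could let anomalies slip
through):

  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SAVEPOINT s;
  SELECT val FROM t WHERE id = 1;   -- takes SIREAD locks inside the subtransaction
  ROLLBACK TO SAVEPOINT s;          -- the SIREAD locks should survive this
  SELECT val FROM t WHERE id = 2;
  COMMIT;

The thing to verify is that a concurrent serializable transaction
writing the row read inside the rolled-back subtransaction is still
treated as a rw-conflict, just as if the savepoint had not been
rolled back.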

I couldn't find anything else which hadn't already been considered
and covered.

More to come as I finish investigating.

-Kevin

#9 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#8)
1 attachment(s)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

I haven't dug into ALTER INDEX enough to know whether it can ever
cause an index to be rebuilt. If so, we need to treat it like
DROP INDEX and REINDEX -- which should change all predicate locks
of any granularity on the index into relation locks on the
associated table.

CLUSTER or an ALTER TABLE which causes a rewrite should change all
predicate locks on the table and all indexes into relation locks
on the associated table. (Obviously, an existing relation lock on
the table doesn't require any action.)

TRUNCATE TABLE and DROP TABLE should generate a rw-conflict *in*
to the enclosing transaction (if it is serializable) from all
transactions holding predicate locks on the table or its indexes.
Note that this could cause a transaction which is running one of
these statements to roll back with a serialization error. This
seems correct to me, since these operations essentially delete all
rows. If you don't want the potential rollback, these operations
should be run at another isolation level. The difference between
these two statements is that I think that TRUNCATE TABLE should
also move the existing predicate locks to relation locks on the
table while DROP TABLE (for obvious reasons) should just delete
the predicate locks.

DROP DATABASE should quietly clean up any predicate locks from
committed transactions which haven't yet hit their cleanup point
because of overlapping transactions in other databases.

I missed VACUUM FULL when pulling together the above, but I haven't
found any other omissions. (Still happy to hear about any that
anyone can spot.)

I notice that most of these need to transfer predicate locks to relation
locks on the heap, either for a single index or for the heap and all
indexes. I wrote a function to do this and called it from one place
to be able to test it. Consider this a WIP patch on which I would
appreciate review while I work on finding the other places to call
it and the miscellaneous other fixes needed.

Note that I had to expose one previously-static function from
index.c to find the heap OID from the index OID. Also, I ran
pgindent against predicate.c, as I generally like to do when I
modify much code, and it found four comment blocks in predicate.c
touched since the recent global pgindent run which it re-wrapped.
I can manually exclude those from the final patch if people would
prefer that; but if people can ignore those whitespace tweaks, it
might not be all bad to get standard formatting onto them at this
point.

-Kevin

Attachments:

ssi-ddl-1.patch (text/plain)
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 54,59 ****
--- 54,60 ----
  #include "parser/parser.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
***************
*** 114,120 **** static void validate_index_heapscan(Relation heapRelation,
  						IndexInfo *indexInfo,
  						Snapshot snapshot,
  						v_i_state *state);
- static Oid	IndexGetRelation(Oid indexId);
  static void SetReindexProcessing(Oid heapOid, Oid indexOid);
  static void ResetReindexProcessing(void);
  static void SetReindexPending(List *indexes);
--- 115,120 ----
***************
*** 2721,2727 **** validate_index_heapscan(Relation heapRelation,
   * IndexGetRelation: given an index's relation OID, get the OID of the
   * relation it is an index on.	Uses the system cache.
   */
! static Oid
  IndexGetRelation(Oid indexId)
  {
  	HeapTuple	tuple;
--- 2721,2727 ----
   * IndexGetRelation: given an index's relation OID, get the OID of the
   * relation it is an index on.	Uses the system cache.
   */
! Oid
  IndexGetRelation(Oid indexId)
  {
  	HeapTuple	tuple;
***************
*** 2782,2787 **** reindex_index(Oid indexId, bool skip_constraint_checks)
--- 2782,2793 ----
  	 */
  	CheckTableNotInUse(iRel, "REINDEX INDEX");
  
+ 	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(iRel);
+ 
  	PG_TRY();
  	{
  		/* Suppress use of the target index while rebuilding it */
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 185,190 ****
--- 185,191 ----
  #include "access/twophase.h"
  #include "access/twophase_rmgr.h"
  #include "access/xact.h"
+ #include "catalog/index.h"
  #include "miscadmin.h"
  #include "storage/bufmgr.h"
  #include "storage/predicate.h"
***************
*** 975,982 **** InitPredicateLocks(void)
  	bool		found;
  
  	/*
! 	 * Compute size of predicate lock target hashtable.
! 	 * Note these calculations must agree with PredicateLockShmemSize!
  	 */
  	max_table_size = NPREDICATELOCKTARGETENTS();
  
--- 976,983 ----
  	bool		found;
  
  	/*
! 	 * Compute size of predicate lock target hashtable. Note these
! 	 * calculations must agree with PredicateLockShmemSize!
  	 */
  	max_table_size = NPREDICATELOCKTARGETENTS();
  
***************
*** 1028,1035 **** InitPredicateLocks(void)
  									  hash_flags);
  
  	/*
! 	 * Compute size for serializable transaction hashtable.
! 	 * Note these calculations must agree with PredicateLockShmemSize!
  	 */
  	max_table_size = (MaxBackends + max_prepared_xacts);
  
--- 1029,1036 ----
  									  hash_flags);
  
  	/*
! 	 * Compute size for serializable transaction hashtable. Note these
! 	 * calculations must agree with PredicateLockShmemSize!
  	 */
  	max_table_size = (MaxBackends + max_prepared_xacts);
  
***************
*** 2276,2288 **** PredicateLockTupleRowVersionLink(const Relation relation,
  				newxmin;
  
  	/*
! 	 * Bail out quickly if there are no serializable transactions
! 	 * running.
  	 *
! 	 * It's safe to do this check without taking any additional
! 	 * locks. Even if a serializable transaction starts concurrently,
! 	 * we know it can't take any SIREAD locks on the modified tuple
! 	 * because the caller is holding the associated buffer page lock.
  	 */
  	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
  		return;
--- 2277,2288 ----
  				newxmin;
  
  	/*
! 	 * Bail out quickly if there are no serializable transactions running.
  	 *
! 	 * It's safe to do this check without taking any additional locks. Even if
! 	 * a serializable transaction starts concurrently, we know it can't take
! 	 * any SIREAD locks on the modified tuple because the caller is holding
! 	 * the associated buffer page lock.
  	 */
  	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
  		return;
***************
*** 2622,2627 **** exit:
--- 2622,2839 ----
  	return !outOfShmem;
  }
  
+ /*
+  * For all connections, transfer all predicate locks for the given relation
+  * to a single relation lock on the heap.  For a heap relation that includes
+  * all locks on indexes; for an index it is just the locks on that one
+  * index.
+  *
+  * This requires grabbing a lot of LW locks and scanning the entire lock
+  * target table for matches.  That makes this more expensive than most
+  * functions, but it will only be called for DDL type commands and there are
+  * fast returns when no serializable transactions are active or the relation
+  * is temporary.
+  *
+  * We are not using the existing TransferPredicateLocksToNewTarget because
+  * it acquires its own locks on the partitions of the two targets involved,
+  * and we'll already be holding all partition locks.
+  *
+  * We can't throw an error from here, because the call could be from a
+  * transaction which is not serializable.
+  */
+ void
+ TransferPredicateLocksToHeapRelation(const Relation relation)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *oldtarget;
+ 	PREDICATELOCKTARGET *heaptarget;
+ 	PREDICATELOCKTARGETTAG heaptargettag;
+ 	PREDICATELOCKTAG newpredlocktag;
+ 	Oid			dbId;
+ 	Oid			indexId;
+ 	Oid			heapId;
+ 	int			i;
+ 	bool		isSingleIndex;
+ 	bool		found;
+ 	uint32		reservedtargettaghash;
+ 	uint32		heaptargettaghash;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running. As
+ 	 * with PredicateLockTupleRowVersionLink, it's safe to check this without
+ 	 * taking locks because the caller is holding the buffer page lock.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipSplitTracking(relation))
+ 		return;
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	if (relation->rd_index == NULL)
+ 	{
+ 		isSingleIndex = false;
+ 		indexId = InvalidOid;	/* quiet compiler warning */
+ 		heapId = relation->rd_id;
+ 	}
+ 	else
+ 	{
+ 		isSingleIndex = true;
+ 		indexId = relation->rd_id;
+ 		heapId = relation->rd_index->indrelid;
+ 	}
+ 	Assert(heapId != InvalidOid);
+ 	SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag,
+ 										dbId,
+ 										heapId);
+ 	heaptargettaghash = PredicateLockTargetTagHashCode(&heaptargettag);
+ 	heaptarget = NULL;			/* Retrieve first time needed, then keep. */
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_EXCLUSIVE);
+ 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ 
+ 	/*
+ 	 * Remove the reserved entry to give us scratch space, so we know we'll be
+ 	 * able to create the new lock target.
+ 	 */
+ 	reservedtargettaghash = PredicateLockTargetTagHashCode(&ReservedTargetTag);
+ 	hash_search_with_hash_value(PredicateLockTargetHash,
+ 								&ReservedTargetTag,
+ 								reservedtargettaghash,
+ 								HASH_REMOVE, &found);
+ 	Assert(found);
+ 
+ 	/* Scan through PredicateLockTargetHash and transfer locks */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *oldpredlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
+ 			continue;			/* wrong database */
+ 		if (isSingleIndex)
+ 		{
+ 			if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) != indexId)
+ 				continue;		/* not the index we're looking for */
+ 		}
+ 		else
+ 		{
+ 			if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) == heapId)
+ 			{
+ 				if (GET_PREDICATELOCKTARGETTAG_TYPE(oldtarget->tag) == PREDLOCKTAG_RELATION)
+ 					continue;	/* already the right lock */
+ 			}
+ 			else
+ 			{
+ 				/* This is an index.  Is it for the right heap? */
+ 				if (IndexGetRelation(GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag))
+ 					!= heapId)
+ 					continue;	/* index on wrong heap relation */
+ 			}
+ 		}
+ 
+ 		/*
+ 		 * If we made it here, we have work to do.	We make sure the heap
+ 		 * relation lock exists, then we walk the list of predicate locks for
+ 		 * the old target we found, moving all locks to the heap relation lock
+ 		 * -- unless they already hold that.
+ 		 */
+ 
+ 		/*
+ 		 * First make sure we have the heap relation target.  We only need to
+ 		 * do this once.
+ 		 */
+ 		if (heaptarget == NULL)
+ 		{
+ 			heaptarget = hash_search_with_hash_value(PredicateLockTargetHash,
+ 													 &heaptargettag,
+ 													 heaptargettaghash,
+ 													 HASH_ENTER, &found);
+ 			Assert(heaptarget != NULL);
+ 			if (!found)
+ 				SHMQueueInit(&heaptarget->predicateLocks);
+ 			newpredlocktag.myTarget = heaptarget;
+ 		}
+ 
+ 		/*
+ 		 * Loop through moving locks from this target to the relation target.
+ 		 */
+ 		oldpredlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(oldtarget->predicateLocks),
+ 						 &(oldtarget->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (oldpredlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 			PREDICATELOCK *newpredlock;
+ 			SerCommitSeqNo oldCommitSeqNo = oldpredlock->commitSeqNo;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(oldtarget->predicateLocks),
+ 							 &(oldpredlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 			newpredlocktag.myXact = oldpredlock->tag.myXact;
+ 
+ 			SHMQueueDelete(&(oldpredlock->xactLink));
+ 			/* No need for retail delete from oldtarget list. */
+ 			hash_search(PredicateLockHash,
+ 						&oldpredlock->tag,
+ 						HASH_REMOVE, &found);
+ 			Assert(found);
+ 
+ 			newpredlock = (PREDICATELOCK *)
+ 				hash_search_with_hash_value
+ 				(PredicateLockHash,
+ 				 &newpredlocktag,
+ 				 PredicateLockHashCodeFromTargetHashCode(&newpredlocktag,
+ 														 heaptargettaghash),
+ 				 HASH_ENTER_NULL, &found);
+ 			Assert(newpredlock != NULL);
+ 			if (!found)
+ 			{
+ 				SHMQueueInsertBefore(&(heaptarget->predicateLocks),
+ 									 &(newpredlock->targetLink));
+ 				SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
+ 									 &(newpredlock->xactLink));
+ 				newpredlock->commitSeqNo = oldCommitSeqNo;
+ 			}
+ 			else
+ 			{
+ 				if (newpredlock->commitSeqNo < oldCommitSeqNo)
+ 					newpredlock->commitSeqNo = oldCommitSeqNo;
+ 			}
+ 
+ 			Assert(newpredlock->commitSeqNo != 0);
+ 			Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
+ 				   || (newpredlock->tag.myXact == OldCommittedSxact));
+ 
+ 			oldpredlock = nextpredlock;
+ 		}
+ 
+ 		hash_search(PredicateLockTargetHash, &oldtarget->tag, HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	/* Put the reserved entry back */
+ 	hash_search_with_hash_value(PredicateLockTargetHash,
+ 								&ReservedTargetTag,
+ 								reservedtargettaghash,
+ 								HASH_ENTER, &found);
+ 	Assert(!found);
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
  
  /*
   *		PredicateLockPageSplit
***************
*** 2646,2655 **** PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno,
  	bool		success;
  
  	/*
! 	 * Bail out quickly if there are no serializable transactions
! 	 * running. As with PredicateLockTupleRowVersionLink, it's safe to
! 	 * check this without taking locks because the caller is holding
! 	 * the buffer page lock.
  	 */
  	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
  		return;
--- 2858,2866 ----
  	bool		success;
  
  	/*
! 	 * Bail out quickly if there are no serializable transactions running. As
! 	 * with PredicateLockTupleRowVersionLink, it's safe to check this without
! 	 * taking locks because the caller is holding the buffer page lock.
  	 */
  	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
  		return;
*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
***************
*** 66,71 **** extern void index_drop(Oid indexId);
--- 66,73 ----
  
  extern IndexInfo *BuildIndexInfo(Relation index);
  
+ extern Oid IndexGetRelation(Oid indexId);
+ 
  extern void FormIndexDatum(IndexInfo *indexInfo,
  			   TupleTableSlot *slot,
  			   EState *estate,
*** a/src/include/storage/predicate.h
--- b/src/include/storage/predicate.h
***************
*** 50,55 **** extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
--- 50,56 ----
  extern void PredicateLockTupleRowVersionLink(const Relation relation, const HeapTuple oldTuple, const HeapTuple newTuple);
  extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
  extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
+ extern void TransferPredicateLocksToHeapRelation(const Relation relation);
  extern void ReleasePredicateLocks(const bool isCommit);
  
  /* conflict detection (may also trigger rollback) */
#10 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#1)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Just a quick status update.

I wrote:

Consider this a WIP patch

The serializable branch on my git repo has a modified form of this
which has been tested successfully with:

DROP INDEX
REINDEX
VACUUM FULL
CLUSTER
ALTER TABLE

I'm holding off on posting another version of the patch until I
cover the remaining commands, which are:

TRUNCATE TABLE
DROP TABLE
DROP DATABASE

I'm having to work on this off-hours at the moment, so expect the
patch to come sometime this weekend.

-Kevin

#11 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#8)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 30.04.2011 01:04, Kevin Grittner wrote:

TRUNCATE TABLE and DROP TABLE should generate a rw-conflict *in* to
the enclosing transaction (if it is serializable) from all
transactions holding predicate locks on the table or its indexes.
Note that this could cause a transaction which is running one of
these statements to roll back with a serialization error. This seems
correct to me, since these operations essentially delete all rows.
If you don't want the potential rollback, these operations should be
run at another isolation level. The difference between these two
statements is that I think that TRUNCATE TABLE should also move the
existing predicate locks to relation locks on the table while DROP
TABLE (for obvious reasons) should just delete the predicate locks.

Note that TRUNCATE has never been MVCC-safe anyway. Perhaps it's best to
just treat it like DROP TABLE. Or can we use the predicate lock
mechanism to abort serializable transactions that incorrectly see the
table as empty?

DROP DATABASE should quietly clean up any predicate locks from
committed transactions which haven't yet hit their cleanup point
because of overlapping transactions in other databases.

This is just an optimization, right? The predicate locks will eventually
go away anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#11)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas wrote:
On 30.04.2011 01:04, Kevin Grittner wrote:

TRUNCATE TABLE and DROP TABLE should generate a rw-conflict *in*
to the enclosing transaction (if it is serializable) from all
transactions holding predicate locks on the table or its indexes.
Note that this could cause a transaction which is running one of
these statements to roll back with a serialization error. This
seems correct to me, since these operations essentially delete all
rows. If you don't want the potential rollback, these operations
should be run at another isolation level. The difference between
these two statements is that I think that TRUNCATE TABLE should
also move the existing predicate locks to relation locks on the
table while DROP TABLE (for obvious reasons) should just delete
the predicate locks.

Note that TRUNCATE has never been MVCC-safe anyway.

Yeah, Dan pointed out that another REPEATABLE READ or SERIALIZABLE
transaction *will* see the work of a committed TRUNCATE TABLE
statement, so it is not semantically identical to DELETE FROM with no
WHERE clause, which was something I somehow had gotten into my head.
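
A minimal illustration of that, assuming a pre-populated table t:

  -- Session A
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT 1;                     -- establishes A's snapshot before the TRUNCATE

  -- Session B (autocommit)
  TRUNCATE t;

  -- Session A again
  SELECT count(*) FROM t;       -- returns 0: A sees the table empty even though
                                -- the TRUNCATE committed after A's snapshot;
                                -- a committed DELETE would still show the old rows
  COMMIT;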

Perhaps it's best to just treat it like DROP TABLE.

We had been leaning that way based on the above observation.

Or can we use the predicate lock mechanism to abort serializable
transactions that incorrectly see the table as empty?

Predicate locks only interact with writes; it'd be a rather nasty
wart to try to bend them to the above use. I think we just have to
stick with the dictum which has controlled so far -- Serializable
Snapshot Isolation can only serialize those things which follow the
semantics of Snapshot Isolation. TRUNCATE TABLE doesn't.

DROP DATABASE should quietly clean up any predicate locks from
committed transactions which haven't yet hit their cleanup point
because of overlapping transactions in other databases.

This is just an optimization, right? The predicate locks will
eventually go away anyway.

Yes, correct. The only way they could create a problem besides just
taking up predicate lock slots is if a new database was created with
an identical OID before overlapping serializable transactions in
other databases completed; that seems rather far-fetched. Perhaps
DROP DATABASE is so infrequent that it's not worth the extra code to
do the early cleanup? The code to do that part might not carry its
own weight on a cost/benefit basis.

-Kevin

#13 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#1)
1 attachment(s)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 1 May 2011 I wrote:

Consider this a WIP patch

Just so people know where this stands...

By 8 May 2011 I had the attached. I didn't post it because I was
not confident I had placed the calls to the SSI functions for DROP
TABLE and TRUNCATE TABLE correctly. Then came PGCon and also the
row versus tuple discussion -- the patch for which had conflicts
with this one.

I will be reviewing this on the weekend and making any final
adjustments, but the patch is very likely to look just like this
with the possible exception of placement of the DROP TABLE and
TRUNCATE TABLE calls. If anyone wants to start reviewing this now,
it would leave more time for me to make changes if needed.

Also, if anyone has comments or hints about the placement of those
calls, I'd be happy to receive them.

This patch compiles with `make world`, passes `make check-world`,
and passes both `make installcheck-world` and `make -C
src/test/isolation installcheck` against a database with
default_transaction_isolation = 'serializable'. It also passed a
few ad hoc tests of the implemented functionality.

-Kevin

Attachments:

ssi-ddl-2.patch (text/plain)
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 1658,1663 **** heap_drop_with_catalog(Oid relid)
--- 1658,1671 ----
  	CheckTableNotInUse(rel, "DROP TABLE");
  
  	/*
+ 	 * This effectively deletes all rows in the table, and may be done in a
+ 	 * serializable transaction.  In that case we must record a rw-conflict in
+ 	 * to this transaction from each transaction holding a predicate lock on
+ 	 * the table.
+ 	 */
+ 	CheckTableForSerializableConflictIn(rel);
+ 
+ 	/*
  	 * Delete pg_foreign_table tuple first.
  	 */
  	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
***************
*** 1688,1693 **** heap_drop_with_catalog(Oid relid)
--- 1696,1706 ----
  	}
  
  	/*
+ 	 * Clean up any remaining predicate locks on the relation.
+ 	 */
+ 	DropAllPredicateLocksFromTable(rel);
+ 
+ 	/*
  	 * Close relcache entry, but *keep* AccessExclusiveLock on the relation
  	 * until transaction commit.  This ensures no one else will try to do
  	 * something with the doomed relation.
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 54,59 ****
--- 54,60 ----
  #include "parser/parser.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
***************
*** 1311,1316 **** index_drop(Oid indexId)
--- 1312,1323 ----
  	CheckTableNotInUse(userIndexRelation, "DROP INDEX");
  
  	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
+ 	/*
  	 * Schedule physical removal of the files
  	 */
  	RelationDropStorage(userIndexRelation);
***************
*** 2787,2792 **** reindex_index(Oid indexId, bool skip_constraint_checks)
--- 2794,2805 ----
  	 */
  	CheckTableNotInUse(iRel, "REINDEX INDEX");
  
+ 	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(iRel);
+ 
  	PG_TRY();
  	{
  		/* Suppress use of the target index while rebuilding it */
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 39,44 ****
--- 39,45 ----
  #include "optimizer/planner.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
***************
*** 385,390 **** cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
--- 386,397 ----
  	if (OidIsValid(indexOid))
  		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
+ 	/*
+ 	 * All predicate locks on the table and its indexes are about to be made
+ 	 * invalid.  Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(OldHeap);
+ 
  	/* rebuild_relation does all the dirty work */
  	rebuild_relation(OldHeap, indexOid, freeze_min_age, freeze_table_age,
  					 verbose);
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 155,160 ****
--- 155,162 ----
   *							   BlockNumber newblkno);
   *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
   *								 BlockNumber newblkno);
+  *		TransferPredicateLocksToHeapRelation(const Relation relation)
+  *		DropAllPredicateLocksFromTable(const Relation relation)
   *		ReleasePredicateLocks(bool isCommit)
   *
   * conflict detection (may also trigger rollback)
***************
*** 162,167 ****
--- 164,170 ----
   *										HeapTupleData *tup, Buffer buffer)
   *		CheckForSerializableConflictIn(Relation relation, HeapTupleData *tup,
   *									   Buffer buffer)
+  *		CheckTableForSerializableConflictIn(const Relation relation)
   *
   * final rollback checking
   *		PreCommit_CheckForSerializationFailure(void)
***************
*** 189,194 ****
--- 192,198 ----
  #include "storage/procarray.h"
  #include "utils/rel.h"
  #include "utils/snapmgr.h"
+ #include "utils/syscache.h"
  #include "utils/tqual.h"
  
  /* Uncomment the next line to test the graceful degradation code. */
***************
*** 434,439 **** static bool TransferPredicateLocksToNewTarget(const PREDICATELOCKTARGETTAG oldta
--- 438,445 ----
  								  const PREDICATELOCKTARGETTAG newtargettag,
  								  bool removeOld);
  static void PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag);
+ static Oid	IfIndexGetRelation(Oid indexId);
+ static void DropAllPredicateLocksFromTableImpl(const Relation relation, bool transfer);
  static void SetNewSxactGlobalXmin(void);
  static bool ReleasePredicateLocksIfROSafe(void);
  static void ClearOldPredicateLocks(void);
***************
*** 2543,2548 **** exit:
--- 2549,2863 ----
  	return !outOfShmem;
  }
  
+ /*
+  * IfIndexGetRelation: given a relation OID, get the OID of the heap
+  * relation it is an index on, or return InvalidOid if the argument is not
+  * an index.	Uses the system cache.
+  */
+ static Oid
+ IfIndexGetRelation(Oid indexId)
+ {
+ 	HeapTuple	tuple;
+ 	Form_pg_index index;
+ 	Oid			result;
+ 
+ 	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexId));
+ 	if (!HeapTupleIsValid(tuple))
+ 		return InvalidOid;
+ 
+ 	index = (Form_pg_index) GETSTRUCT(tuple);
+ 	Assert(index->indexrelid == indexId);
+ 
+ 	result = index->indrelid;
+ 	ReleaseSysCache(tuple);
+ 	return result;
+ }
+ 
+ /*
+  * Drop all predicate locks of any granularity from a heap relation and all of
+  * its indexes, with optional transfer to a relation-level lock on the heap.
+  *
+  * This requires grabbing a lot of LW locks and scanning the entire lock
+  * target table for matches.  That makes this more expensive than most
+  * predicate lock management functions, but it will only be called for DDL
+  * type commands and there are fast returns when no serializable transactions
+  * are active or the relation is temporary.
+  *
+  * We are not using the TransferPredicateLocksToNewTarget function because
+  * it acquires its own locks on the partitions of the two targets involved,
+  * and we'll already be holding all partition locks.
+  *
+  * We can't throw an error from here, because the call could be from a
+  * transaction which is not serializable.
+  */
+ static void
+ DropAllPredicateLocksFromTableImpl(const Relation relation, bool transfer)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *oldtarget;
+ 	PREDICATELOCKTARGET *heaptarget;
+ 	PREDICATELOCKTARGETTAG heaptargettag;
+ 	PREDICATELOCKTAG newpredlocktag;
+ 	Oid			dbId;
+ 	Oid			indexId;
+ 	Oid			heapId;
+ 	int			i;
+ 	bool		isSingleIndex;
+ 	bool		found;
+ 	uint32		reservedtargettaghash;
+ 	uint32		heaptargettaghash;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running.
+ 	 * It's safe to check this without taking locks because the caller is
+ 	 * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks
+ 	 * which would matter here can be acquired while that is held.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipSplitTracking(relation))
+ 		return;
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	if (relation->rd_index == NULL)
+ 	{
+ 		isSingleIndex = false;
+ 		indexId = InvalidOid;	/* quiet compiler warning */
+ 		heapId = relation->rd_id;
+ 	}
+ 	else
+ 	{
+ 		isSingleIndex = true;
+ 		indexId = relation->rd_id;
+ 		heapId = relation->rd_index->indrelid;
+ 	}
+ 	Assert(heapId != InvalidOid);
+ 	Assert(transfer || !isSingleIndex);	/* index OID only makes sense with transfer */
+ 
+ 	SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId);
+ 	heaptargettaghash = PredicateLockTargetTagHashCode(&heaptargettag);
+ 	heaptarget = NULL;			/* Retrieve first time needed, then keep. */
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_EXCLUSIVE);
+ 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ 
+ 	/*
+ 	 * If there are any locks to be moved, that means we will wind up
+ 	 * with a relation lock on the heap.  That covers all other locks on the
+ 	 * heap and all locks on all indexes for the table, so we can be very
+ 	 * aggressive about deleting any locks except for the heap relation lock.
+ 	 * But we don't want to add a heap relation lock unless there is at least
+ 	 * one lock that needs to be transferred.  If this function is called
+ 	 * with an index OID, we'll first scan to see if there are any predicate
+ 	 * locks on that index.  As soon as we find one we drop down into the
+ 	 * update loop.  If we find none, we release the LW locks and return
+ 	 * without changing anything.
+ 	 *
+ 	 * This optimization comes into play two ways -- a REINDEX might be done
+ 	 * on an index which has no predicate locks, or an operation might be done
+ 	 * which rewrites the entire table and calls REINDEX on each index. In the
+ 	 * latter case the action against the base table will move all the index
+ 	 * locks before any of the index rebuilds are requested.
+ 	 */
+ 	if (isSingleIndex)
+ 	{
+ 		bool		foundIndexLock = false;
+ 
+ 		hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 		while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 		{
+ 			if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) != indexId)
+ 				continue;		/* wrong OID for the index */
+ 			if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
+ 				continue;		/* wrong database */
+ 			foundIndexLock = true;
+ 			hash_seq_term(&seqstat);
+ 			break;
+ 		}
+ 		if (!foundIndexLock)
+ 		{
+ 			/* Release locks in reverse order */
+ 			LWLockRelease(SerializableXactHashLock);
+ 			for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 				LWLockRelease(FirstPredicateLockMgrLock + i);
+ 			LWLockRelease(SerializablePredicateLockListLock);
+ 			return;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Remove the reserved entry to give us scratch space, so we know we'll be
+ 	 * able to create the new lock target.
+ 	 */
+ 	reservedtargettaghash = 0;	/* quiet compiler warning */
+ 	if (transfer)
+ 	{
+ 		reservedtargettaghash = PredicateLockTargetTagHashCode(&ReservedTargetTag);
+ 		hash_search_with_hash_value(PredicateLockTargetHash,
+ 									&ReservedTargetTag,
+ 									reservedtargettaghash,
+ 									HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	/* Scan through PredicateLockTargetHash and transfer or drop locks */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *oldpredlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
+ 			continue;			/* wrong database */
+ 		if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) == heapId)
+ 		{
+ 			if (GET_PREDICATELOCKTARGETTAG_TYPE(oldtarget->tag) == PREDLOCKTAG_RELATION)
+ 				continue;		/* already the right lock */
+ 		}
+ 		else
+ 		{
+ 			if (IfIndexGetRelation(GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag)) != heapId)
+ 				continue;		/* not index or index on wrong heap relation */
+ 		}
+ 
+ 		/*
+ 		 * If we made it here, we have work to do.	We make sure the heap
+ 		 * relation lock exists, then we walk the list of predicate locks for
+ 		 * the old target we found, moving all locks to the heap relation lock
+ 		 * -- unless they already hold that.
+ 		 */
+ 
+ 		/*
+ 		 * First make sure we have the heap relation target.  We only need to
+ 		 * do this once.
+ 		 */
+ 		if (transfer && heaptarget == NULL)
+ 		{
+ 			heaptarget = hash_search_with_hash_value(PredicateLockTargetHash,
+ 													 &heaptargettag,
+ 													 heaptargettaghash,
+ 													 HASH_ENTER, &found);
+ 			Assert(heaptarget != NULL);
+ 			if (!found)
+ 				SHMQueueInit(&heaptarget->predicateLocks);
+ 			newpredlocktag.myTarget = heaptarget;
+ 		}
+ 
+ 		/*
+ 		 * Loop through moving locks from this target to the relation target.
+ 		 */
+ 		oldpredlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(oldtarget->predicateLocks),
+ 						 &(oldtarget->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (oldpredlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 			PREDICATELOCK *newpredlock;
+ 			SerCommitSeqNo oldCommitSeqNo = oldpredlock->commitSeqNo;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(oldtarget->predicateLocks),
+ 							 &(oldpredlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 			newpredlocktag.myXact = oldpredlock->tag.myXact;
+ 
+ 			/*
+ 			 * It's OK to remove the old lock first because of the ACCESS
+ 			 * EXCLUSIVE lock on the heap relation when this is called.  It is
+ 			 * desirable to do so because it avoids any chance of running out
+ 			 * of lock structure entries for the table.
+ 			 */
+ 			SHMQueueDelete(&(oldpredlock->xactLink));
+ 			/* No need for retail delete from oldtarget list. */
+ 			hash_search(PredicateLockHash,
+ 						&oldpredlock->tag,
+ 						HASH_REMOVE, &found);
+ 			Assert(found);
+ 
+ 			if (transfer)
+ 			{
+ 				newpredlock = (PREDICATELOCK *)
+ 					hash_search_with_hash_value
+ 					(PredicateLockHash,
+ 					&newpredlocktag,
+ 					PredicateLockHashCodeFromTargetHashCode(&newpredlocktag,
+ 															heaptargettaghash),
+ 					HASH_ENTER_NULL, &found);
+ 				Assert(newpredlock != NULL);
+ 				if (!found)
+ 				{
+ 					SHMQueueInsertBefore(&(heaptarget->predicateLocks),
+ 										&(newpredlock->targetLink));
+ 					SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
+ 										&(newpredlock->xactLink));
+ 					newpredlock->commitSeqNo = oldCommitSeqNo;
+ 				}
+ 				else
+ 				{
+ 					if (newpredlock->commitSeqNo < oldCommitSeqNo)
+ 						newpredlock->commitSeqNo = oldCommitSeqNo;
+ 				}
+ 
+ 				Assert(newpredlock->commitSeqNo != 0);
+ 				Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
+ 					  || (newpredlock->tag.myXact == OldCommittedSxact));
+ 			}
+ 
+ 			oldpredlock = nextpredlock;
+ 		}
+ 
+ 		hash_search(PredicateLockTargetHash, &oldtarget->tag, HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	if (transfer)
+ 	{
+ 		/* Put the reserved entry back */
+ 		hash_search_with_hash_value(PredicateLockTargetHash,
+ 									&ReservedTargetTag,
+ 									reservedtargettaghash,
+ 									HASH_ENTER, &found);
+ 		Assert(!found);
+ 	}
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
+ /*
+  * TransferPredicateLocksToHeapRelation
+  *		For all transactions, transfer all predicate locks for the given
+  *		relation to a single relation lock on the heap.  For a heap relation
+  *		that includes all locks on indexes; for an index the same lock moves
+  *		are needed, but only if one or more locks exist on that index.
+  */
+ void
+ TransferPredicateLocksToHeapRelation(const Relation relation)
+ {
+ 	DropAllPredicateLocksFromTableImpl(relation, true);
+ }
+ 
+ /*
+  * DropAllPredicateLocksFromTable
+  *		For all transactions, drop all predicate locks for the given table.
+  *		That includes all locks on its indexes.
+  */
+ void
+ DropAllPredicateLocksFromTable(const Relation relation)
+ {
+ 	DropAllPredicateLocksFromTableImpl(relation, false);
+ }
+ 
  
  /*
   *		PredicateLockPageSplit
***************
*** 3792,3797 **** CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple,
--- 4107,4208 ----
  }
  
  /*
+  * CheckTableForSerializableConflictIn
+  *		The entire table is going through a DDL-style logical mass write (like
+  *		TRUNCATE TABLE or DROP TABLE).  While these operations do not operate
+  *		entirely within the bounds of snapshot isolation, they can occur
+  *		inside of a serializable transaction, and will logically occur after
+  *		any reads which saw rows which were destroyed by these operations, so
+  *		we do what we can to serialize properly under SSI.
+  *
+  * The relation passed in must be a heap relation for a table. Any predicate
+  * lock of any granularity on the table or any of its indexes will cause a
+  * rw-conflict in to this transaction.
+  *
+  * This should be done before altering the predicate locks because the
+  * transaction could be rolled back because of a conflict, in which case the
+  * lock changes are not needed.
+  */
+ void
+ CheckTableForSerializableConflictIn(const Relation relation)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *target;
+ 	Oid			dbId;
+ 	Oid			heapId;
+ 	int			i;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running.
+ 	 * It's safe to check this without taking locks because the caller is
+ 	 * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks
+ 	 * which would matter here can be acquired while that is held.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipSerialization(relation))
+ 		return;
+ 
+ 	Assert(relation->rd_index == NULL);
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	heapId = relation->rd_id;
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_SHARED);
+ 	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ 
+ 	/* Scan through PredicateLockTargetHash and flag conflicts */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((target = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *predlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(target->tag) != dbId)
+ 			continue;			/* wrong database */
+ 		if (GET_PREDICATELOCKTARGETTAG_RELATION(target->tag) != heapId
+ 			&& IfIndexGetRelation(GET_PREDICATELOCKTARGETTAG_RELATION(target->tag)) != heapId)
+ 			continue;		/* not index or index on wrong heap relation */
+ 
+ 		/*
+ 		 * Loop through locks for this target and flag conflicts.
+ 		 */
+ 		predlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(target->predicateLocks),
+ 						 &(target->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (predlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(target->predicateLocks),
+ 							 &(predlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 
+ 			if (predlock->tag.myXact != MySerializableXact
+ 				&& !RWConflictExists(predlock->tag.myXact, (SERIALIZABLEXACT *) MySerializableXact))
+ 				FlagRWConflict(predlock->tag.myXact, (SERIALIZABLEXACT *) MySerializableXact);
+ 
+ 			predlock = nextpredlock;
+ 		}
+ 	}
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
+ 
+ /*
   * Flag a rw-dependency between two serializable transactions.
   *
   * The caller is responsible for ensuring that we have a LW lock on
*** a/src/include/storage/predicate.h
--- b/src/include/storage/predicate.h
***************
*** 49,59 **** extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
--- 49,62 ----
  extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
  extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
  extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
+ extern void TransferPredicateLocksToHeapRelation(const Relation relation);
+ extern void DropAllPredicateLocksFromTable(const Relation relation);
  extern void ReleasePredicateLocks(const bool isCommit);
  
  /* conflict detection (may also trigger rollback) */
  extern void CheckForSerializableConflictOut(const bool valid, const Relation relation, const HeapTuple tuple, const Buffer buffer);
  extern void CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple, const Buffer buffer);
+ extern void CheckTableForSerializableConflictIn(const Relation relation);
  
  /* final rollback checking */
  extern void PreCommit_CheckForSerializationFailure(void);
#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#13)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 03.06.2011 21:04, Kevin Grittner wrote:

Also, if anyone has comments or hints about the placement of those
calls, I'd be happy to receive them.

heap_drop_with_catalog() schedules the relation for deletion at the end
of transaction, but it's still possible that the transaction aborts and
the heap doesn't get dropped after all. If you put the
DropAllPredicateLocksFromTable() call there, and the transaction later
aborts, you've lost all the locks already.

I think you'll need to just memorize the lock deletion command in a
backend-local list, and perform the deletion in a post-commit function.
Something similar to the PendingRelDelete stuff in storage.c. In fact,
hooking into smgrDoPendingDeletes would work, but that seems like a
modularity violation.
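
A minimal sketch of that backend-local-list / post-commit pattern,
loosely modeled on the PendingRelDelete machinery in storage.c.  The
names used here (PendingPredLockDrop, SchedulePredLockDrop,
AtEOXact_PredLockDrop, DropAllPredicateLocksByOid) are hypothetical,
and subtransaction handling is omitted:

typedef struct PendingPredLockDrop
{
	Oid			dbId;			/* database of the dropped relation */
	Oid			relId;			/* OID of the dropped relation */
	struct PendingPredLockDrop *next;
} PendingPredLockDrop;

static PendingPredLockDrop *pendingPredLockDrops = NULL;

/* Called where the DROP is processed; just remembers the request. */
static void
SchedulePredLockDrop(Oid dbId, Oid relId)
{
	PendingPredLockDrop *pending;

	pending = (PendingPredLockDrop *)
		MemoryContextAlloc(TopMemoryContext, sizeof(PendingPredLockDrop));
	pending->dbId = dbId;
	pending->relId = relId;
	pending->next = pendingPredLockDrops;
	pendingPredLockDrops = pending;
}

/* Called after commit or abort; an abort simply discards the list. */
static void
AtEOXact_PredLockDrop(bool isCommit)
{
	PendingPredLockDrop *pending;
	PendingPredLockDrop *next;

	for (pending = pendingPredLockDrops; pending != NULL; pending = next)
	{
		next = pending->next;
		if (isCommit)
		{
			/* hypothetical cleanup call; must not throw errors here */
			DropAllPredicateLocksByOid(pending->dbId, pending->relId);
		}
		pfree(pending);
	}
	pendingPredLockDrops = NULL;
}

Keeping the entries in TopMemoryContext means they survive until the
end-of-transaction callback regardless of where the DROP was run.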

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#14)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 03.06.2011 21:04, Kevin Grittner wrote:

Also, if anyone has comments or hints about the placement of
those calls, I'd be happy to receive them.

heap_drop_with_catalog() schedules the relation for deletion at
the end of transaction, but it's still possible that the
transaction aborts and the heap doesn't get dropped after all. If
you put the DropAllPredicateLocksFromTable() call there, and the
transaction later aborts, you've lost all the locks already.

I think you'll need to just memorize the lock deletion command in
a backend-local list, and perform the deletion in a post-commit
function. Something similar to the PendingRelDelete stuff in
storage.c. In fact, hooking into smgrDoPendingDeletes would work,
but that seems like a modularity violation.

Thanks. That's helpful. Will look at that code and do something
similar.

I knew it didn't look right in the place it was, but couldn't quite
see what to do instead....

-Kevin

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#15)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

I think you'll need to just memorize the lock deletion command in
a backend-local list, and perform the deletion in a post-commit
function. Something similar to the PendingRelDelete stuff in
storage.c. In fact, hooking into smgrDoPendingDeletes would work,
but that seems like a modularity violation.

Thanks. That's helpful. Will look at that code and do something
similar.

Keep in mind that it's too late to throw any sort of error post-commit.
Any code you add there will need to have negligible probability of
failure.

regards, tom lane

#17Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#16)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

I think you'll need to just memorize the lock deletion command
in a backend-local list, and perform the deletion in a
post-commit function. Something similar to the PendingRelDelete
stuff in storage.c. In fact, hooking into smgrDoPendingDeletes
would work, but that seems like a modularity violation.

Thanks. That's helpful. Will look at that code and do something
similar.

Keep in mind that it's too late to throw any sort of error
post-commit. Any code you add there will need to have negligible
probability of failure.

Thanks for the heads-up. I think we're OK in that regard, though.
We have two commit-time functions in SSI:

PreCommit_CheckForSerializationFailure(void) which is the "last
chance for failure" before actual commit, and

ReleasePredicateLocks(const bool isCommit) which is for resource
cleanup after rollback or commit is effective.

We pretty much can't fail on the latter except for Assert
statements, and this new logic would be the same. It's hard to fail
when freeing resources unless something is quite seriously hosed
already. This is where I was planning to put the freeing of the
locks, based on queuing up the request at the time the DROP TABLE
statement is encountered.

-Kevin

#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#14)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

I think you'll need to just memorize the lock deletion command in
a backend-local list, and perform the deletion in a post-commit
function.

Hmm. As mentioned earlier in the thread, cleaning these up doesn't
actually have any benefit beyond freeing space in the predicate
locking collections. I'm not sure that benefit is enough to justify
this much new mechanism. Maybe I should just leave them alone and
let them get cleaned up in due course with the rest of the locks.
Any opinions on that?

-Kevin

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#18)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 03.06.2011 23:44, Kevin Grittner wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

I think you'll need to just memorize the lock deletion command in
a backend-local list, and perform the deletion in a post-commit
function.

Hmm. As mentioned earlier in the thread, cleaning these up doesn't
actually have any benefit beyond freeing space in the predicate
locking collections. I'm not sure that benefit is enough to justify
this much new mechanism. Maybe I should just leave them alone and
let them get cleaned up in due course with the rest of the locks.
Any opinions on that?

Is there a chance of false positives if oid wraparound happens and a new
table gets the same oid as the old one? It's also possible for a heap to
get the OID of an old index or vice versa, will that confuse things?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#19)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 03.06.2011 23:44, Kevin Grittner wrote:

Hmm. As mentioned earlier in the thread, cleaning these up
doesn't actually have any benefit beyond freeing space in the
predicate locking collections. I'm not sure that benefit is
enough to justify this much new mechanism. Maybe I should just
leave them alone and let them get cleaned up in due course with
the rest of the locks. Any opinions on that?

Is there a chance of false positives if oid wraparound happens and
a new table gets the same oid as the old one? It's also possible
for a heap to get the OID of an old index or vice versa, will that
confuse things?

Tuple locks should be safe from that because we use the tuple xmin
as part of the target key, but page and heap locks could result in
false positive serialization failures on OID wraparound unless we do
the cleanup. I guess that tips the scales in favor of it being
worth the extra code. I think it's still in that gray area where
it's a judgment call because it would be very infrequent and it
would be recoverable; but the fewer mysterious rollbacks, the fewer
posts about it we'll get. So any costs in maintaining the extra
code will probably be saved in reduced support, even if the
performance benefit is negligible.
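
As to the target key point above: a relation-level predicate lock
target is identified by nothing more than the database and relation
OIDs, so a recycled OID maps onto the same target.  A sketch using
the macro from the patch, with dbId and relId standing in for the
dropped relation's identifiers:

	PREDICATELOCKTARGETTAG tag;		/* dbId and relId are placeholders */

	SET_PREDICATELOCKTARGETTAG_RELATION(tag, dbId, relId);
	/* no xmin or similar field is available to disambiguate OID reuse */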

-Kevin

#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#20)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

Tuple locks should be safe from that because we use the tuple xmin
as part of the target key, but page and heap locks

That should have read "page and relation locks".

I guess that tips the scales in favor of it being worth the extra
code. I think it's still in that gray area

I just thought of something which takes it out of the gray area for
me: pg_locks. Even though it would be extremely rare for a false
positive to actually occur if we let this go, people would be sure
to get confused by seeing locks on the dropped objects in the
pg_locks view. They've got to be cleaned up.

-Kevin

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#14)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 03.06.2011 21:04, Kevin Grittner wrote:

Also, if anyone has comments or hints about the placement of those
calls, I'd be happy to receive them.

heap_drop_with_catalog() schedules the relation for deletion at the end
of transaction, but it's still possible that the transaction aborts and
the heap doesn't get dropped after all. If you put the
DropAllPredicateLocksFromTable() call there, and the transaction later
aborts, you've lost all the locks already.

But on the third thought: is that wrong? Surely locks taken by an
aborted transaction can be discarded.

regards, tom lane

#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#22)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 04.06.2011 19:19, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

On 03.06.2011 21:04, Kevin Grittner wrote:

Also, if anyone has comments or hints about the placement of those
calls, I'd be happy to receive them.

heap_drop_with_catalog() schedules the relation for deletion at the end
of transaction, but it's still possible that the transaction aborts and
the heap doesn't get dropped after all. If you put the
DropAllPredicateLocksFromTable() call there, and the transaction later
aborts, you've lost all the locks already.

But on the third thought: is that wrong? Surely locks taken by an
aborted transaction can be discarded.

These are predicate locks - there can be "locks" on the table belonging
to transactions that have already committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#24Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#23)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas wrote:

On 04.06.2011 19:19, Tom Lane wrote:

Heikki Linnakangas writes:

On 03.06.2011 21:04, Kevin Grittner wrote:

Also, if anyone has comments or hints about the placement of
those calls, I'd be happy to receive them.

heap_drop_with_catalog() schedules the relation for deletion at
the end of transaction, but it's still possible that the
transaction aborts and the heap doesn't get dropped after all. If
you put the DropAllPredicateLocksFromTable() call there, and the
transaction later aborts, you've lost all the locks already.

But on the third thought: is that wrong? Surely locks taken by an
aborted transaction can be discarded.

These are predicate locks - there can be "locks" on the table
belonging to transactions that have already committed.

It took me a while to think this through, but if the transaction (T1)
which reads the table to create the SIREAD lock overlaps another
transaction (T2) with which it might interact to create a dangerous
structure, and T2 has not yet accessed the table in any way,
then after T1 commits a third transaction (T3) could try to drop the
table but roll back, and T2 could still proceed to do a write which
conflicts with the predicate lock.

That certainly sounds like a low frequency combination of events, but
one which can't be ignored if we want correct behavior (i.e., no
false negatives).

-Kevin

#25Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#24)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas wrote:
On 03.06.2011 23:44, Kevin Grittner wrote:

Heikki Linnakangas wrote:

I think you'll need to just memorize the lock deletion command in
a backend-local list, and perform the deletion in a post-commit
function.

Hmm. As mentioned earlier in the thread, cleaning these up doesn't
actually have any benefit beyond freeing space in the predicate
locking collections. I'm not sure that benefit is enough to
justify this much new mechanism. Maybe I should just leave them
alone and let them get cleaned up in due course with the rest of
the locks. Any opinions on that?

Is there a chance of false positives if oid wraparound happens and
a new table gets the same oid as the old one? It's also possible
for a heap to get the OID of an old index or vice versa, will that
confuse things?

Some additional thoughts after working on the new code...

The risk we're trying to cover is that a table can be dropped while
committed transactions still have predicate locks against it; those
locks won't be cleared until the overlapping transactions which are
still active, and which could form a dangerous structure with the
committed transactions, have completed.
cleaned up in the normal course of transaction completion, there
could be a new object created with the old OID which, if a
complicated set of conditions are met, could lead to a serialization
failure of a transaction which would otherwise succeed. We can't
clean the predicate locks up immediately when the DROP TABLE appears
to succeed, because the enclosing transaction (or subtransaction)
could still be rolled back.

That last bit seems pretty hard to dodge, because premature predicate
lock cleanup could lead to false negatives -- which would allow data
inconsistencies. So cleanup before commit is definitely out. But
some additional time meditating on the dangerous structure diagram
today has me wondering whether we might be OK just ignoring cleanup
prior to what would normally happen as transactions complete.

Tin - - -> Tpivot - - -> Tout

As mentioned above, this whole issue is based around concern about
false positives when a transaction has read from a table and
committed before the table is dropped. We don't need to worry about
that for Tout -- its position in the dangerous structure is just
based around what it *writes*; it can't be responsible for the
read-and-commit before the table is dropped. To have a problem,
Tpivot can't be the transaction which commits before the DROP TABLE
either -- since it reads what Tout writes and Tout has to be first to
commit, then it cannot read and commit before the DROP TABLE and
still have Tout write afterward. That means that the transaction
which did the read and commit would have to be Tin, and Tpivot would
need to be the transaction which later wrote to an object with the
same OID as the dropped table.

For a problem to manifest itself, this sequence must occur:

- The timing of transaction starts doesn't matter (other than the RO
optimizations) as long as Tpivot overlaps both Tin and Tout.
- Tout commits first
- Any time before Tin commits, it reads from relation X.
- Tin commits second
- Tpivot develops a rw-conflict out to Tout on a relation *other
than* X, any time after both have started and before Tpivot
commits.
- Tpivot writes to a relation which has the same OID as relation X,
but which isn't relation X. This requires that a new relation be
created with the same OID which had been used by X, and that this
new relation is visible to Tpivot -- which acquired its snapshot
*before X was dropped*.

Is this possible? If a transaction gets its snapshot while OID of N
is assigned to relation X, can that transaction wind up seeing an OID
of N as a reference to relation Y? If not, there aren't any false
positives possible.

Given the quantity and complexity of code which I've needed to
generate to cover this, I'm wondering again whether it is justified
*even if* such a false positive is possible. It certainly seems like
it would be a *very* infrequent occurrence. There's also the
question of what shows in pg_locks, but I'm wondering whether even
that is enough to justify this much code.

Thoughts? Maybe I should submit a patch without added complexity of
the scheduled cleanup and we can discuss that as a separate patch?

-Kevin

#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#25)
1 attachment(s)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" wrote:

Maybe I should submit a patch without added complexity of the
scheduled cleanup and we can discuss that as a separate patch?

Here's a patch which adds the missing support for DDL. Cleanup of
predicate locks at commit time for transactions which ran DROP TABLE
or TRUNCATE TABLE can be added as a separate patch. I consider those
to be optimizations which are of dubious benefit, especially compared
to the complexity of the extra code required.

Also, Heikki pointed out that explicitly releasing the predicate
locks after a DROP DATABASE is an optimization, which on reflection
also seems to be of dubious value compared to the code needed.

Unless someone can find a way in which any of these early cleanups
are needed for correctness, I suggest they are better left as
enhancements in some future release, where there can be some
demonstration that they matter enough for performance to justify the
extra code, and there can be more testing to ensure the new code
doesn't break anything.

I stashed the partial work on the more aggressive cleanup, so if
there's a consensus that we want it, I can post a patch pretty
quickly.

In making sure that the new code for this patch was in pgindent
format, I noticed that the ASCII art and bullet lists recently added
to OnConflict_CheckForSerializationFailure() were mangled badly by
pgindent, so I added the dashes to protect those and included a
pgindent form of that function. That should save someone some
trouble sorting things out after the next global pgindent run.

-Kevin

Attachments:

ssi-ddl-3.patch (application/octet-stream)
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 63,68 ****
--- 63,69 ----
  #include "parser/parse_relation.h"
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
+ #include "storage/predicate.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
***************
*** 1658,1663 **** heap_drop_with_catalog(Oid relid)
--- 1659,1672 ----
  	CheckTableNotInUse(rel, "DROP TABLE");
  
  	/*
+ 	 * This effectively deletes all rows in the table, and may be done in a
+ 	 * serializable transaction.  In that case we must record a rw-conflict in
+ 	 * to this transaction from each transaction holding a predicate lock on
+ 	 * the table.
+ 	 */
+ 	CheckTableForSerializableConflictIn(rel);
+ 
+ 	/*
  	 * Delete pg_foreign_table tuple first.
  	 */
  	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 54,59 ****
--- 54,60 ----
  #include "parser/parser.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
***************
*** 1311,1316 **** index_drop(Oid indexId)
--- 1312,1323 ----
  	CheckTableNotInUse(userIndexRelation, "DROP INDEX");
  
  	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
+ 	/*
  	 * Schedule physical removal of the files
  	 */
  	RelationDropStorage(userIndexRelation);
***************
*** 2787,2792 **** reindex_index(Oid indexId, bool skip_constraint_checks)
--- 2794,2805 ----
  	 */
  	CheckTableNotInUse(iRel, "REINDEX INDEX");
  
+ 	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(iRel);
+ 
  	PG_TRY();
  	{
  		/* Suppress use of the target index while rebuilding it */
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 39,44 ****
--- 39,45 ----
  #include "optimizer/planner.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
***************
*** 385,390 **** cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
--- 386,397 ----
  	if (OidIsValid(indexOid))
  		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
+ 	/*
+ 	 * All predicate locks on the table and its indexes are about to be made
+ 	 * invalid.  Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(OldHeap);
+ 
  	/* rebuild_relation does all the dirty work */
  	rebuild_relation(OldHeap, indexOid, freeze_min_age, freeze_table_age,
  					 verbose);
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 70,75 ****
--- 70,76 ----
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/lock.h"
+ #include "storage/predicate.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
***************
*** 1040,1045 **** ExecuteTruncate(TruncateStmt *stmt)
--- 1041,1054 ----
  			Oid			toast_relid;
  
  			/*
+ 			 * This effectively deletes all rows in the table, and may be done
+ 			 * in a serializable transaction.  In that case we must record a
+ 			 * rw-conflict in to this transaction from each transaction
+ 			 * holding a predicate lock on the table.
+ 			 */
+ 			CheckTableForSerializableConflictIn(rel);
+ 
+ 			/*
  			 * Need the full transaction-safe pushups.
  			 *
  			 * Create a new empty storage file for the relation, and assign it
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 155,160 ****
--- 155,161 ----
   *							   BlockNumber newblkno);
   *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
   *								 BlockNumber newblkno);
+  *		TransferPredicateLocksToHeapRelation(const Relation relation)
   *		ReleasePredicateLocks(bool isCommit)
   *
   * conflict detection (may also trigger rollback)
***************
*** 162,167 ****
--- 163,169 ----
   *										HeapTupleData *tup, Buffer buffer)
   *		CheckForSerializableConflictIn(Relation relation, HeapTupleData *tup,
   *									   Buffer buffer)
+  *		CheckTableForSerializableConflictIn(const Relation relation)
   *
   * final rollback checking
   *		PreCommit_CheckForSerializationFailure(void)
***************
*** 189,194 ****
--- 191,197 ----
  #include "storage/procarray.h"
  #include "utils/rel.h"
  #include "utils/snapmgr.h"
+ #include "utils/syscache.h"
  #include "utils/tqual.h"
  
  /* Uncomment the next line to test the graceful degradation code. */
***************
*** 434,439 **** static bool TransferPredicateLocksToNewTarget(const PREDICATELOCKTARGETTAG oldta
--- 437,444 ----
  								  const PREDICATELOCKTARGETTAG newtargettag,
  								  bool removeOld);
  static void PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag);
+ static Oid	IfIndexGetRelation(Oid indexId);
+ static void DropAllPredicateLocksFromTableImpl(const Relation relation, bool transfer);
  static void SetNewSxactGlobalXmin(void);
  static bool ReleasePredicateLocksIfROSafe(void);
  static void ClearOldPredicateLocks(void);
***************
*** 2543,2548 **** exit:
--- 2548,2856 ----
  	return !outOfShmem;
  }
  
+ /*
+  * IfIndexGetRelation: given a relation OID, get the OID of the heap
+  * relation it is an index on, or return InvalidOid if the argument is not
+  * an index.	Uses the system cache.
+  */
+ static Oid
+ IfIndexGetRelation(Oid indexId)
+ {
+ 	HeapTuple	tuple;
+ 	Form_pg_index index;
+ 	Oid			result;
+ 
+ 	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexId));
+ 	if (!HeapTupleIsValid(tuple))
+ 		return InvalidOid;
+ 
+ 	index = (Form_pg_index) GETSTRUCT(tuple);
+ 	Assert(index->indexrelid == indexId);
+ 
+ 	result = index->indrelid;
+ 	ReleaseSysCache(tuple);
+ 	return result;
+ }
+ 
+ /*
+  * Drop all predicate locks of any granularity from a heap relation and all of
+  * its indexes, with optional transfer to a relation-level lock on the heap.
+  *
+  * This requires grabbing a lot of LW locks and scanning the entire lock
+  * target table for matches.  That makes this more expensive than most
+  * predicate lock management functions, but it will only be called for DDL
+  * type commands and there are fast returns when no serializable transactions
+  * are active or the relation is temporary.
+  *
+  * We are not using the TransferPredicateLocksToNewTarget function because
+  * it acquires its own locks on the partitions of the two targets involved,
+  * and we'll already be holding all partition locks.
+  *
+  * We can't throw an error from here, because the call could be from a
+  * transaction which is not serializable.
+  *
+  * NOTE: This is currently only called with transfer set to true, but that may
+  * change.	If we decide to clean up the locks from a table on commit of a
+  * transaction which executed DROP TABLE, the false condition will be useful.
+  */
+ static void
+ DropAllPredicateLocksFromTableImpl(const Relation relation, bool transfer)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *oldtarget;
+ 	PREDICATELOCKTARGET *heaptarget;
+ 	PREDICATELOCKTARGETTAG heaptargettag;
+ 	PREDICATELOCKTAG newpredlocktag;
+ 	Oid			dbId;
+ 	Oid			indexId;
+ 	Oid			heapId;
+ 	int			i;
+ 	bool		isSingleIndex;
+ 	bool		found;
+ 	uint32		reservedtargettaghash;
+ 	uint32		heaptargettaghash;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running.
+ 	 * It's safe to check this without taking locks because the caller is
+ 	 * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks which
+ 	 * would matter here can be acquired while that is held.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipSplitTracking(relation))
+ 		return;
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	if (relation->rd_index == NULL)
+ 	{
+ 		isSingleIndex = false;
+ 		indexId = InvalidOid;	/* quiet compiler warning */
+ 		heapId = relation->rd_id;
+ 	}
+ 	else
+ 	{
+ 		isSingleIndex = true;
+ 		indexId = relation->rd_id;
+ 		heapId = relation->rd_index->indrelid;
+ 	}
+ 	Assert(heapId != InvalidOid);
+ 	Assert(transfer || !isSingleIndex); /* index OID only makes sense with
+ 										 * transfer */
+ 
+ 	SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId);
+ 	heaptargettaghash = PredicateLockTargetTagHashCode(&heaptargettag);
+ 	heaptarget = NULL;			/* Retrieve first time needed, then keep. */
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_EXCLUSIVE);
+ 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ 
+ 	/*
+ 	 * If there are any locks to be moved, that means we will wind up
+ 	 * with a relation lock on the heap.  That covers all other locks on the
+ 	 * heap and all locks on all indexes for the table, so we can be very
+ 	 * aggressive about deleting any locks except for the heap relation lock.
+ 	 * But we don't want to add a heap relation lock unless there is at least
+ 	 * one lock that needs to be transferred.  If this function is called with
+ 	 * an index OID, we'll first scan to see if there are any predicate locks
+ 	 * on that index.  As soon as we find one we drop down into the update
+ 	 * loop.  If we find none, we release the LW locks and return without
+ 	 * changing anything.
+ 	 *
+ 	 * This optimization comes into play two ways -- a REINDEX might be done
+ 	 * on an index which has no predicate locks, or an operation might be done
+ 	 * which rewrites the entire table and calls REINDEX on each index. In the
+ 	 * latter case the action against the base table will move all the index
+ 	 * locks before any of the index rebuilds are requested.
+ 	 */
+ 	if (isSingleIndex)
+ 	{
+ 		bool		foundIndexLock = false;
+ 
+ 		hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 		while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 		{
+ 			if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) != indexId)
+ 				continue;		/* wrong OID for the index */
+ 			if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
+ 				continue;		/* wrong database */
+ 			foundIndexLock = true;
+ 			hash_seq_term(&seqstat);
+ 			break;
+ 		}
+ 		if (!foundIndexLock)
+ 		{
+ 			/* Release locks in reverse order */
+ 			LWLockRelease(SerializableXactHashLock);
+ 			for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 				LWLockRelease(FirstPredicateLockMgrLock + i);
+ 			LWLockRelease(SerializablePredicateLockListLock);
+ 			return;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Remove the reserved entry to give us scratch space, so we know we'll be
+ 	 * able to create the new lock target.
+ 	 */
+ 	reservedtargettaghash = 0;	/* quiet compiler warning */
+ 	if (transfer)
+ 	{
+ 		reservedtargettaghash = PredicateLockTargetTagHashCode(&ReservedTargetTag);
+ 		hash_search_with_hash_value(PredicateLockTargetHash,
+ 									&ReservedTargetTag,
+ 									reservedtargettaghash,
+ 									HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	/* Scan through PredicateLockTargetHash and transfer or remove locks */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *oldpredlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
+ 			continue;			/* wrong database */
+ 		if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) == heapId)
+ 		{
+ 			if (GET_PREDICATELOCKTARGETTAG_TYPE(oldtarget->tag) == PREDLOCKTAG_RELATION)
+ 				continue;		/* already the right lock */
+ 		}
+ 		else
+ 		{
+ 			if (IfIndexGetRelation(GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag)) != heapId)
+ 				continue;		/* not index or index on wrong heap relation */
+ 		}
+ 
+ 		/*
+ 		 * If we made it here, we have work to do.	We make sure the heap
+ 		 * relation lock exists, then we walk the list of predicate locks for
+ 		 * the old target we found, moving all locks to the heap relation lock
+ 		 * -- unless they already hold that.
+ 		 */
+ 
+ 		/*
+ 		 * First make sure we have the heap relation target.  We only need to
+ 		 * do this once.
+ 		 */
+ 		if (transfer && heaptarget == NULL)
+ 		{
+ 			heaptarget = hash_search_with_hash_value(PredicateLockTargetHash,
+ 													 &heaptargettag,
+ 													 heaptargettaghash,
+ 													 HASH_ENTER, &found);
+ 			Assert(heaptarget != NULL);
+ 			if (!found)
+ 				SHMQueueInit(&heaptarget->predicateLocks);
+ 			newpredlocktag.myTarget = heaptarget;
+ 		}
+ 
+ 		/*
+ 		 * Loop through moving locks from this target to the relation target.
+ 		 */
+ 		oldpredlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(oldtarget->predicateLocks),
+ 						 &(oldtarget->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (oldpredlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 			PREDICATELOCK *newpredlock;
+ 			SerCommitSeqNo oldCommitSeqNo = oldpredlock->commitSeqNo;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(oldtarget->predicateLocks),
+ 							 &(oldpredlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 			newpredlocktag.myXact = oldpredlock->tag.myXact;
+ 
+ 			/*
+ 			 * It's OK to remove the old lock first because of the ACCESS
+ 			 * EXCLUSIVE lock on the heap relation when this is called.  It is
+ 			 * desirable to do so because it avoids any chance of running out
+ 			 * of lock structure entries for the table.
+ 			 */
+ 			SHMQueueDelete(&(oldpredlock->xactLink));
+ 			/* No need for retail delete from oldtarget list. */
+ 			hash_search(PredicateLockHash,
+ 						&oldpredlock->tag,
+ 						HASH_REMOVE, &found);
+ 			Assert(found);
+ 
+ 			if (transfer)
+ 			{
+ 				newpredlock = (PREDICATELOCK *)
+ 					hash_search_with_hash_value
+ 					(PredicateLockHash,
+ 					 &newpredlocktag,
+ 					 PredicateLockHashCodeFromTargetHashCode(&newpredlocktag,
+ 														  heaptargettaghash),
+ 					 HASH_ENTER_NULL, &found);
+ 				Assert(newpredlock != NULL);
+ 				if (!found)
+ 				{
+ 					SHMQueueInsertBefore(&(heaptarget->predicateLocks),
+ 										 &(newpredlock->targetLink));
+ 					SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
+ 										 &(newpredlock->xactLink));
+ 					newpredlock->commitSeqNo = oldCommitSeqNo;
+ 				}
+ 				else
+ 				{
+ 					if (newpredlock->commitSeqNo < oldCommitSeqNo)
+ 						newpredlock->commitSeqNo = oldCommitSeqNo;
+ 				}
+ 
+ 				Assert(newpredlock->commitSeqNo != 0);
+ 				Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
+ 					   || (newpredlock->tag.myXact == OldCommittedSxact));
+ 			}
+ 
+ 			oldpredlock = nextpredlock;
+ 		}
+ 
+ 		hash_search(PredicateLockTargetHash, &oldtarget->tag, HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	if (transfer)
+ 	{
+ 		/* Put the reserved entry back */
+ 		hash_search_with_hash_value(PredicateLockTargetHash,
+ 									&ReservedTargetTag,
+ 									reservedtargettaghash,
+ 									HASH_ENTER, &found);
+ 		Assert(!found);
+ 	}
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
+ /*
+  * TransferPredicateLocksToHeapRelation
+  *		For all transactions, transfer all predicate locks for the given
+  *		relation to a single relation lock on the heap.  For a heap relation
+  *		that includes all locks on indexes; for an index the same lock moves
+  *		are needed, but only if one or more locks exist on that index.
+  */
+ void
+ TransferPredicateLocksToHeapRelation(const Relation relation)
+ {
+ 	DropAllPredicateLocksFromTableImpl(relation, true);
+ }
+ 
  
  /*
   *		PredicateLockPageSplit
***************
*** 3792,3797 **** CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple,
--- 4100,4201 ----
  }
  
  /*
+  * CheckTableForSerializableConflictIn
+  *		The entire table is going through a DDL-style logical mass write (like
+  *		TRUNCATE TABLE or DROP TABLE).  While these operations do not operate
+  *		entirely within the bounds of snapshot isolation, they can occur
+  *		inside of a serializable transaction, and will logically occur after
+  *		any reads which saw rows which were destroyed by these operations, so
+  *		we do what we can to serialize properly under SSI.
+  *
+  * The relation passed in must be a heap relation for a table. Any predicate
+  * lock of any granularity on the table or any of its indexes will cause a
+  * rw-conflict in to this transaction.
+  *
+  * This should be done before altering the predicate locks because the
+  * transaction could be rolled back because of a conflict, in which case the
+  * lock changes are not needed.
+  */
+ void
+ CheckTableForSerializableConflictIn(const Relation relation)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *target;
+ 	Oid			dbId;
+ 	Oid			heapId;
+ 	int			i;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running.
+ 	 * It's safe to check this without taking locks because the caller is
+ 	 * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks which
+ 	 * would matter here can be acquired while that is held.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipSerialization(relation))
+ 		return;
+ 
+ 	Assert(relation->rd_index == NULL);
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	heapId = relation->rd_id;
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_SHARED);
+ 	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ 
+ 	/* Scan through PredicateLockTargetHash and flag conflicts */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((target = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *predlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(target->tag) != dbId)
+ 			continue;			/* wrong database */
+ 		if (GET_PREDICATELOCKTARGETTAG_RELATION(target->tag) != heapId
+ 			&& IfIndexGetRelation(GET_PREDICATELOCKTARGETTAG_RELATION(target->tag)) != heapId)
+ 			continue;			/* not index or index on wrong heap relation */
+ 
+ 		/*
+ 		 * Loop through locks for this target and flag conflicts.
+ 		 */
+ 		predlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(target->predicateLocks),
+ 						 &(target->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (predlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(target->predicateLocks),
+ 							 &(predlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 
+ 			if (predlock->tag.myXact != MySerializableXact
+ 				&& !RWConflictExists(predlock->tag.myXact, (SERIALIZABLEXACT *) MySerializableXact))
+ 				FlagRWConflict(predlock->tag.myXact, (SERIALIZABLEXACT *) MySerializableXact);
+ 
+ 			predlock = nextpredlock;
+ 		}
+ 	}
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
+ 
+ /*
   * Flag a rw-dependency between two serializable transactions.
   *
   * The caller is responsible for ensuring that we have a LW lock on
***************
*** 3814,3820 **** FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
  		SetRWConflict(reader, writer);
  }
  
! /*
   * We are about to add a RW-edge to the dependency graph - check that we don't
   * introduce a dangerous structure by doing so, and abort one of the
   * transactions if so.
--- 4218,4224 ----
  		SetRWConflict(reader, writer);
  }
  
! /*----------------------------------------------------------------------------
   * We are about to add a RW-edge to the dependency graph - check that we don't
   * introduce a dangerous structure by doing so, and abort one of the
   * transactions if so.
***************
*** 3823,3835 **** FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
   * in the dependency graph:
   *
   *		Tin ------> Tpivot ------> Tout
!  *	   		  rw			 rw
   *
   * Furthermore, Tout must commit first.
   *
   * One more optimization is that if Tin is declared READ ONLY (or commits
   * without writing), we can only have a problem if Tout committed before Tin
   * acquired its snapshot.
   */
  static void
  OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
--- 4227,4240 ----
   * in the dependency graph:
   *
   *		Tin ------> Tpivot ------> Tout
!  *			  rw			 rw
   *
   * Furthermore, Tout must commit first.
   *
   * One more optimization is that if Tin is declared READ ONLY (or commits
   * without writing), we can only have a problem if Tout committed before Tin
   * acquired its snapshot.
+  *----------------------------------------------------------------------------
   */
  static void
  OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
***************
*** 3842,3873 **** OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
  
  	failure = false;
  
! 	/*
  	 * Check for already-committed writer with rw-conflict out flagged
  	 * (conflict-flag on W means that T2 committed before W):
  	 *
  	 *		R ------> W ------> T2
! 	 *			rw        rw
  	 *
  	 * That is a dangerous structure, so we must abort. (Since the writer
  	 * has already committed, we must be the reader)
  	 */
  	if (SxactIsCommitted(writer)
  	  && (SxactHasConflictOut(writer) || SxactHasSummaryConflictOut(writer)))
  		failure = true;
  
! 	/*
  	 * Check whether the writer has become a pivot with an out-conflict
  	 * committed transaction (T2), and T2 committed first:
  	 *
  	 *		R ------> W ------> T2
! 	 *			rw        rw
  	 *
  	 * Because T2 must've committed first, there is no anomaly if:
  	 * - the reader committed before T2
  	 * - the writer committed before T2
  	 * - the reader is a READ ONLY transaction and the reader was concurrent
  	 *	 with T2 (= reader acquired its snapshot before T2 committed)
  	 */
  	if (!failure)
  	{
--- 4247,4280 ----
  
  	failure = false;
  
! 	/*------------------------------------------------------------------------
  	 * Check for already-committed writer with rw-conflict out flagged
  	 * (conflict-flag on W means that T2 committed before W):
  	 *
  	 *		R ------> W ------> T2
! 	 *			rw		  rw
  	 *
  	 * That is a dangerous structure, so we must abort. (Since the writer
  	 * has already committed, we must be the reader)
+ 	 *------------------------------------------------------------------------
  	 */
  	if (SxactIsCommitted(writer)
  	  && (SxactHasConflictOut(writer) || SxactHasSummaryConflictOut(writer)))
  		failure = true;
  
! 	/*------------------------------------------------------------------------
  	 * Check whether the writer has become a pivot with an out-conflict
  	 * committed transaction (T2), and T2 committed first:
  	 *
  	 *		R ------> W ------> T2
! 	 *			rw		  rw
  	 *
  	 * Because T2 must've committed first, there is no anomaly if:
  	 * - the reader committed before T2
  	 * - the writer committed before T2
  	 * - the reader is a READ ONLY transaction and the reader was concurrent
  	 *	 with T2 (= reader acquired its snapshot before T2 committed)
+ 	 *------------------------------------------------------------------------
  	 */
  	if (!failure)
  	{
***************
*** 3891,3897 **** OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
  				&& (!SxactIsCommitted(writer)
  					|| t2->commitSeqNo <= writer->commitSeqNo)
  				&& (!SxactIsReadOnly(reader)
! 					|| t2->commitSeqNo <= reader->SeqNo.lastCommitBeforeSnapshot))
  			{
  				failure = true;
  				break;
--- 4298,4304 ----
  				&& (!SxactIsCommitted(writer)
  					|| t2->commitSeqNo <= writer->commitSeqNo)
  				&& (!SxactIsReadOnly(reader)
! 			   || t2->commitSeqNo <= reader->SeqNo.lastCommitBeforeSnapshot))
  			{
  				failure = true;
  				break;
***************
*** 3903,3918 **** OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
  		}
  	}
  
! 	/*
  	 * Check whether the reader has become a pivot with a committed writer:
  	 *
  	 *		T0 ------> R ------> W
! 	 *			 rw        rw
  	 *
  	 * Because W must've committed first for an anomaly to occur, there is no
  	 * anomaly if:
  	 * - T0 committed before the writer
  	 * - T0 is READ ONLY, and overlaps the writer
  	 */
  	if (!failure && SxactIsCommitted(writer) && !SxactIsReadOnly(reader))
  	{
--- 4310,4326 ----
  		}
  	}
  
! 	/*------------------------------------------------------------------------
  	 * Check whether the reader has become a pivot with a committed writer:
  	 *
  	 *		T0 ------> R ------> W
! 	 *			 rw		   rw
  	 *
  	 * Because W must've committed first for an anomaly to occur, there is no
  	 * anomaly if:
  	 * - T0 committed before the writer
  	 * - T0 is READ ONLY, and overlaps the writer
+ 	 *------------------------------------------------------------------------
  	 */
  	if (!failure && SxactIsCommitted(writer) && !SxactIsReadOnly(reader))
  	{
***************
*** 3934,3940 **** OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
  				&& (!SxactIsCommitted(t0)
  					|| t0->commitSeqNo >= writer->commitSeqNo)
  				&& (!SxactIsReadOnly(t0)
! 					|| t0->SeqNo.lastCommitBeforeSnapshot >= writer->commitSeqNo))
  			{
  				failure = true;
  				break;
--- 4342,4348 ----
  				&& (!SxactIsCommitted(t0)
  					|| t0->commitSeqNo >= writer->commitSeqNo)
  				&& (!SxactIsReadOnly(t0)
! 			   || t0->SeqNo.lastCommitBeforeSnapshot >= writer->commitSeqNo))
  			{
  				failure = true;
  				break;
***************
*** 3950,3957 **** OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
  	{
  		/*
  		 * We have to kill a transaction to avoid a possible anomaly from
! 		 * occurring. If the writer is us, we can just ereport() to cause
! 		 * a transaction abort. Otherwise we flag the writer for termination,
  		 * causing it to abort when it tries to commit. However, if the writer
  		 * is a prepared transaction, already prepared, we can't abort it
  		 * anymore, so we have to kill the reader instead.
--- 4358,4365 ----
  	{
  		/*
  		 * We have to kill a transaction to avoid a possible anomaly from
! 		 * occurring. If the writer is us, we can just ereport() to cause a
! 		 * transaction abort. Otherwise we flag the writer for termination,
  		 * causing it to abort when it tries to commit. However, if the writer
  		 * is a prepared transaction, already prepared, we can't abort it
  		 * anymore, so we have to kill the reader instead.
***************
*** 3962,3968 **** OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
  			ereport(ERROR,
  					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
  					 errmsg("could not serialize access due to read/write dependencies among transactions"),
! 			errdetail("Cancelled on identification as a pivot, during write."),
  					 errhint("The transaction might succeed if retried.")));
  		}
  		else if (SxactIsPrepared(writer))
--- 4370,4376 ----
  			ereport(ERROR,
  					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
  					 errmsg("could not serialize access due to read/write dependencies among transactions"),
! 					 errdetail("Cancelled on identification as a pivot, during write."),
  					 errhint("The transaction might succeed if retried.")));
  		}
  		else if (SxactIsPrepared(writer))
*** a/src/include/storage/predicate.h
--- b/src/include/storage/predicate.h
***************
*** 49,59 **** extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
--- 49,61 ----
  extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
  extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
  extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
+ extern void TransferPredicateLocksToHeapRelation(const Relation relation);
  extern void ReleasePredicateLocks(const bool isCommit);
  
  /* conflict detection (may also trigger rollback) */
  extern void CheckForSerializableConflictOut(const bool valid, const Relation relation, const HeapTuple tuple, const Buffer buffer);
  extern void CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple, const Buffer buffer);
+ extern void CheckTableForSerializableConflictIn(const Relation relation);
  
  /* final rollback checking */
  extern void PreCommit_CheckForSerializationFailure(void);
#27Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#25)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On Sun, Jun 05, 2011 at 12:45:41PM -0500, Kevin Grittner wrote:

Is this possible? If a transaction gets its snapshot while OID of N
is assigned to relation X, can that transaction wind up seeing an OID
of N as a reference to relation Y? If not, there aren't any false
positives possible.

This ought to be possible, assuming that the transaction doesn't try to
read (and take an AccessShareLock on) X.

On the other hand, given all the timing constraints you've noted, I
think it's reasonable to ignore the risk of false positives here. What
you're saying is that we might inadvertently roll back a transaction,
if a table gets dropped and a new table created and assigned the same
OID, and there's an uncommitted transaction that started before the
DROP, *and* certain other conditions hold about the reads and timing of
overlapping transactions.

This combination of conditions seems quite unlikely and I have a hard
time getting too worked up about it. Occasional false positives are
already a fact of life when using SSI.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#26)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 06.06.2011 05:13, Kevin Grittner wrote:

"Kevin Grittner" wrote:

Maybe I should submit a patch without added complexity of the
scheduled cleanup and we can discuss that as a separate patch?

Here's a patch which adds the missing support for DDL.

It makes me a bit uncomfortable to do catalog cache lookups while
holding all the lwlocks. We've also already removed the reserved entry
for scratch space while we do that - if a cache lookup errors out, we'll
leave behind quite a mess. I guess it shouldn't fail, but it seems a bit
fragile.

When TransferPredicateLocksToHeapRelation() is called for a heap, do we
really need to transfer all the locks on its indexes too? When a heap is
dropped or rewritten or truncated, surely all its indexes are also
dropped or reindexed or truncated, so you'll get separate Transfer calls
for each index anyway. I think the logic is actually wrong at the
moment: When you reindex a single index,
DropAllPredicateLocksFromTableImpl() will transfer all locks belonging
to any index on the same table, and any finer-granularity locks on the
heap. It would be enough to transfer only locks on the index being
reindexed, so you risk getting some unnecessary false positives. As a
bonus, if you dumb down DropAllPredicateLocksFromTableImpl() to only
transfer locks on the given heap or index, and not any other indexes on
the same table, you won't need IfIndexGetRelation() anymore, making the
issue of catalog cache lookups moot.
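
A minimal sketch of that simplified matching test, assuming the
function only considers the one heap or index it was called for;
relId here stands for relation->rd_id and dbId for its database, so
no catalog lookup is needed while the LW locks are held:

		if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
			continue;			/* wrong database */
		if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) != relId)
			continue;			/* lock is not on this relation */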

Seems weird to call SkipSplitTracking() for heaps. I guess it's doing
the right thing, but all the comments and the name of that refer to indexes.

Cleanup of
predicate locks at commit time for transactions which ran DROP TABLE
or TRUNCATE TABLE can be added as a separate patch. I consider those
to be optimizations which are of dubious benefit, especially compared
to the complexity of the extra code required.

Ok.

In making sure that the new code for this patch was in pgindent
format, I noticed that the ASCII art and bullet lists recently added
to OnConflict_CheckForSerializationFailure() were mangled badly by
pgindent, so I added the dashes to protect those and included a
pgindent form of that function. That should save someone some
trouble sorting things out after the next global pgindent run.

Thanks, committed that part.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#29Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#28)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas wrote:

It makes me a bit uncomfortable to do catalog cache lookups while
holding all the lwlocks. We've also already removed the reserved
entry for scratch space while we do that - if a cache lookup errors
out, we'll leave behind quite a mess. I guess it shouldn't fail,
but it seems a bit fragile.

When TransferPredicateLocksToHeapRelation() is called for a heap,
do we really need to transfer all the locks on its indexes too?
When a heap is dropped or rewritten or truncated, surely all its
indexes are also dropped or reindexed or truncated, so you'll get
separate Transfer calls for each index anyway.

Probably. Will confirm and simplify based on that.

I think the logic is actually wrong at the moment: When you reindex
a single index, DropAllPredicateLocksFromTableImpl() will transfer
all locks belonging to any index on the same table, and any
finer-granularity locks on the heap. It would be enough to transfer
only locks on the index being reindexed, so you risk getting some
unnecessary false positives.

It seemed like a good idea at the time -- a relation lock on the heap
makes any other locks on the heap or any of its indexes redundant.
So it was an attempt at "cleaning house". Since we don't do anything
for an index request unless there is a lock on that index, it
couldn't cause false positives. But this probably fits into the
category of premature optimizations, since the locks can't cause any
difference in when you get a serialization failure -- it's only a
matter of "taking up space". I could revert that.

As a bonus, if you dumb down DropAllPredicateLocksFromTableImpl()
to only transfer locks on the given heap or index, and not any
other indexes on the same table, you won't need
IfIndexGetRelation() anymore, making the issue of catalog cache
lookups moot.

Which really makes it look like simplifying here to avoid the attempt
to clean house is a good idea. If there's a benefit to be had from
it, it should be demonstrated before attempting (in some later
release), the same as any other optimization.

Seems weird to call SkipSplitTracking() for heaps. I guess it's
doing the right thing, but all the comments and the name of that
refer to indexes.

Yeah, I think it's the right thing, but the macro name should
probably be changed. It was originally created to do the right thing
during index split operations and became useful in other cases where
a transaction was doing things to predicate locks for all
transactions.

Most of this is simplifying, plus one search-and-replace in a single
file. I'll try to post a new patch this evening.

Thanks for the review.

-Kevin

#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#29)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" wrote:

Heikki Linnakangas wrote:

I think the logic is actually wrong at the moment: When you
reindex a single index, DropAllPredicateLocksFromTableImpl() will
transfer all locks belonging to any index on the same table, and
any finer-granularity locks on the heap. It would be enough to
transfer only locks on the index being reindexed, so you risk
getting some unnecessary false positives.

It seemed like a good idea at the time -- a relation lock on the
heap makes any other locks on the heap or any of its indexes
redundant. So it was an attempt at "cleaning house". Since we
don't do anything for an index request unless there is a lock on
that index, it couldn't cause false positives. But this probably
fits into the category of premature optimizations, since the locks
can't cause any difference in when you get a serialization failure
-- it's only a matter of "taking up space". I could revert that.

On reflection, Heikki was dead-on right here; I had some fuzzy
thinking going. Just because one transaction has a lock in the index
doesn't mean that all transactions need lock promotion. That still
leaves an opportunity for cleanup, but it's much narrower -- only
locks from transactions which held locks on the reorganized index can
be replaced by the heap relation lock. That one is narrow enough to
be very unlikely to be worthwhile.

As usual, Heikki was right on target. :-)

-Kevin

#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#28)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

It makes me a bit uncomfortable to do catalog cache lookups while
holding all the lwlocks. We've also already removed the reserved entry
for scratch space while we do that - if a cache lookup errors out, we'll
leave behind quite a mess. I guess it shouldn't fail, but it seems a bit
fragile.

The above scares the heck out of me. If you don't believe that a
catcache lookup will ever fail, I will contract to break the patch.
You need to rearrange the code so that this is less fragile.

regards, tom lane

#32Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#31)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you don't believe that a catcache lookup will ever fail, I will
contract to break the patch.

As you probably know by now by reaching the end of the thread, this
code is going away based on Heikki's arguments; but for my
understanding, so that I don't make a bad assumption in this area
again, what could cause the following function to throw an exception
if the current process is holding an exclusive lock on the relation
passed in to it? (It could be a heap or an index relation.) It
seemed safe to me, and I can't spot the risk on a scan of the called
functions. What am I missing?

static Oid
IfIndexGetRelation(Oid indexId)
{
	HeapTuple	tuple;
	Form_pg_index index;
	Oid			result;

	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexId));
	if (!HeapTupleIsValid(tuple))
		return InvalidOid;

	index = (Form_pg_index) GETSTRUCT(tuple);
	Assert(index->indexrelid == indexId);

	result = index->indrelid;
	ReleaseSysCache(tuple);
	return result;
}

Thanks for any clues,

-Kevin

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#32)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you don't believe that a catcache lookup will ever fail, I will
contract to break the patch.

As you probably know by now by reaching the end of the thread, this
code is going away based on Heikki's arguments; but for my
understanding, so that I don't make a bad assumption in this area
again, what could cause the following function to throw an exception
if the current process is holding an exclusive lock on the relation
passed in to it? (It could be a heap or an index relation.) It
seemed safe to me, and I can't spot the risk on a scan of the called
functions. What am I missing?

Out-of-memory. Query cancel. The attempted catalog access failing
because it results in a detected deadlock. I could probably think of
several more if I spent ten minutes on it; and that's not even
considering genuine problem conditions such as a corrupted catalog
index, which robustness demands that we not fall over completely for.

You should never, ever assume that an operation as complicated as a
catalog lookup can't fail.

regards, tom lane

#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#33)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

What am I missing?

Out-of-memory. Query cancel. The attempted catalog access
failing because it results in a detected deadlock. I could
probably think of several more if I spent ten minutes on it; and
that's not even considering genuine problem conditions such as a
corrupted catalog index, which robustness demands that we not fall
over completely for.

You should never, ever assume that an operation as complicated as
a catalog lookup can't fail.

Got it. Thanks.

-Kevin

#35Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#28)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

We've also already removed the reserved entry for scratch space

This and Tom's concerns have me wondering if we should bracket the
two sections of code where we use the reserved lock target entry
with HOLD_INTERRUPTS() and RESUME_INTERRUPTS(). In an assert-enabled
build we wouldn't really recover from a transaction canceled while
it was checked out (although if that were the only problem, that
could be fixed), but besides that a cancellation while it's checked
out could cause these otherwise-safe functions to throw exceptions
due to a full hash table.

-Kevin

#36Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#35)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 07.06.2011 20:03, Kevin Grittner wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

We've also already removed the reserved entry for scratch space

This and Tom's concerns have me wondering if we should bracket the
two sections of code where we use the reserved lock target entry
with HOLD_INTERRUPTS() and RESUME_INTERRUPTS().

That's not necessary. You're holding an lwlock, which implies that
interrupts are held off already. There's a HOLD_INTERRUPTS() call in
LWLockAcquire and RESUME_INTERRUPTS() in LWLockRelease.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#37Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#28)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

It makes me a bit uncomfortable to do catalog cache lookups while
holding all the lwlocks.

I think I've caught up with the rest of the class on why this isn't
sane in DropAllPredicateLocksFromTableImpl, but I wonder about
CheckTableForSerializableConflictIn. We *do* expect to be throwing
errors in here, and we need some way to tell whether an index is
associated with a particular heap relation. Is the catalog cache
the right way to check that here, or is something else more
appropriate?

-Kevin

#38Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#37)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 07.06.2011 20:42, Kevin Grittner wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

It makes me a bit uncomfortable to do catalog cache lookups while
holding all the lwlocks.

I think I've caught up with the rest of the class on why this isn't
sane in DropAllPredicateLocksFromTableImpl, but I wonder about
CheckTableForSerializableConflictIn. We *do* expect to be throwing
errors in here, and we need some way to tell whether an index is
associated with a particular heap relation. Is the catalog cache
the right way to check that here, or is something else more
appropriate?

Hmm, it's not as dangerous there, as you're not in the middle of
modifying stuff, but it doesn't feel right there either.

Predicate locks on indexes are only needed to lock key ranges, to notice
later insertions into the range, right? For locks on tuples that do
exist, we have locks on the heap. If we're just about to delete every
tuple in the heap, that doesn't need to conflict with any locks on
indexes, because we're deleting, not inserting. So I don't think we need
to care about index locks here at all, only locks on the heap. Am I
missing something?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#37)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

I think I've caught up with the rest of the class on why this isn't
sane in DropAllPredicateLocksFromTableImpl, but I wonder about
CheckTableForSerializableConflictIn. We *do* expect to be throwing
errors in here, and we need some way to tell whether an index is
associated with a particular heap relation. Is the catalog cache
the right way to check that here, or is something else more
appropriate?

Just to answer the question (independently of Heikki's concern about
whether this is needed at all): it depends on the information you have.
If all you have is the index OID, then yeah a catcache lookup in
pg_index is probably the best thing. If you have an open Relation for
the index, you could instead look into its cached copy of its pg_index
row. If you have an open Relation for the table, I'd think that looking
for a match in RelationGetIndexList() would be the cheapest, since more
than likely that information is already cached.
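
For instance, a minimal sketch of the RelationGetIndexList() approach
(hypothetical variable names, and it assumes you already have the heap
Relation open and the index OID in hand) would be something like:

	/* needs utils/relcache.h and nodes/pg_list.h */
	List	   *indexoidlist = RelationGetIndexList(heapRel);
	bool		isIndexOnHeap = list_member_oid(indexoidlist, indexOid);

	list_free(indexoidlist);	/* the returned list is the caller's to free */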

regards, tom lane

#40Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#38)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Predicate locks on indexes are only needed to lock key ranges, to
notice later insertions into the range, right? For locks on tuples
that do exist, we have locks on the heap. If we're just about to
delete every tuple in the heap, that doesn't need to conflict with
any locks on indexes, because we're deleting, not inserting. So I
don't think we need to care about index locks here at all, only
locks on the heap. Am I missing something?

You're right again. My brain must be turning to mush. This
function can also become simpler, and there is now no reason at all
to add catalog cache lookups to predicate.c. I think that leaves me
with all the answers I need to get a new patch out this evening
(U.S. Central Time).

Thanks,

-Kevin

#41Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#39)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just to answer the question (independently of Heikki's concern
about whether this is needed at all): it depends on the
information you have. If all you have is the index OID, then yeah
a catcache lookup in pg_index is probably the best thing. If you
have an open Relation for the index, you could instead look into
its cached copy of its pg_index row. If you have an open Relation
for the table, I'd think that looking for a match in
RelationGetIndexList() would be the cheapest, since more than
likely that information is already cached.

Thanks, I wasn't aware of RelationGetIndexList(). That's a good one
to remember.

The issue here was that, given a particular heap relation, I was going
through a list of locks looking for matches, where the lock targets use
OIDs. So if I really *did* need to check whether an index was
related to that heap relation, it sounds like the catcache approach
would have been right. But as Heikki points out, the predicate
locks on the indexes only need to guard scanned *gaps* against
*insert* -- so a DROP TABLE or TRUNCATE TABLE can just skip looking
at any locks which aren't against the heap relation.

I think maybe I need a vacation -- you know, one where I'm not using
my vacation time to attend a PostgreSQL conference.

Thanks for the overview of what to use when; it takes a while, but I
think I'm gradually coming up to speed on the million lines of code
which comprise PostgreSQL. Tips like that do help.

-Kevin

#42Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#40)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 07.06.2011 21:10, Kevin Grittner wrote:

I think that leaves me
with all the answers I need to get a new patch out this evening
(U.S. Central Time).

Great, I'll review it in my morning (in about 12h)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#43Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#42)
1 attachment(s)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 07.06.2011 21:10, Kevin Grittner wrote:

I think that leaves me with all the answers I need to get a new
patch out this evening (U.S. Central Time).

Great, I'll review it in my morning (in about 12h)

Attached. Passes all the usual regression tests I run, plus light
ad hoc testing. All is working fine as far as this patch itself
goes, although more testing is needed to really call it sound.

If anyone is interested in the diff relative to version 3 of the
patch, it is the result of these two commits to my local repo:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=018b0fcbeba05317ba7066e552efe9a04e6890d9
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=fc651e2721a601ea806cf6e5d53d0676dfd26dca

During testing I found two annoying things not caused by this patch
which should probably be addressed in 9.1 if feasible, although I
don't think they rise to the level of blockers. More on those in
separate threads.

-Kevin

Attachments:

ssi-ddl-4.patch (text/plain)
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 63,68 ****
--- 63,69 ----
  #include "parser/parse_relation.h"
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
+ #include "storage/predicate.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
***************
*** 1658,1663 **** heap_drop_with_catalog(Oid relid)
--- 1659,1672 ----
  	CheckTableNotInUse(rel, "DROP TABLE");
  
  	/*
+ 	 * This effectively deletes all rows in the table, and may be done in a
+ 	 * serializable transaction.  In that case we must record a rw-conflict in
+ 	 * to this transaction from each transaction holding a predicate lock on
+ 	 * the table.
+ 	 */
+ 	CheckTableForSerializableConflictIn(rel);
+ 
+ 	/*
  	 * Delete pg_foreign_table tuple first.
  	 */
  	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 54,59 ****
--- 54,60 ----
  #include "parser/parser.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
***************
*** 1312,1317 **** index_drop(Oid indexId)
--- 1313,1324 ----
  	CheckTableNotInUse(userIndexRelation, "DROP INDEX");
  
  	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(userIndexRelation);
+ 
+ 	/*
  	 * Schedule physical removal of the files
  	 */
  	RelationDropStorage(userIndexRelation);
***************
*** 2799,2804 **** reindex_index(Oid indexId, bool skip_constraint_checks)
--- 2806,2817 ----
  	 */
  	CheckTableNotInUse(iRel, "REINDEX INDEX");
  
+ 	/*
+ 	 * All predicate locks on the index are about to be made invalid.
+ 	 * Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(iRel);
+ 
  	PG_TRY();
  	{
  		/* Suppress use of the target index while rebuilding it */
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 39,44 ****
--- 39,45 ----
  #include "optimizer/planner.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
***************
*** 385,390 **** cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
--- 386,397 ----
  	if (OidIsValid(indexOid))
  		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
+ 	/*
+ 	 * All predicate locks on the table and its indexes are about to be made
+ 	 * invalid.  Promote them to relation locks on the heap.
+ 	 */
+ 	TransferPredicateLocksToHeapRelation(OldHeap);
+ 
  	/* rebuild_relation does all the dirty work */
  	rebuild_relation(OldHeap, indexOid, freeze_min_age, freeze_table_age,
  					 verbose);
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 70,75 ****
--- 70,76 ----
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/lock.h"
+ #include "storage/predicate.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
***************
*** 1040,1045 **** ExecuteTruncate(TruncateStmt *stmt)
--- 1041,1054 ----
  			Oid			toast_relid;
  
  			/*
+ 			 * This effectively deletes all rows in the table, and may be done
+ 			 * in a serializable transaction.  In that case we must record a
+ 			 * rw-conflict in to this transaction from each transaction
+ 			 * holding a predicate lock on the table.
+ 			 */
+ 			CheckTableForSerializableConflictIn(rel);
+ 
+ 			/*
  			 * Need the full transaction-safe pushups.
  			 *
  			 * Create a new empty storage file for the relation, and assign it
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 155,160 ****
--- 155,161 ----
   *							   BlockNumber newblkno);
   *		PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
   *								 BlockNumber newblkno);
+  *		TransferPredicateLocksToHeapRelation(const Relation relation)
   *		ReleasePredicateLocks(bool isCommit)
   *
   * conflict detection (may also trigger rollback)
***************
*** 162,167 ****
--- 163,169 ----
   *										HeapTupleData *tup, Buffer buffer)
   *		CheckForSerializableConflictIn(Relation relation, HeapTupleData *tup,
   *									   Buffer buffer)
+  *		CheckTableForSerializableConflictIn(const Relation relation)
   *
   * final rollback checking
   *		PreCommit_CheckForSerializationFailure(void)
***************
*** 257,266 ****
  #define SxactIsMarkedForDeath(sxact) (((sxact)->flags & SXACT_FLAG_MARKED_FOR_DEATH) != 0)
  
  /*
!  * When a public interface method is called for a split on an index relation,
!  * this is the test to see if we should do a quick return.
   */
! #define SkipSplitTracking(relation) \
  	(((relation)->rd_id < FirstBootstrapObjectId) \
  	|| RelationUsesLocalBuffers(relation))
  
--- 259,269 ----
  #define SxactIsMarkedForDeath(sxact) (((sxact)->flags & SXACT_FLAG_MARKED_FOR_DEATH) != 0)
  
  /*
!  * When a public interface method is called which needs to manipulate locks on
!  * a particular relation regardless of the lock holder, do a quick check to
!  * see if this relation can be skipped.
   */
! #define SkipPredicateLocksForRelation(relation) \
  	(((relation)->rd_id < FirstBootstrapObjectId) \
  	|| RelationUsesLocalBuffers(relation))
  
***************
*** 273,279 ****
  	((!IsolationIsSerializable()) \
  	|| ((MySerializableXact == InvalidSerializableXact)) \
  	|| ReleasePredicateLocksIfROSafe() \
! 	|| SkipSplitTracking(relation))
  
  
  /*
--- 276,282 ----
  	((!IsolationIsSerializable()) \
  	|| ((MySerializableXact == InvalidSerializableXact)) \
  	|| ReleasePredicateLocksIfROSafe() \
! 	|| SkipPredicateLocksForRelation(relation))
  
  
  /*
***************
*** 434,439 **** static bool TransferPredicateLocksToNewTarget(const PREDICATELOCKTARGETTAG oldta
--- 437,443 ----
  								  const PREDICATELOCKTARGETTAG newtargettag,
  								  bool removeOld);
  static void PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag);
+ static void DropAllPredicateLocksFromTableImpl(const Relation relation, bool transfer);
  static void SetNewSxactGlobalXmin(void);
  static bool ReleasePredicateLocksIfROSafe(void);
  static void ClearOldPredicateLocks(void);
***************
*** 2543,2548 **** exit:
--- 2547,2781 ----
  	return !outOfShmem;
  }
  
+ /*
+  * Drop all predicate locks of any granularity from the specified relation,
+  * which can be a heap relation or an index relation.  Optionally acquire a
+  * relation lock on the heap for any transactions with any lock(s) on the
+  * specified relation.
+  *
+  * This requires grabbing a lot of LW locks and scanning the entire lock
+  * target table for matches.  That makes this more expensive than most
+  * predicate lock management functions, but it will only be called for DDL
+  * type commands and there are fast returns when no serializable transactions
+  * are active or the relation is temporary.
+  *
+  * We are not using the TransferPredicateLocksToNewTarget function because
+  * it acquires its own locks on the partitions of the two targets involved,
+  * and we'll already be holding all partition locks.
+  *
+  * We can't throw an error from here, because the call could be from a
+  * transaction which is not serializable.
+  *
+  * NOTE: This is currently only called with transfer set to true, but that may
+  * change.	If we decide to clean up the locks from a table on commit of a
+  * transaction which executed DROP TABLE, the false condition will be useful.
+  */
+ static void
+ DropAllPredicateLocksFromTableImpl(const Relation relation, bool transfer)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *oldtarget;
+ 	PREDICATELOCKTARGET *heaptarget;
+ 	PREDICATELOCKTARGETTAG heaptargettag;
+ 	PREDICATELOCKTAG newpredlocktag;
+ 	Oid			dbId;
+ 	Oid			relId;
+ 	Oid			heapId;
+ 	int			i;
+ 	bool		isIndex;
+ 	bool		found;
+ 	uint32		reservedtargettaghash;
+ 	uint32		heaptargettaghash;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running.
+ 	 * It's safe to check this without taking locks because the caller is
+ 	 * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks which
+ 	 * would matter here can be acquired while that is held.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipPredicateLocksForRelation(relation))
+ 		return;
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	relId = relation->rd_id;
+ 	if (relation->rd_index == NULL)
+ 	{
+ 		isIndex = false;
+ 		heapId = relId;
+ 	}
+ 	else
+ 	{
+ 		isIndex = true;
+ 		heapId = relation->rd_index->indrelid;
+ 	}
+ 	Assert(heapId != InvalidOid);
+ 	Assert(transfer || !isIndex);		/* index OID only makes sense with
+ 										 * transfer */
+ 
+ 	SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId);
+ 	heaptargettaghash = PredicateLockTargetTagHashCode(&heaptargettag);
+ 	heaptarget = NULL;			/* Retrieve first time needed, then keep. */
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_EXCLUSIVE);
+ 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ 
+ 	/*
+ 	 * Remove the reserved entry to give us scratch space, so we know we'll be
+ 	 * able to create the new lock target.
+ 	 */
+ 	reservedtargettaghash = 0;	/* quiet compiler warning */
+ 	if (transfer)
+ 	{
+ 		reservedtargettaghash = PredicateLockTargetTagHashCode(&ReservedTargetTag);
+ 		hash_search_with_hash_value(PredicateLockTargetHash,
+ 									&ReservedTargetTag,
+ 									reservedtargettaghash,
+ 									HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	/* Scan through target map */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *oldpredlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) != relId)
+ 			continue;			/* wrong relation id */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
+ 			continue;			/* wrong database id */
+ 		if (transfer && !isIndex
+ 			&& GET_PREDICATELOCKTARGETTAG_TYPE(oldtarget->tag) == PREDLOCKTAG_RELATION)
+ 			continue;			/* already the right lock */
+ 
+ 		/*
+ 		 * If we made it here, we have work to do.	We make sure the heap
+ 		 * relation lock exists, then we walk the list of predicate locks for
+ 		 * the old target we found, moving all locks to the heap relation lock
+ 		 * -- unless they already hold that.
+ 		 */
+ 
+ 		/*
+ 		 * First make sure we have the heap relation target.  We only need to
+ 		 * do this once.
+ 		 */
+ 		if (transfer && heaptarget == NULL)
+ 		{
+ 			heaptarget = hash_search_with_hash_value(PredicateLockTargetHash,
+ 													 &heaptargettag,
+ 													 heaptargettaghash,
+ 													 HASH_ENTER, &found);
+ 			Assert(heaptarget != NULL);
+ 			if (!found)
+ 				SHMQueueInit(&heaptarget->predicateLocks);
+ 			newpredlocktag.myTarget = heaptarget;
+ 		}
+ 
+ 		/*
+ 		 * Loop through moving locks from this target to the relation target.
+ 		 */
+ 		oldpredlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(oldtarget->predicateLocks),
+ 						 &(oldtarget->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (oldpredlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 			PREDICATELOCK *newpredlock;
+ 			SerCommitSeqNo oldCommitSeqNo = oldpredlock->commitSeqNo;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(oldtarget->predicateLocks),
+ 							 &(oldpredlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 			newpredlocktag.myXact = oldpredlock->tag.myXact;
+ 
+ 			/*
+ 			 * It's OK to remove the old lock first because of the ACCESS
+ 			 * EXCLUSIVE lock on the heap relation when this is called.  It is
+ 			 * desirable to do so because it avoids any chance of running out
+ 			 * of lock structure entries for the table.
+ 			 */
+ 			SHMQueueDelete(&(oldpredlock->xactLink));
+ 			/* No need for retail delete from oldtarget list. */
+ 			hash_search(PredicateLockHash,
+ 						&oldpredlock->tag,
+ 						HASH_REMOVE, &found);
+ 			Assert(found);
+ 
+ 			if (transfer)
+ 			{
+ 				newpredlock = (PREDICATELOCK *)
+ 					hash_search_with_hash_value
+ 					(PredicateLockHash,
+ 					 &newpredlocktag,
+ 					 PredicateLockHashCodeFromTargetHashCode(&newpredlocktag,
+ 														  heaptargettaghash),
+ 					 HASH_ENTER_NULL, &found);
+ 				Assert(newpredlock != NULL);
+ 				if (!found)
+ 				{
+ 					SHMQueueInsertBefore(&(heaptarget->predicateLocks),
+ 										 &(newpredlock->targetLink));
+ 					SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
+ 										 &(newpredlock->xactLink));
+ 					newpredlock->commitSeqNo = oldCommitSeqNo;
+ 				}
+ 				else
+ 				{
+ 					if (newpredlock->commitSeqNo < oldCommitSeqNo)
+ 						newpredlock->commitSeqNo = oldCommitSeqNo;
+ 				}
+ 
+ 				Assert(newpredlock->commitSeqNo != 0);
+ 				Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
+ 					   || (newpredlock->tag.myXact == OldCommittedSxact));
+ 			}
+ 
+ 			oldpredlock = nextpredlock;
+ 		}
+ 
+ 		hash_search(PredicateLockTargetHash, &oldtarget->tag, HASH_REMOVE, &found);
+ 		Assert(found);
+ 	}
+ 
+ 	if (transfer)
+ 	{
+ 		/* Put the reserved entry back */
+ 		hash_search_with_hash_value(PredicateLockTargetHash,
+ 									&ReservedTargetTag,
+ 									reservedtargettaghash,
+ 									HASH_ENTER, &found);
+ 		Assert(!found);
+ 	}
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
+ /*
+  * TransferPredicateLocksToHeapRelation
+  *		For all transactions, transfer all predicate locks for the given
+  *		relation to a single relation lock on the heap.
+  */
+ void
+ TransferPredicateLocksToHeapRelation(const Relation relation)
+ {
+ 	DropAllPredicateLocksFromTableImpl(relation, true);
+ }
+ 
  
  /*
   *		PredicateLockPageSplit
***************
*** 2581,2587 **** PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno,
  	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
  		return;
  
! 	if (SkipSplitTracking(relation))
  		return;
  
  	Assert(oldblkno != newblkno);
--- 2814,2820 ----
  	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
  		return;
  
! 	if (SkipPredicateLocksForRelation(relation))
  		return;
  
  	Assert(oldblkno != newblkno);
***************
*** 3792,3797 **** CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple,
--- 4025,4129 ----
  }
  
  /*
+  * CheckTableForSerializableConflictIn
+  *		The entire table is going through a DDL-style logical mass delete
+  *		(like TRUNCATE TABLE or DROP TABLE).  While these operations do not
+  *		operate entirely within the bounds of snapshot isolation, they can
+  *		occur inside of a serializable transaction, and will logically occur
+  *		after any reads which saw rows which were destroyed by these
+  *		operations, so we do what we can to serialize properly under SSI.
+  *
+  * The relation passed in must be a heap relation for a table. Any predicate
+  * lock of any granularity on the heap will cause a rw-conflict in to this
+  * transaction.  Predicate locks on indexes do not matter because they only
+  * exist to guard against conflicting inserts into the index, and this is a
+  * mass *delete*.
+  *
+  * This should be done before altering the predicate locks because the
+  * transaction could be rolled back because of a conflict, in which case the
+  * lock changes are not needed.
+  */
+ void
+ CheckTableForSerializableConflictIn(const Relation relation)
+ {
+ 	HASH_SEQ_STATUS seqstat;
+ 	PREDICATELOCKTARGET *target;
+ 	Oid			dbId;
+ 	Oid			heapId;
+ 	int			i;
+ 
+ 	/*
+ 	 * Bail out quickly if there are no serializable transactions running.
+ 	 * It's safe to check this without taking locks because the caller is
+ 	 * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks which
+ 	 * would matter here can be acquired while that is held.
+ 	 */
+ 	if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ 		return;
+ 
+ 	if (SkipSerialization(relation))
+ 		return;
+ 
+ 	Assert(relation->rd_index == NULL); /* not an index relation */
+ 
+ 	dbId = relation->rd_node.dbNode;
+ 	heapId = relation->rd_id;
+ 
+ 	LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstPredicateLockMgrLock + i, LW_SHARED);
+ 	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ 
+ 	/* Scan through target list */
+ 	hash_seq_init(&seqstat, PredicateLockTargetHash);
+ 
+ 	while ((target = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
+ 	{
+ 		PREDICATELOCK *predlock;
+ 
+ 		/*
+ 		 * Check whether this is a target which needs attention.
+ 		 */
+ 		if (GET_PREDICATELOCKTARGETTAG_RELATION(target->tag) != heapId)
+ 			continue;			/* wrong relation id */
+ 		if (GET_PREDICATELOCKTARGETTAG_DB(target->tag) != dbId)
+ 			continue;			/* wrong database id */
+ 
+ 		/*
+ 		 * Loop through locks for this target and flag conflicts.
+ 		 */
+ 		predlock = (PREDICATELOCK *)
+ 			SHMQueueNext(&(target->predicateLocks),
+ 						 &(target->predicateLocks),
+ 						 offsetof(PREDICATELOCK, targetLink));
+ 		while (predlock)
+ 		{
+ 			PREDICATELOCK *nextpredlock;
+ 
+ 			nextpredlock = (PREDICATELOCK *)
+ 				SHMQueueNext(&(target->predicateLocks),
+ 							 &(predlock->targetLink),
+ 							 offsetof(PREDICATELOCK, targetLink));
+ 
+ 			if (predlock->tag.myXact != MySerializableXact
+ 				&& !RWConflictExists(predlock->tag.myXact,
+ 									 (SERIALIZABLEXACT *) MySerializableXact))
+ 				FlagRWConflict(predlock->tag.myXact,
+ 							   (SERIALIZABLEXACT *) MySerializableXact);
+ 
+ 			predlock = nextpredlock;
+ 		}
+ 	}
+ 
+ 	/* Release locks in reverse order */
+ 	LWLockRelease(SerializableXactHashLock);
+ 	for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
+ 		LWLockRelease(FirstPredicateLockMgrLock + i);
+ 	LWLockRelease(SerializablePredicateLockListLock);
+ }
+ 
+ 
+ /*
   * Flag a rw-dependency between two serializable transactions.
   *
   * The caller is responsible for ensuring that we have a LW lock on
*** a/src/include/storage/predicate.h
--- b/src/include/storage/predicate.h
***************
*** 49,59 **** extern void PredicateLockPage(const Relation relation, const BlockNumber blkno);
--- 49,61 ----
  extern void PredicateLockTuple(const Relation relation, const HeapTuple tuple);
  extern void PredicateLockPageSplit(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
  extern void PredicateLockPageCombine(const Relation relation, const BlockNumber oldblkno, const BlockNumber newblkno);
+ extern void TransferPredicateLocksToHeapRelation(const Relation relation);
  extern void ReleasePredicateLocks(const bool isCommit);
  
  /* conflict detection (may also trigger rollback) */
  extern void CheckForSerializableConflictOut(const bool valid, const Relation relation, const HeapTuple tuple, const Buffer buffer);
  extern void CheckForSerializableConflictIn(const Relation relation, const HeapTuple tuple, const Buffer buffer);
+ extern void CheckTableForSerializableConflictIn(const Relation relation);
  
  /* final rollback checking */
  extern void PreCommit_CheckForSerializationFailure(void);
#44Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#43)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 08.06.2011 03:16, Kevin Grittner wrote:

+ 			/*
+ 			 * It's OK to remove the old lock first because of the ACCESS
+ 			 * EXCLUSIVE lock on the heap relation when this is called.  It is
+ 			 * desirable to do so because it avoids any chance of running out
+ 			 * of lock structure entries for the table.
+ 			 */

That assumption is wrong, for REINDEX. REINDEX INDEX holds an
AccessExclusive on the index, but only a ShareLock on the heap. The code
is still OK, though. As long as you hold an AccessExclusiveLock on the
index, no-one will care about predicate locks on the index, so we can
remove them before acquiring the lock on the heap. And we hold lwlocks
on all the predicate lock partitions, so all the operations will appear
atomic to any outside observer anyway.

Committed after adjusting that comment. I did a lot of other cosmetic
changes too, please double-check that I didn't screw up anything.

I also made a table-rewriting ALTER TABLE behave like CLUSTER, by adding a
call to TransferPredicateLocksToHeapRelation() there. I just looked back
at your old email where you listed the different DDL operations, and
notice that we missed VACUUM FULL as well
(http://archives.postgresql.org/message-id/4DBD7E91020000250003D0D6@gw.wicourts.gov).
I'll look into that. ALTER INDEX was also still an open question in that
email.

BTW, truncating the tail of a heap in vacuum probably also should drop
any locks on the truncated part of the heap. Or is it not possible for any
such locks to exist, because none of the tuples are visible to anyone
anymore? A comment in lazy_truncate_heap() might be in order.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#45Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#44)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

On 08.06.2011 14:18, Heikki Linnakangas wrote:

I just looked back
at your old email where you listed the different DDL operations, and
notice that we missed VACUUM FULL as well
(http://archives.postgresql.org/message-id/4DBD7E91020000250003D0D6@gw.wicourts.gov).
I'll look into that.

Never mind that, VACUUM FULL uses cluster_rel(), so that's covered already.

ALTER INDEX was also still an open question in that email.

None of the variants of ALTER INDEX do anything that affects predicate
locks, so that's OK too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#46Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#44)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

(sorry for repeatedly replying to self. I'll go for a coffee after this...)

On 08.06.2011 14:18, Heikki Linnakangas wrote:

Committed after adjusting that comment. I did a lot of other cosmetic
changes too, please double-check that I didn't screw up anything.

Also, it would be nice to have some regression tests for this. I don't
think any of the existing tests exercise these new functions (except for
the fast-exit, which isn't very interesting).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#47Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#46)
Re: SIREAD lock versus ACCESS EXCLUSIVE lock

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

(sorry for repeatedly replying to self. I'll go for a coffee after
this...)

That's so nice of you to try to make me feel better for the serious
brain fade I suffered yesterday. ;-)

On 08.06.2011 14:18, Heikki Linnakangas wrote:

Committed after adjusting that comment. I did a lot of other
cosmetic changes too, please double-check that I didn't screw up
anything.

On a first read-through, all your changes look like improvements to
me. I'll do some testing and give it a closer read.

Also, it would be nice to have some regression tests for this. I
don't think any of the existing tests exercise these new functions
(except for the fast-exit, which isn't very interesting).

I'll see about posting regression tests for this, as well as looking
into the questions in your earlier posts -- particularly about the
heap truncation in vacuum. I'm pretty sure that vacuum can't clean
up anything where we're still holding predicate locks because of
visibility rules, but I'll take another look to be sure. I hope to
do all that today, but I don't want to push to the point where I'm
making dumb mistakes again.

-Kevin