LOCK TABLE .. DEFERRABLE
The intention of this feature is to let non-urgent maintenance work slip
into a normal workload. In essence, instead of lock waiters queueing,
DEFERRABLE causes the current lock statement to always be last. It was
discussed at last year's pgCon as useful for replication tools
adding/removing triggers. I've also seen more than one plpgsql loop using
subtransactions and LOCK TABLE .. NOWAIT to achieve a similar effect. IMO,
it's much cleaner built in.
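For reference, the plpgsql workaround mentioned above generally looks
something like this (a sketch only; the table name and sleep interval are
made up):

```sql
-- Hypothetical sketch of the userland workaround: retry NOWAIT in a loop,
-- using the EXCEPTION block's subtransaction to swallow each failed attempt.
DO $$
BEGIN
  LOOP
    BEGIN
      LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE NOWAIT;
      EXIT;  -- lock obtained; it is held for the rest of the transaction
    EXCEPTION WHEN lock_not_available THEN
      PERFORM pg_sleep(1);  -- back off and retry
    END;
  END LOOP;
  -- ... do the maintenance work here ...
END;
$$;
```

Each failed attempt rolls back only the subtransaction, so other sessions are
never blocked while the loop waits its turn.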
If a lock is successfully obtained on one table, but not on all tables, it
releases that lock and will retry to get them as a group in the future.
Since inheritance acts as a group of tables (top + recursive cascade to
children), this implementation is necessary even if only a single table is
specified in the command.
Like the various CONCURRENTLY commands, it waits on the set of transactions
found to be blocking it. This puts it into the "waiting" state and
allows isolation testing to work as expected. I started with a simple loop
with a timer (and a GUC) but it didn't feel right without pg_stat_activity
showing the waiting state. statement_timeout is the suggested way to bound
the wait.
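So bounding the wait would look something like this (a sketch; assumes the
DEFERRABLE syntax from the attached patch, and the table name is made up):

```sql
SET statement_timeout = '5min';
BEGIN;
-- Queue patiently behind other work, but give up after five minutes.
LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE DEFERRABLE;
-- ... maintenance work ...
COMMIT;
```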
Possibly Ugly stuff:
SetLocktagRelationOid() is no longer static inline. Is there a better
option? My C foo isn't all that it should be. LOCK TABLE allows locking
shared tables, so I can't just assume MyDatabaseId is sufficient for the
lock tag.
The InvalidOid return value of RangeVarGetRelidExtended() can now appear in
two different situations: relation missing (if missing_ok is enabled) and
relation unlockable (if LockWaitPolicy is LockWaitNonBlock). No callers
currently use both options.
LockTableRecurse() returns the OID of the relation it could not lock, so
the caller can wait on the processes holding those locks. It also keeps a
list of everything it did lock so they can be unlocked if necessary.
I'll add it to the open November commitfest.
regards,
Rod Taylor
Attachments:
lock_table_deferrable.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index b946eab..e852f1d 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [ * ] [, ...] [ IN <replaceable class="PARAMETER">lockmode</replaceable> MODE ] [ NOWAIT ]
+LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [ * ] [, ...] [ IN <replaceable class="PARAMETER">lockmode</replaceable> MODE ] [ NOWAIT | DEFERRABLE ]
<phrase>where <replaceable class="PARAMETER">lockmode</replaceable> is one of:</phrase>
@@ -39,7 +39,23 @@ LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [ * ]
<literal>NOWAIT</literal> is specified, <command>LOCK
TABLE</command> does not wait to acquire the desired lock: if it
cannot be acquired immediately, the command is aborted and an
- error is emitted. Once obtained, the lock is held for the
+ error is emitted.
+ </para>
+
+ <para>
+ If <literal>DEFERRABLE</literal> is specified,
+ <command>LOCK TABLE</command> will wait without blocking for the
+ duration of <xref linkend="guc-max-lock-deferrable-wait-time">
+ for all locks to become available. If all locks cannot be obtained
+ simultaneously before the timeout then none of the structures
+ will be locked and an error is emitted. Since it is non-blocking,
+ other transactions may obtain locks freely and may cause the
+ required wait time to be infinite. Use <varname>statement_timeout</varname>
+   to restrict the wait time.
+ </para>
+
+ <para>
+ Once obtained, the lock is held for the
remainder of the current transaction. (There is no <command>UNLOCK
TABLE</command> command; locks are always released at transaction
end.)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34ba385..4259072 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4865,6 +4865,9 @@ l3:
RelationGetRelationName(relation))));
break;
+ case LockWaitNonBlock:
+ elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+ break;
}
/*
@@ -4902,6 +4905,9 @@ l3:
errmsg("could not obtain lock on row in relation \"%s\"",
RelationGetRelationName(relation))));
break;
+ case LockWaitNonBlock:
+ elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+ break;
}
}
@@ -5125,6 +5131,9 @@ heap_acquire_tuplock(Relation relation, ItemPointer tid, LockTupleMode mode,
errmsg("could not obtain lock on row in relation \"%s\"",
RelationGetRelationName(relation))));
break;
+ case LockWaitNonBlock:
+ elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+ break;
}
*have_tuple_lock = true;
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..a0c4e56 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -46,6 +46,7 @@
#include "nodes/makefuncs.h"
#include "parser/parse_func.h"
#include "storage/ipc.h"
+#include "storage/lock.h"
#include "storage/lmgr.h"
#include "storage/sinval.h"
#include "utils/acl.h"
@@ -223,14 +224,19 @@ Datum pg_is_other_temp_schema(PG_FUNCTION_ARGS);
* If the schema or relation is not found, return InvalidOid if missing_ok
* = true, otherwise raise an error.
*
- * If nowait = true, throw an error if we'd have to wait for a lock.
+ * If waitpolicy = LockWaitBlock, throw an error if we'd have to wait
+ * for a lock.
+ *
+ * If waitpolicy = LockWaitNonBlock, return InvalidOid if a lock cannot
+ * be obtained. Callers should not specify both LockWaitNonBlock and
+ * missing_ok.
*
* Callback allows caller to check permissions or acquire additional locks
* prior to grabbing the relation lock.
*/
Oid
RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode,
- bool missing_ok, bool nowait,
+ bool missing_ok, LockWaitPolicy waitpolicy,
RangeVarGetRelidCallback callback, void *callback_arg)
{
uint64 inval_count;
@@ -375,20 +381,28 @@ RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode,
*/
if (!OidIsValid(relId))
AcceptInvalidationMessages();
- else if (!nowait)
+ else if (waitpolicy == LockWaitBlock)
LockRelationOid(relId, lockmode);
else if (!ConditionalLockRelationOid(relId, lockmode))
{
- if (relation->schemaname)
- ereport(ERROR,
- (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
- errmsg("could not obtain lock on relation \"%s.%s\"",
- relation->schemaname, relation->relname)));
- else
- ereport(ERROR,
+		/*
+		 * LockWaitNonBlock mode also returns InvalidOid to the caller. Since
+		 * no caller requires both NonBlock and missing_ok, this is unambiguous.
+		 */
+ if (waitpolicy == LockWaitNonBlock)
+ return InvalidOid;
+ else {
+ if (relation->schemaname)
+ ereport(ERROR,
(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
- errmsg("could not obtain lock on relation \"%s\"",
- relation->relname)));
+ errmsg("could not obtain lock on relation \"%s.%s\"",
+ relation->schemaname, relation->relname)));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+ errmsg("could not obtain lock on relation \"%s\"",
+ relation->relname)));
+ }
}
/*
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 175d1f3..edbd2e4 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -25,7 +25,7 @@
#include "utils/lsyscache.h"
#include "utils/syscache.h"
-static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait);
+static Oid LockTableRecurse(Oid reloid, LOCKMODE lockmode, LockWaitPolicy waitpolicy, List *locked_oids);
static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode);
static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid,
Oid oldrelid, void *arg);
@@ -37,6 +37,7 @@ void
LockTableCommand(LockStmt *lockstmt)
{
ListCell *p;
+ bool all_locked = false;
/*---------
* During recovery we only accept these variations:
@@ -52,20 +53,83 @@ LockTableCommand(LockStmt *lockstmt)
/*
* Iterate over the list and process the named relations one at a time
*/
- foreach(p, lockstmt->relations)
- {
- RangeVar *rv = (RangeVar *) lfirst(p);
- bool recurse = interpretInhOption(rv->inhOpt);
- Oid reloid;
+ do {
+ List *locked_oids = NIL;
+ List *failed_oids = NIL;
- reloid = RangeVarGetRelidExtended(rv, lockstmt->mode, false,
- lockstmt->nowait,
- RangeVarCallbackForLockTable,
- (void *) &lockstmt->mode);
+ foreach(p, lockstmt->relations)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(p);
+ bool recurse = interpretInhOption(rv->inhOpt);
+ Oid reloid;
- if (recurse)
- LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait);
- }
+ reloid = RangeVarGetRelidExtended(rv, lockstmt->mode, false,
+ lockstmt->waitpolicy,
+ RangeVarCallbackForLockTable,
+ (void *) &lockstmt->mode);
+
+			/*
+			 * InvalidOid indicates the lock was not obtained.
+			 *
+			 * The loop is run to completion so as to throw errors early and
+			 * to set up the full set of targets for WaitForLockersMultiple().
+			 */
+ if (!OidIsValid(reloid)) {
+ reloid = RangeVarGetRelid(rv, NoLock, false);
+ failed_oids = lappend_oid(failed_oids, reloid);
+ }
+ else {
+ locked_oids = lappend_oid(locked_oids, reloid);
+
+ if (recurse) {
+ Oid failed_oid;
+ failed_oid = LockTableRecurse(reloid, lockstmt->mode, lockstmt->waitpolicy, locked_oids);
+ if (OidIsValid(failed_oid)) {
+ failed_oids = lappend_oid(failed_oids, failed_oid);
+ }
+ }
+ }
+ }
+
+		/*
+		 * If some locks were not obtained, carefully unlock all relations,
+		 * including inherited rels, and wait on all of the locktags discovered.
+		 */
+ if (list_length(failed_oids) == 0)
+ all_locked = true;
+ else {
+ List *lock_tags = NIL;
+ ListCell *volatile cell;
+
+ /* Release lock like UnlockRelationOid() but keep the locktag */
+		foreach(cell, locked_oids) {
+			Oid			reloid = lfirst_oid(cell);
+			LOCKTAG	   *tag = palloc(sizeof(LOCKTAG));
+
+			SetLocktagRelationOid(tag, reloid);
+			LockRelease(tag, lockstmt->mode, false);
+
+			/* Add the released tags to the list for this very cheap recheck */
+			lock_tags = lappend(lock_tags, tag);
+		}
+
+ /* No locks to release, just make the list of tags */
+		foreach(cell, failed_oids) {
+			Oid			reloid = lfirst_oid(cell);
+			LOCKTAG	   *tag = palloc(sizeof(LOCKTAG));
+
+			SetLocktagRelationOid(tag, reloid);
+			lock_tags = lappend(lock_tags, tag);
+		}
+
+ WaitForLockersMultiple(lock_tags, lockstmt->mode);
+
+ list_free(lock_tags);
+ }
+
+ list_free(locked_oids);
+ list_free(failed_oids);
+ } while (!all_locked);
}
/*
@@ -106,9 +170,12 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid,
* We use find_inheritance_children not find_all_inheritors to avoid taking
* locks far in advance of checking privileges. This means we'll visit
* multiply-inheriting children more than once, but that's no problem.
+ *
+ * In LockWaitNonBlock mode the return value is the Relation Oid which could
+ * not be locked.
*/
-static void
-LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
+static Oid
+LockTableRecurse(Oid reloid, LOCKMODE lockmode, LockWaitPolicy waitpolicy, List *locked_oids)
{
List *children;
ListCell *lc;
@@ -117,6 +184,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
foreach(lc, children)
{
+ Oid lock_result;
Oid childreloid = lfirst_oid(lc);
AclResult aclresult;
@@ -132,7 +200,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
}
/* We have enough rights to lock the relation; do so. */
- if (!nowait)
+ if (waitpolicy == LockWaitBlock)
LockRelationOid(childreloid, lockmode);
else if (!ConditionalLockRelationOid(childreloid, lockmode))
{
@@ -141,10 +209,14 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
if (!relname)
continue; /* child concurrently dropped, just skip it */
- ereport(ERROR,
- (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
- errmsg("could not obtain lock on relation \"%s\"",
- relname)));
+
+ if (waitpolicy == LockWaitNonBlock)
+ return childreloid;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+ errmsg("could not obtain lock on relation \"%s\"",
+ relname)));
}
/*
@@ -158,8 +230,18 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
continue;
}
- LockTableRecurse(childreloid, lockmode, nowait);
+		/*
+		 * Add the locked child to locked_oids so the caller can unlock
+		 * everything obtained so far when waitpolicy is LockWaitNonBlock.
+		 */
+		lappend_oid(locked_oids, childreloid);
+
+		/* Recurse into this child's own children; abandon the loop on failure */
+ lock_result = LockTableRecurse(childreloid, lockmode, waitpolicy, locked_oids);
+ if (OidIsValid(lock_result))
+ return lock_result;
}
+ return InvalidOid;
}
/*
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ac02304..8938828 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2304,6 +2304,9 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
errmsg("could not obtain lock on row in relation \"%s\"",
RelationGetRelationName(relation))));
break;
+ case LockWaitNonBlock:
+ elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+ break;
}
continue; /* loop back to repeat heap_fetch */
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f4e4a91..4873e16 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3959,7 +3959,7 @@ _copyLockStmt(const LockStmt *from)
COPY_NODE_FIELD(relations);
COPY_SCALAR_FIELD(mode);
- COPY_SCALAR_FIELD(nowait);
+ COPY_SCALAR_FIELD(waitpolicy);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 854c062..64ff060 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1962,7 +1962,7 @@ _equalLockStmt(const LockStmt *a, const LockStmt *b)
{
COMPARE_NODE_FIELD(relations);
COMPARE_SCALAR_FIELD(mode);
- COMPARE_SCALAR_FIELD(nowait);
+ COMPARE_SCALAR_FIELD(waitpolicy);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 1273352..cf0b0a2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -289,8 +289,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <ival> vacuum_option_list vacuum_option_elem
%type <boolean> opt_or_replace
opt_grant_grant_option opt_grant_admin_option
- opt_nowait opt_if_exists opt_with_data
-%type <ival> opt_nowait_or_skip
+ opt_if_exists opt_with_data
+%type <ival> opt_nowait opt_nowait_or_deferrable opt_nowait_or_skip
%type <list> OptRoleList AlterOptRoleList
%type <defelt> CreateOptRoleElem AlterOptRoleElem
@@ -9712,13 +9712,13 @@ using_clause:
*
*****************************************************************************/
-LockStmt: LOCK_P opt_table relation_expr_list opt_lock opt_nowait
+LockStmt: LOCK_P opt_table relation_expr_list opt_lock opt_nowait_or_deferrable
{
LockStmt *n = makeNode(LockStmt);
n->relations = $3;
n->mode = $4;
- n->nowait = $5;
+ n->waitpolicy = $5;
$$ = (Node *)n;
}
;
@@ -9737,8 +9737,14 @@ lock_type: ACCESS SHARE { $$ = AccessShareLock; }
| ACCESS EXCLUSIVE { $$ = AccessExclusiveLock; }
;
-opt_nowait: NOWAIT { $$ = TRUE; }
- | /*EMPTY*/ { $$ = FALSE; }
+opt_nowait: NOWAIT { $$ = LockWaitError; }
+ | /*EMPTY*/ { $$ = LockWaitBlock; }
+ ;
+
+opt_nowait_or_deferrable:
+ NOWAIT { $$ = LockWaitError; }
+ | DEFERRABLE { $$ = LockWaitNonBlock; }
+ | /*EMPTY*/ { $$ = LockWaitBlock; }
;
opt_nowait_or_skip:
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 0632fc0..61c11a2 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -82,7 +82,7 @@ RelationInitLockInfo(Relation relation)
* SetLocktagRelationOid
* Set up a locktag for a relation, given only relation OID
*/
-static inline void
+void
SetLocktagRelationOid(LOCKTAG *tag, Oid relid)
{
Oid dbid;
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..cf6dc72 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -14,6 +14,7 @@
#ifndef NAMESPACE_H
#define NAMESPACE_H
+#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/lock.h"
@@ -51,10 +52,10 @@ typedef void (*RangeVarGetRelidCallback) (const RangeVar *relation, Oid relId,
Oid oldRelId, void *callback_arg);
#define RangeVarGetRelid(relation, lockmode, missing_ok) \
- RangeVarGetRelidExtended(relation, lockmode, missing_ok, false, NULL, NULL)
+ RangeVarGetRelidExtended(relation, lockmode, missing_ok, LockWaitBlock, NULL, NULL)
extern Oid RangeVarGetRelidExtended(const RangeVar *relation,
- LOCKMODE lockmode, bool missing_ok, bool nowait,
+ LOCKMODE lockmode, bool missing_ok, LockWaitPolicy waitpolicy,
RangeVarGetRelidCallback callback,
void *callback_arg);
extern Oid RangeVarGetCreationNamespace(const RangeVar *newRelation);
diff --git a/src/include/nodes/lockoptions.h b/src/include/nodes/lockoptions.h
index 1c2f00f..4e61c20 100644
--- a/src/include/nodes/lockoptions.h
+++ b/src/include/nodes/lockoptions.h
@@ -40,7 +40,9 @@ typedef enum LockWaitPolicy
/* Skip rows that can't be locked (SKIP LOCKED) */
LockWaitSkip,
/* Raise an error if a row cannot be locked (NOWAIT) */
- LockWaitError
+ LockWaitError,
+ /* Non-Blocking wait. Always at the end of the queue. */
+ LockWaitNonBlock
} LockWaitPolicy;
#endif /* LOCKOPTIONS_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 8b958b4..03b5212 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2906,7 +2906,7 @@ typedef struct LockStmt
NodeTag type;
List *relations; /* relations to lock */
int mode; /* lock mode */
- bool nowait; /* no wait mode */
+ LockWaitPolicy waitpolicy; /* NOWAIT, DEFERRABLE, and blocking (default) options */
} LockStmt;
/* ----------------------
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 975b6f8..e82b684 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -98,6 +98,9 @@ extern void LockSharedObjectForSession(Oid classid, Oid objid, uint16 objsubid,
extern void UnlockSharedObjectForSession(Oid classid, Oid objid, uint16 objsubid,
LOCKMODE lockmode);
+/* Set up a locktag for a relation, given only the relation OID */
+extern void SetLocktagRelationOid(LOCKTAG *tag, Oid relid);
+
/* Describe a locktag for error messages */
extern void DescribeLockTag(StringInfo buf, const LOCKTAG *tag);
diff --git a/src/test/isolation/expected/lock-deferrable.out b/src/test/isolation/expected/lock-deferrable.out
new file mode 100644
index 0000000..10e13fe
--- /dev/null
+++ b/src/test/isolation/expected/lock-deferrable.out
@@ -0,0 +1,28 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1a s2a s3a s1b s3b s2b
+step s1a: LOCK TABLE foo;
+step s2a: LOCK TABLE foo DEFERRABLE; <waiting ...>
+step s3a: LOCK TABLE foo2; <waiting ...>
+step s1b: COMMIT;
+step s3a: <... completed>
+step s3b: COMMIT;
+step s2a: <... completed>
+step s2b: COMMIT;
+
+starting permutation: s3a s2a s1a s3b s1b s2b
+step s3a: LOCK TABLE foo2;
+step s2a: LOCK TABLE foo DEFERRABLE; <waiting ...>
+step s1a: LOCK TABLE foo; <waiting ...>
+step s3b: COMMIT;
+step s1a: <... completed>
+step s1b: COMMIT;
+step s2a: <... completed>
+step s2b: COMMIT;
+
+starting permutation: s2a s3a s2b s3b
+step s2a: LOCK TABLE foo DEFERRABLE;
+step s3a: LOCK TABLE foo2; <waiting ...>
+step s2b: COMMIT;
+step s3a: <... completed>
+step s3b: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 138a0b7..0434610 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -48,3 +48,4 @@ test: alter-table-3
test: create-trigger
test: async-notify
test: timeouts
+test: lock-deferrable
diff --git a/src/test/isolation/specs/lock-deferrable.spec b/src/test/isolation/specs/lock-deferrable.spec
new file mode 100644
index 0000000..88ecaa9
--- /dev/null
+++ b/src/test/isolation/specs/lock-deferrable.spec
@@ -0,0 +1,39 @@
+# Test LOCK TABLE .. DEFERRABLE, including cascading to inheritance children.
+
+setup
+{
+ CREATE TABLE foo (
+ id int
+ );
+ CREATE TABLE foo2 () INHERITS (foo);
+}
+
+teardown
+{
+ DROP TABLE foo CASCADE;
+}
+
+session "s1"
+setup { BEGIN; }
+step "s1a" { LOCK TABLE foo; }
+step "s1b" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "s2a" { LOCK TABLE foo DEFERRABLE; }
+step "s2b" { COMMIT; }
+
+session "s3"
+setup { BEGIN; }
+step "s3a" { LOCK TABLE foo2; }
+step "s3b" { COMMIT; }
+
+
+# LOCK DEFERRABLE should not block other processes
+# but is itself blocked. Locks cascade to child tables.
+permutation "s1a" "s2a" "s3a" "s1b" "s3b" "s2b"
+permutation "s3a" "s2a" "s1a" "s3b" "s1b" "s2b"
+
+# LOCK DEFERRABLE blocks when lock is obtained
+permutation "s2a" "s3a" "s2b" "s3b"
+
diff --git a/src/test/regress/expected/lock.out b/src/test/regress/expected/lock.out
index fd27344..252bd1f 100644
--- a/src/test/regress/expected/lock.out
+++ b/src/test/regress/expected/lock.out
@@ -33,6 +33,19 @@ LOCK TABLE lock_tbl1 IN ACCESS EXCLUSIVE MODE NOWAIT;
LOCK TABLE lock_view1 IN EXCLUSIVE MODE; -- Will fail; can't lock a non-table
ERROR: "lock_view1" is not a table
ROLLBACK;
+-- Try using DEFERRABLE along with valid options.
+BEGIN TRANSACTION;
+LOCK TABLE lock_tbl1 IN ACCESS SHARE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN ROW SHARE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN ROW EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN SHARE UPDATE EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN SHARE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN SHARE ROW EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN ACCESS EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_view1 IN EXCLUSIVE MODE; -- Will fail; can't lock a non-table
+ERROR: "lock_view1" is not a table
+ROLLBACK;
-- Verify that we can lock a table with inheritance children.
CREATE TABLE lock_tbl2 (b BIGINT) INHERITS (lock_tbl1);
CREATE TABLE lock_tbl3 () INHERITS (lock_tbl2);
diff --git a/src/test/regress/sql/lock.sql b/src/test/regress/sql/lock.sql
index 567e8bc..82611dd 100644
--- a/src/test/regress/sql/lock.sql
+++ b/src/test/regress/sql/lock.sql
@@ -36,6 +36,19 @@ LOCK TABLE lock_tbl1 IN ACCESS EXCLUSIVE MODE NOWAIT;
LOCK TABLE lock_view1 IN EXCLUSIVE MODE; -- Will fail; can't lock a non-table
ROLLBACK;
+-- Try using DEFERRABLE along with valid options.
+BEGIN TRANSACTION;
+LOCK TABLE lock_tbl1 IN ACCESS SHARE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN ROW SHARE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN ROW EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN SHARE UPDATE EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN SHARE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN SHARE ROW EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_tbl1 IN ACCESS EXCLUSIVE MODE DEFERRABLE;
+LOCK TABLE lock_view1 IN EXCLUSIVE MODE; -- Will fail; can't lock a non-table
+ROLLBACK;
+
-- Verify that we can lock a table with inheritance children.
CREATE TABLE lock_tbl2 (b BIGINT) INHERITS (lock_tbl1);
CREATE TABLE lock_tbl3 () INHERITS (lock_tbl2);
On 5 April 2016 at 17:43, Rod Taylor <rod.taylor@gmail.com> wrote:
The intention of this feature is to let non-urgent maintenance work slip
into a normal workload. In essence, instead of lock waiters queueing,
DEFERRABLE causes the current lock statement to always be last.
Good idea; this was on my list of things to implement. I was going to call
it the WAIT PATIENTLY option.
It was discussed at last year's pgCon as useful for replication tools
adding/removing triggers. I've also seen more than one plpgsql loop using
subtransactions and LOCK TABLE .. NOWAIT to achieve a similar effect. IMO,
it's much cleaner built in.
Agreed, but your implementation is essentially just the same looping
concept, which I don't much like.
If a lock is successfully obtained on one table, but not on all tables, it
releases that lock and will retry to get them as a group in the future.
Since inheritance acts as a group of tables (top + recursive cascade to
children), this implementation is necessary even if only a single table is
specified in the command.
I'd prefer to see this as a lock wait mode where it sits in the normal lock
queue BUT other lock requestors are allowed to queue jump past it. That
should be just a few lines changed in the lock conflict checker and some
sleight of hand in the lock queue code.
That way we avoid the busy-wait loop and multiple DEFERRABLE lock waiters
queue up normally.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-05 18:10:11 +0100, Simon Riggs wrote:
I'd prefer to see this as a lock wait mode where it sits in the normal lock
queue BUT other lock requestors are allowed to queue jump past it. That
should be just a few lines changed in the lock conflict checker and some
sleight of hand in the lock queue code.
+1, although wading into deadlock.c makes one need a shower.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 5, 2016 at 1:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
If a lock is successfully obtained on one table, but not on all tables, it
releases that lock and will retry to get them as a group in the future.
Since inheritance acts as a group of tables (top + recursive cascade to
children), this implementation is necessary even if only a single table is
specified in the command.

I'd prefer to see this as a lock wait mode where it sits in the normal
lock queue BUT other lock requestors are allowed to queue jump past it.
That should be just a few lines changed in the lock conflict checker and
some sleight of hand in the lock queue code.

That way we avoid the busy-wait loop and multiple DEFERRABLE lock waiters
queue up normally.
Yeah, that would be better. I can see how to handle a single structure in
that way but I'm not at all certain how to handle multiple tables and
inheritance is multiple tables even with a single command.
X1 inherits from X
There is a long-running task on X1.
Someone requests LOCK TABLE X IN ACCESS EXCLUSIVE MODE WAIT PATIENTLY.
Internally this also grabs X1.
The lock on X might be granted immediately and now blocks all other access
to that table.
There would need to be a Locking Group kind of thing so various LockTags are
treated as a single entity to grant them simultaneously. That seems pretty
invasive; at least I don't see anything like that today.
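In SQL terms, the problem case is something like this (a sketch; WAIT
PATIENTLY is the hypothetical spelling being discussed, and the table names
are made up):

```sql
CREATE TABLE x (id int);
CREATE TABLE x1 () INHERITS (x);

-- Session 1: long-running task holding a lock on the child
BEGIN;
LOCK TABLE x1 IN SHARE MODE;

-- Session 2: a patient lock on the parent recurses to x1.
-- If the lock on x were granted immediately while we queue patiently
-- behind session 1 on x1, all other access to x would be blocked in
-- the meantime -- exactly what "patiently" is meant to avoid.
LOCK TABLE x IN ACCESS EXCLUSIVE MODE WAIT PATIENTLY;
```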
On 5 April 2016 at 18:34, Rod Taylor <rod.taylor@gmail.com> wrote:
On Tue, Apr 5, 2016 at 1:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
If a lock is successfully obtained on one table, but not on all tables,
it releases that lock and will retry to get them as a group in the future.
Since inheritance acts as a group of tables (top + recursive cascade to
children), this implementation is necessary even if only a single table is
specified in the command.

I'd prefer to see this as a lock wait mode where it sits in the normal
lock queue BUT other lock requestors are allowed to queue jump past it. That
should be just a few lines changed in the lock conflict checker and some
sleight of hand in the lock queue code.

That way we avoid the busy-wait loop and multiple DEFERRABLE lock waiters
queue up normally.

Yeah, that would be better. I can see how to handle a single structure in
that way but I'm not at all certain how to handle multiple tables and
inheritance is multiple tables even with a single command.
Agreed; neither can I.
X1 inherits from X
There is a long-running task on X1.
Someone requests LOCK TABLE X IN ACCESS EXCLUSIVE MODE WAIT PATIENTLY.
Internally this also grabs X1.

The lock on X might be granted immediately and now blocks all other access
to that table.

There would need to be a Locking Group kind of thing so various LockTags are
treated as a single entity to grant them simultaneously. That seems pretty
invasive; at least I don't see anything like that today.
Multiple locktags would likely be behind different LWLocks anyway, so
I don't see a way to make that work.
So the only way to handle multiple locks is to do this roughly the way
Rod suggests.
The only thing I would add at this stage is that if one lock is
unavailable, unlocking all previous locks is unnecessary. We only need
to unlock if there are lock waiters for the locks we already hold.
The use cases I am thinking of require only one table at a time, so
I'm still inclined towards the non-looping approach.
Thoughts?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1 September 2016 at 21:28, Simon Riggs <simon@2ndquadrant.com> wrote:
So the only way to handle multiple locks is to do this roughly the way
Rod suggests.
I've just been reading the VACUUM code and it turns out that we
already use Rod's mechanism internally. So on that basis it seems fine
to support this as a useful user-level feature. If there is a better
way of doing it, then that can be added later.
My proposed changes to this patch are these
1. Rename this WAIT PATIENTLY, which is IMHO a better description of
what is being requested. Bikeshedding welcome.
2. Introduce a new API call LockTablePatiently() that returns bool. Its
usage is similar to ConditionalLockRelation(); the only difference is that
you supply some other wait parameters with it. This isolates the
internal mechanism from the usage, so it allows us to more easily support
any fancy new way of doing this we think of later.
3. Use LockTablePatiently() within lockcmds.c where appropriate
4. Replace the code for waiting in VACUUM with the new call to
LockTablePatiently()
So I see this as 2 patches: 1) new API and make VACUUM use new API, 2)
Rod's LOCK TABLE patch
First patch attached; it will also require a lock-by-Oid variant. If we
agree, Rod, please update your patch to match?
(I pushed this back to next CF, but we can still go ahead if we complete)
Comments?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
patientLockTable.v1.patch (application/octet-stream)
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 231e92d..56b2f1a 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1700,7 +1700,6 @@ lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
BlockNumber old_rel_pages = vacrelstats->rel_pages;
BlockNumber new_rel_pages;
PGRUsage ru0;
- int lock_retry;
pg_rusage_init(&ru0);
@@ -1721,33 +1720,19 @@ lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
* lock).
*/
vacrelstats->lock_waiter_detected = false;
- lock_retry = 0;
- while (true)
+ if (!PatientLockRelation(onerel, AccessExclusiveLock,
+ VACUUM_TRUNCATE_LOCK_TIMEOUT,
+ VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
{
- if (ConditionalLockRelation(onerel, AccessExclusiveLock))
- break;
-
/*
- * Check for interrupts while trying to (re-)acquire the exclusive
- * lock.
+ * We failed to establish the lock in the specified number of
+ * retries. This means we give up truncating.
*/
- CHECK_FOR_INTERRUPTS();
-
- if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
- VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
- {
- /*
- * We failed to establish the lock in the specified number of
- * retries. This means we give up truncating.
- */
- vacrelstats->lock_waiter_detected = true;
- ereport(elevel,
- (errmsg("\"%s\": stopping truncate due to conflicting lock request",
- RelationGetRelationName(onerel))));
- return;
- }
-
- pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL);
+ vacrelstats->lock_waiter_detected = true;
+ ereport(elevel,
+ (errmsg("\"%s\": stopping truncate due to conflicting lock request",
+ RelationGetRelationName(onerel))));
+ return;
}
/*
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index cbee20e..63caf36 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -248,6 +248,62 @@ ConditionalLockRelation(Relation relation, LOCKMODE lockmode)
}
/*
+ * PatientLockRelation
+ *
+ * This is a convenience routine for acquiring a lock while allowing other
+ * locks to pass by. "Patient" might also be taken to mean "low priority".
+ */
+bool
+PatientLockRelation(Relation relation, LOCKMODE lockmode,
+ long lock_timeout, long lock_wait_interval)
+{
+ LOCKTAG tag;
+ LockAcquireResult res;
+ int lock_retry = 0;
+ int num_attempts;
+
+ if (lock_wait_interval <= 0)
+ elog(ERROR, "invalid lock wait interval");
+
+ if (lock_timeout < 0)
+ elog(ERROR, "invalid lock timeout");
+
+ num_attempts = (int) (lock_timeout / lock_wait_interval);
+ if (num_attempts < 1)
+ num_attempts = 1;
+
+ SET_LOCKTAG_RELATION(tag,
+ relation->rd_lockInfo.lockRelId.dbId,
+ relation->rd_lockInfo.lockRelId.relId);
+
+ while (lock_retry++ < num_attempts)
+ {
+ res = LockAcquire(&tag, lockmode, false, true);
+
+ if (res != LOCKACQUIRE_NOT_AVAIL)
+ {
+ /*
+ * Now that we have the lock, check for invalidation messages; see notes
+ * in LockRelationOid.
+ */
+ if (res != LOCKACQUIRE_ALREADY_HELD)
+ AcceptInvalidationMessages();
+
+ return true;
+ }
+
+ /*
+ * Check for interrupts while trying to (re-)acquire the lock.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ pg_usleep(lock_wait_interval);
+ }
+
+ return false;
+}
+
+/*
* UnlockRelation
*
* This is a convenience routine for unlocking a relation without also
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 8288e7d..4b66c10 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -44,6 +44,8 @@ extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode);
extern void LockRelation(Relation relation, LOCKMODE lockmode);
extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
+extern bool PatientLockRelation(Relation relation, LOCKMODE lockmode,
+ long lock_timeout, long lock_wait_interval);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
On Tue, Sep 6, 2016 at 6:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 1 September 2016 at 21:28, Simon Riggs <simon@2ndquadrant.com> wrote:
So the only way to handle multiple locks is to do this roughly the way
Rod suggests.

I've just been reading the VACUUM code and it turns out that we
already use Rod's mechanism internally. So on that basis it seems fine
to support this as a useful user-level feature. If there is a better
way of doing it, then that can be added later.

My proposed changes to this patch are these:

1. Rename this WAIT PATIENTLY, which is IMHO a better description of
what is being requested. Bikeshedding welcome.

2. Introduce a new API call LockTablePatiently() that returns bool. So
its usage is similar to ConditionalLockTable(); the only difference is
you supply some other wait parameters with it. This isolates the
internal mechanism from the usage, so allows us to more easily support
any fancy new way of doing this we think of later.

3. Use LockTablePatiently() within lockcmds.c where appropriate.

4. Replace the code for waiting in VACUUM with the new call to
LockTablePatiently().

So I see this as 2 patches: 1) new API and make VACUUM use new API, 2)
Rod's LOCK TABLE patch.

First patch attached; it also requires lock by Oid. If we agree, Rod,
please update your patch to match?
Aside from the fact that polling is generally inefficient and wasteful
of system resources, this allows for undetected deadlocks. Consider:
S1: LOCK TABLE A;
S2: LOCK TABLE B;
S1: LOCK TABLE B; -- blocks
S2: LOCK TABLE A PATIENTLY; -- retries forever
Livelock might be possible, too.
I think it would be better to think harder about what would be
required to implement this natively in the lock manager. Suppose we
add a flag to each PROCLOCK which, if true, indicates that the lock
request is low priority. Also, we add a counter to each LOCK
indicating the number of pending low-priority lock requests. When
LockAcquireExtended reaches this code here...
if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
status = STATUS_FOUND;
else
status = LockCheckConflicts(lockMethodTable, lockmode,
lock, proclock);
...we add an additional check to the upper branch: if the number of
low-priority waiters is not 0, then we walk the wait queue; if all
waiters that conflict with the requested lock mode are low-priority,
then we set status = STATUS_OK. So, new lock requests refuse to queue
behind low-priority lock waiters.
Is that good enough to implement the requested behavior, or do we need
to do more? If we only do what's described above, then a
normal-priority waiter which joins the queue after a low-priority
waiter is already waiting will let the low-priority waiter go first.
That's probably not desirable, but it's pretty easy to fix: the logic
that determines where a new waiter enters the wait queue is in
ProcSleep(), and it wouldn't be too hard to arrange for new
normal-priority waiters to skip over any low-priority waiters that are
at the end of the existing wait queue (but not any that are in the
middle, because if we did that we'd also be skipping over
normal-priority waiters, which we shouldn't).
What more? Well, even after doing those two things, it's still
possible for either the simple deadlock logic in ProcSleep() or the
full deadlock detector to put a low-priority waiter in front of a
normal-priority waiter. However, our typical policy is to advance
waiters in the wait queue as little as possible. In other words, if
the wait queue contains A B C and we will deadlock unless C is moved
up, we will move it ahead of B but not A if that is sufficient to
avoid the deadlock. We will only move it ahead of both B and A if
that is necessary to avoid deadlock. So, low-priority requests won't
be moved up further than needed, which is good.
Still, it is possible to construct scenarios where we won't get
perfect low-priority behavior without more invasive changes. For
example, suppose we make a low-priority request queue-jump over an
intervening waiter to avoid deadlocking against it. Next, a
normal-priority waiter enters the queue. Then, the guy we skipped
aborts. At this point, we could in theory notice that it's possible
to move the low-priority request behind the new normal-priority
waiter. However, I think we shouldn't do that. We certainly can't do
it unconditionally because it might introduce deadlocks. We could
test whether it will introduce a deadlock and do it only if not, but
that's expensive. Instead, I think we should document that a
low-priority request will ordinarily cause the request to be satisfied
only after all conflicting normal-priority lock requests, but that
this is not guaranteed in the case where the wait queue is rearranged
to avoid deadlock. I don't think that limitation ought to be a
tremendous problem for users, and the alternatives are pretty
unappealing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 16, 2016 at 3:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Closed in 2016-11 commitfest with "returned with feedback" status.
Please feel free to update the status once you submit an updated patch
that solves the problems discussed.
Regards,
Hari Babu
Fujitsu Australia
On 15 September 2016 at 18:51, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 6, 2016 at 6:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
First patch attached, requires also lock by Oid. If we agree, Rod,
please update your patch to match?
Aside from the fact that polling is generally inefficient and wasteful
of system resources, this allows for undetected deadlocks. Consider:
S1: LOCK TABLE A;
S2: LOCK TABLE B;
S1: LOCK TABLE B; -- blocks
S2: LOCK TABLE A PATIENTLY; -- retries forever
Hmmm
Livelock might be possible, too.
I think it would be better to think harder about what would be
required to implement this natively in the lock manager. Suppose we
add a flag to each PROCLOCK which, if true, indicates that the lock
request is low priority. Also, we add a counter to each LOCK
indicating the number of pending low-priority lock requests. When
LockAcquireExtended reaches this code here...
if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
status = STATUS_FOUND;
else
status = LockCheckConflicts(lockMethodTable, lockmode,
lock, proclock);
...we add an additional check to the upper branch: if the number of
low-priority waiters is not 0, then we walk the wait queue; if all
waiters that conflict with the requested lock mode are low-priority,
then we set status = STATUS_OK. So, new lock requests refuse to queue
behind low-priority lock waiters.
Well, that's pretty much the exact design I mentioned earlier.
Is that good enough to implement the requested behavior, or do we need
to do more?
The only problem is that Rod's request was to be able to lock multiple
tables in one statement, which cannot be done that way.
But there are problems with Rod's approach, so I suggest we ditch that
now and I'll implement the single table lock approach that you, me and
Andres preferred.
I'd rather have something sweet with one table than something crappy
with multiple tables.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services