Code paths where LWLock should be released on failure
Hi all,
After looking at bug #13128, I have been looking at the code around
LWLockAcquire/Release to see if there are similar issues elsewhere.
Here are my findings:
1) SimpleLruReadPage() holds a control lock at entry that is expected
to still be held at exit. However, SlruReportIOError() can raise an
error, leaving the lock held. Shouldn't we release the control lock
when a failure happens?
2) The patch attached to #13128 fixes MarkAsPreparing(), but
twophase.c also fails to release TwoPhaseStateLock on several error
paths in LockGXact().
3) PreCommit_Notify@async.c should, I think, release AsyncQueueLock
on failure, since it is called at transaction commit. At least it
looks safer this way.
4) TablespaceCreateDbspace() does not release TablespaceCreateLock,
but LWLockReleaseAll() would do it when its transaction aborts, so no
change is made there (?).
5) In ReplicationSlotCreate@slot.c, I would think that
ReplicationSlotAllocationLock should be released when all the slots
are in use. Similarly, in ReplicationSlotDropAcquired(),
ReplicationSlotAllocationLock should be released when !fail_softly.
SaveSlotToPath() could also be made safer for the case where the slot
file cannot be created.
6) In dsm.c, dsm_create() does not release
DynamicSharedMemoryControlLock when erroring out because there are
too many dynamic shared memory segments.
7) In shmem.c, ShmemInitStruct does not release ShmemIndexLock on OOM.
I guess that's fine in bootstrap mode, still...
8) In lock.c, LockRelease() does not release partitionLock when a
shared lock cannot be found. Similar thing with
FastPathGetRelationLockEntry().
9) In predicate.c, CreatePredicateLock() forgets to release
SerializablePredicateLockListLock and partitionLock in case of an OOM.
There is a similar issue with ReleaseOneSerializableXact(),
CheckForSerializableConflictOut() and
predicatelock_twophase_recover().
10) In relcache.c, RelCacheInitLock is not released in
RelationCacheInitFilePreInvalidate() after unlink() failure.
11) In AlterSystemSetConfigFile(), AutoFileLock should be released on failure.
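The fix is the same in each case; as a minimal sketch (the lock name
and error condition below are placeholders, not code from the tree):

    LWLockAcquire(SomeLock, LW_EXCLUSIVE);

    if (resource_exhausted)
    {
        /* Release explicitly instead of relying on error recovery. */
        LWLockRelease(SomeLock);
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                 errmsg("out of example slots")));
    }

    /* ... normal processing while the lock is held ... */
    LWLockRelease(SomeLock);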
All of this results in the attached patch. Comments are welcome.
Regards,
--
Michael
Attachment: 20150423_lwlocks_failure_release.patch (text/x-patch)
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index b85a666..e8d6754 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -350,6 +350,7 @@ MarkAsPreparing(TransactionId xid, const char *gid,
gxact = TwoPhaseState->prepXacts[i];
if (strcmp(gxact->gid, gid) == 0)
{
+ LWLockRelease(TwoPhaseStateLock);
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("transaction identifier \"%s\" is already in use",
@@ -359,11 +360,14 @@ MarkAsPreparing(TransactionId xid, const char *gid,
/* Get a free gxact from the freelist */
if (TwoPhaseState->freeGXacts == NULL)
+ {
+ LWLockRelease(TwoPhaseStateLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("maximum number of prepared transactions reached"),
errhint("Increase max_prepared_transactions (currently %d).",
max_prepared_xacts)));
+ }
gxact = TwoPhaseState->freeGXacts;
TwoPhaseState->freeGXacts = gxact->next;
@@ -497,16 +501,22 @@ LockGXact(const char *gid, Oid user)
/* Found it, but has someone else got it locked? */
if (gxact->locking_backend != InvalidBackendId)
+ {
+ LWLockRelease(TwoPhaseStateLock);
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("prepared transaction with identifier \"%s\" is busy",
gid)));
+ }
if (user != gxact->owner && !superuser_arg(user))
+ {
+ LWLockRelease(TwoPhaseStateLock);
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to finish prepared transaction"),
errhint("Must be superuser or the user that prepared the transaction.")));
+ }
/*
* Note: it probably would be possible to allow committing from
@@ -515,10 +525,13 @@ LockGXact(const char *gid, Oid user)
* someone gets motivated to make it work.
*/
if (MyDatabaseId != proc->databaseId)
+ {
+ LWLockRelease(TwoPhaseStateLock);
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("prepared transaction belongs to another database"),
errhint("Connect to the database where the transaction was prepared to finish it.")));
+ }
/* OK for me to lock it */
gxact->locking_backend = MyBackendId;
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 2826b7e..1224e4d 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -839,9 +839,12 @@ PreCommit_Notify(void)
LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE);
asyncQueueFillWarning();
if (asyncQueueIsFull())
+ {
+ LWLockRelease(AsyncQueueLock);
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("too many notifications in the NOTIFY queue")));
+ }
nextNotify = asyncQueueAddEntries(nextNotify);
LWLockRelease(AsyncQueueLock);
}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fa1f07b..befd537 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -250,10 +250,13 @@ ReplicationSlotCreate(const char *name, bool db_specific,
/* If all slots are in use, we're out of luck. */
if (slot == NULL)
+ {
+ LWLockRelease(ReplicationSlotAllocationLock);
ereport(ERROR,
(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
errmsg("all replication slots are in use"),
errhint("Free one or increase max_replication_slots.")));
+ }
/*
* Since this slot is not in use, nobody should be looking at any part of
@@ -465,6 +468,9 @@ ReplicationSlotDropAcquired(void)
vslot->active_pid = 0;
SpinLockRelease(&slot->mutex);
+ if (!fail_softly)
+ LWLockRelease(ReplicationSlotAllocationLock);
+
ereport(fail_softly ? WARNING : ERROR,
(errcode_for_file_access(),
errmsg("could not rename file \"%s\" to \"%s\": %m",
@@ -978,6 +984,8 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
S_IRUSR | S_IWUSR);
if (fd < 0)
{
+ if (elevel >= ERROR)
+ LWLockRelease(slot->io_in_progress_lock);
ereport(elevel,
(errcode_for_file_access(),
errmsg("could not create file \"%s\": %m",
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 29e46c2..7dec80f 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -511,6 +511,7 @@ dsm_create(Size size, int flags)
pfree(seg);
return NULL;
}
+ LWLockRelease(DynamicSharedMemoryControlLock);
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
errmsg("too many dynamic shared memory segments")));
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 250e312..794386f 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -356,11 +356,14 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
Assert(shmemseghdr->index == NULL);
structPtr = ShmemAlloc(size);
if (structPtr == NULL)
+ {
+ LWLockRelease(ShmemIndexLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("not enough shared memory for data structure"
" \"%s\" (%zu bytes requested)",
name, size)));
+ }
shmemseghdr->index = structPtr;
*foundPtr = FALSE;
}
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 1eb2d4b..1946bca 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1874,7 +1874,10 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
HASH_FIND,
NULL);
if (!lock)
+ {
+ LWLockRelease(partitionLock);
elog(ERROR, "failed to re-find shared lock object");
+ }
locallock->lock = lock;
proclocktag.myLock = lock;
@@ -1884,7 +1887,10 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
HASH_FIND,
NULL);
if (!locallock->proclock)
+ {
+ LWLockRelease(partitionLock);
elog(ERROR, "failed to re-find shared proclock object");
+ }
}
LOCK_PRINT("LockRelease: found", lock, lockmode);
proclock = locallock->proclock;
@@ -2622,7 +2628,10 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
HASH_FIND,
NULL);
if (!lock)
+ {
+ LWLockRelease(partitionLock);
elog(ERROR, "failed to re-find shared lock object");
+ }
proclocktag.myLock = lock;
proclocktag.myProc = MyProc;
@@ -2635,7 +2644,10 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
HASH_FIND,
NULL);
if (!proclock)
+ {
+ LWLockRelease(partitionLock);
elog(ERROR, "failed to re-find shared proclock object");
+ }
LWLockRelease(partitionLock);
}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index b81ebeb..6f8564c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -2315,10 +2315,14 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
targettag, targettaghash,
HASH_ENTER_NULL, &found);
if (!target)
+ {
+ LWLockRelease(partitionLock);
+ LWLockRelease(SerializablePredicateLockListLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory"),
errhint("You might need to increase max_pred_locks_per_transaction.")));
+ }
if (!found)
SHMQueueInit(&(target->predicateLocks));
@@ -2330,10 +2334,14 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
PredicateLockHashCodeFromTargetHashCode(&locktag, targettaghash),
HASH_ENTER_NULL, &found);
if (!lock)
+ {
+ LWLockRelease(partitionLock);
+ LWLockRelease(SerializablePredicateLockListLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory"),
errhint("You might need to increase max_pred_locks_per_transaction.")));
+ }
if (!found)
{
@@ -3723,10 +3731,15 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
targettaghash),
HASH_ENTER_NULL, &found);
if (!predlock)
+ {
+ LWLockRelease(partitionLock);
+ LWLockRelease(SerializablePredicateLockListLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory"),
errhint("You might need to increase max_pred_locks_per_transaction.")));
+ }
+
if (found)
{
Assert(predlock->commitSeqNo != 0);
@@ -3967,19 +3980,25 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
&& (!SxactIsReadOnly(MySerializableXact)
|| conflictCommitSeqNo
<= MySerializableXact->SeqNo.lastCommitBeforeSnapshot))
+ {
+ LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
errdetail_internal("Reason code: Canceled on conflict out to old pivot %u.", xid),
errhint("The transaction might succeed if retried.")));
+ }
if (SxactHasSummaryConflictIn(MySerializableXact)
|| !SHMQueueEmpty(&MySerializableXact->inConflicts))
+ {
+ LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
errdetail_internal("Reason code: Canceled on identification as a pivot, with conflict out to old committed transaction %u.", xid),
errhint("The transaction might succeed if retried.")));
+ }
MySerializableXact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
}
@@ -4866,9 +4885,12 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
sxact = CreatePredXact();
if (!sxact)
+ {
+ LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory")));
+ }
/* vxid for a prepared xact is InvalidBackendId/xid; no pid */
sxact->vxid.backendId = InvalidBackendId;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index e745006..36888fa 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5203,10 +5203,13 @@ RelationCacheInitFilePreInvalidate(void)
* get rid of the would-be-obsolete init file.
*/
if (errno != ENOENT)
+ {
+ LWLockRelease(RelCacheInitLock);
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not remove cache file \"%s\": %m",
initfilename)));
+ }
}
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f43aff2..b9867fd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6886,9 +6886,12 @@ AlterSystemSetConfigFile(AlterSystemStmt *altersysstmt)
infile = AllocateFile(AutoConfFileName, "r");
if (infile == NULL)
+ {
+ LWLockRelease(AutoFileLock);
ereport(ERROR,
(errmsg("could not open file \"%s\": %m",
AutoConfFileName)));
+ }
/* parse it */
ParseConfigFp(infile, AutoConfFileName, 0, LOG, &head, &tail);
@@ -6914,10 +6917,13 @@ AlterSystemSetConfigFile(AlterSystemStmt *altersysstmt)
O_CREAT | O_RDWR | O_TRUNC,
S_IRUSR | S_IWUSR);
if (Tmpfd < 0)
+ {
+ LWLockRelease(AutoFileLock);
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\": %m",
AutoConfTmpFileName)));
+ }
/*
* Use a TRY block to clean up the file if we fail. Since we need a TRY
@@ -6938,10 +6944,13 @@ AlterSystemSetConfigFile(AlterSystemStmt *altersysstmt)
* command.
*/
if (rename(AutoConfTmpFileName, AutoConfFileName) < 0)
+ {
+ LWLockRelease(AutoFileLock);
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not rename file \"%s\" to \"%s\": %m",
AutoConfTmpFileName, AutoConfFileName)));
+ }
}
PG_CATCH();
{
Michael Paquier <michael.paquier@gmail.com> writes:
> After looking at bug #13128, I have been looking at the code around
> LWLockAcquire/Release to see if there are similar issues elsewhere.
> Here are my findings:
IIRC, we automatically release all LWLocks at the start of error recovery,
so I rather doubt that any of this is necessary.
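Roughly, as a simplified sketch of that path (not the literal code):
ereport(ERROR) longjmps back to the backend's sigsetjmp block, and the
transaction abort that follows drops everything:

    /* simplified from the backend main loop in postgres.c */
    if (sigsetjmp(local_sigjmp_buf, 1) != 0)
    {
        /*
         * Error recovery: AbortCurrentTransaction() reaches
         * AbortTransaction(), which calls LWLockReleaseAll() and so
         * releases every LWLock this backend still holds.
         */
        AbortCurrentTransaction();
    }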
regards, tom lane
On Thu, Apr 23, 2015 at 2:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> After looking at bug #13128, I have been looking at the code around
>> LWLockAcquire/Release to see if there are similar issues elsewhere.
>> Here are my findings:
>
> IIRC, we automatically release all LWLocks at the start of error recovery,
> so I rather doubt that any of this is necessary.
Perhaps this is purely cosmetic for most of this code, but not all of
it: if you have a look at #13128, not releasing TwoPhaseStateLock
causes a deadlock at startup when max_prepared_transactions does not
have enough slots. I have also been surprised by the inconsistencies,
particularly in predicate.c and other places, regarding LWLock
releases: sometimes they are released on failure, sometimes not.
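As far as I understand that hang (a sketch from memory, so the exact
callback flow may be a bit off): the startup process errors out while
holding the lock, the ERROR is promoted to FATAL, and the shmem-exit
callback then blocks on the lock the process itself still holds:

    /* in the startup process, MarkAsPreparing() during recovery: */
    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
    ereport(ERROR, ...);    /* promoted to FATAL in the startup process */

    /* later, during proc_exit(), AtProcExit_Twophase() tries to clean
     * up and self-deadlocks on the lock still held above: */
    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);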
Regards,
--
Michael
Hi,
On 2015-04-23 13:51:57 +0900, Michael Paquier wrote:
> After looking at bug #13128, I have been looking at the code around
> LWLockAcquire/Release to see if there are similar issues elsewhere.
> Here are my findings:
Afaics all of these should actually be handled by the paths that release
locks upon error.
> 5) In ReplicationSlotCreate@slot.c, I would think that
> ReplicationSlotAllocationLock should be released when all the slots
> are in use. Similarly, in ReplicationSlotDropAcquired(),
> ReplicationSlotAllocationLock should be released when !fail_softly.
> SaveSlotToPath() could also be made safer for the case where the slot
> file cannot be created.
When called via SQL these are released by normal error processing; in
the walsender they're released by WalSndErrorCleanup().
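Roughly, simplified from walsender.c:

    void
    WalSndErrorCleanup(void)
    {
        /* drop any LWLocks held, then the acquired slot itself */
        LWLockReleaseAll();
        if (MyReplicationSlot != NULL)
            ReplicationSlotRelease();
        /* ... then revert the walsender to its startup state ... */
    }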
Greetings,
Andres Freund
Michael Paquier <michael.paquier@gmail.com> wrote:
> I have also been surprised by the inconsistencies, particularly in
> predicate.c and other places, regarding LWLock releases: sometimes
> they are released on failure, sometimes not.
Those are not needed for correctness; they are a micro-optimization
for SerializableXactHashLock, which is a particularly hot LWLock
when serializable transactions are used. There are a few places
where this is not done for this lock, which look like the places
where changes were made late in the development of the feature.
Most probably we went through at one point and added them before
throwing errors when that lock was held (as an optimization), but
neglected to do that with subsequent changes. Removing the ones
that are there, or adding to the places we could, would not affect
correctness; it might make a very small difference in how quickly a
transaction which is going to cancel due to a serialization failure
gets out of the way of other transactions.
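In code form, the optimization is just the early release before
throwing, as parts of predicate.c already do:

    /* Get out of the way of other serializable transactions before
     * erroring out, instead of making them wait for this backend's
     * error recovery to release the hot lock. */
    LWLockRelease(SerializableXactHashLock);
    ereport(ERROR,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
             errmsg("could not serialize access due to read/write dependencies among transactions"),
             errhint("The transaction might succeed if retried.")));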
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company