CIC and deadlocks

Started by Pavan Deolaseealmost 19 years ago14 messages
#1Pavan Deolasee
pavan.deolasee@gmail.com

Isn't CREATE INDEX CONCURRENTLY prone deadlock conditions ?
I saw one with VACUUM today. But I think it can happen with other
commands like VACUUM FULL, CLUSTER, CREATE INDEX
CONCURRENTLY and so on. These commands conflict on the
ShareUpdateExclusiveLock held by CIC and hence would wait for
CIC to release the lock. At the same time, CIC would wait for these
transactions to complete.

We know that these commands are run in a separate transaction
and do not do any index fetches or inserts/updates. So in principle
CIC need not wait for these transactions to complete in any
of its waits. May be we can skip waits on the transactions that
are running one of these commands.

Is it something worth doing ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#2Simon Riggs
simon@2ndquadrant.com
In reply to: Pavan Deolasee (#1)
Re: CIC and deadlocks

On Sat, 2007-03-31 at 17:45 +0530, Pavan Deolasee wrote:

Isn't CREATE INDEX CONCURRENTLY prone deadlock conditions ?
I saw one with VACUUM today. But I think it can happen with other
commands like VACUUM FULL, CLUSTER, CREATE INDEX
CONCURRENTLY and so on. These commands conflict on the
ShareUpdateExclusiveLock held by CIC and hence would wait for
CIC to release the lock. At the same time, CIC would wait for these
transactions to complete.

We know that these commands are run in a separate transaction
and do not do any index fetches or inserts/updates. So in principle
CIC need not wait for these transactions to complete in any
of its waits. May be we can skip waits on the transactions that
are running one of these commands.

Yes, because I proposed it already. :-)

"utility transactions" in - Latest plans for Utilities with HOT

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#1)
Re: CIC and deadlocks

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

Isn't CREATE INDEX CONCURRENTLY prone deadlock conditions ?

Can you give a specific example? The deadlock code will grant locks
out-of-order in cases where the alternative is to abort somebody.
I think you may be describing a missed opportunity in that logic,
more than a reason to add still another fragile assumption for HOT.

regards, tom lane

#4Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#3)
Re: CIC and deadlocks

On 3/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

Isn't CREATE INDEX CONCURRENTLY prone deadlock conditions ?

Can you give a specific example?

txn1 - CREATE INDEX CONCURRENTLY (takes ShareUpdateExclusiveLock)
txn2 - VACUUM ANALYZE (waits on ShareUpdateExclusiveLock)
tnx1 - waits for txn2 to complete in the second phase of CIC

deadlock!

Lazy VACUUM is safe because we don't include "inVacuum" transactions
in the snapshot and hence don't wait for it in CIC. I haven't checked, but
VACUUM FULL would also deadlock.

I think you may be describing a missed opportunity in that logic,
more than a reason to add still another fragile assumption for HOT.

Not sure what you are referring to. But I shall keep this in mind.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#4)
Re: CIC and deadlocks

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

On 3/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Can you give a specific example?

txn1 - CREATE INDEX CONCURRENTLY (takes ShareUpdateExclusiveLock)
txn2 - VACUUM ANALYZE (waits on ShareUpdateExclusiveLock)
tnx1 - waits for txn2 to complete in the second phase of CIC

Oh, it's the cleanup wait you're worried about.

Lazy VACUUM is safe because we don't include "inVacuum" transactions
in the snapshot and hence don't wait for it in CIC.

Hmm ... only if it's already set inVacuum true ... there's a window
where it has not.

I wonder whether we could change CIC so that the "reference
snapshot" lists only transactions that are running and have already
determined their serializable snapshot (ie, have set proc->xmin).
Xacts that haven't yet done that can be ignored because they couldn't
possibly see the dead tuples we're worried about, no?

Then we could rearrange the order of operations in vacuum_rel so
that we lock the target rel before we acquire a snapshot. Then
a vacuum waiting for the CIC cannot cause a deadlock.

Multi-rel CLUSTER could be changed the same way. I'm not particularly
worried about single-rel CLUSTER, only stuff that would be reasonable
to launch from background maintenance tasks.

[ thinks... ] Actually, it seems risky to omit xids from the reference
snapshot; that could perhaps screw up the index insertions. But we
could look in the procArray to see if the xid still exists and has set
an xmin before we actually wait for it.

regards, tom lane

#6Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#5)
Re: CIC and deadlocks

On 3/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... only if it's already set inVacuum true ... there's a window
where it has not.

I wonder whether we could change CIC so that the "reference
snapshot" lists only transactions that are running and have already
determined their serializable snapshot (ie, have set proc->xmin).
Xacts that haven't yet done that can be ignored because they couldn't
possibly see the dead tuples we're worried about, no?

Yes, it may work. Do we need to take some extra care because
proc-xmin is set while holding SHARED lock on proc array ?

Then we could rearrange the order of operations in vacuum_rel so

that we lock the target rel before we acquire a snapshot. Then
a vacuum waiting for the CIC cannot cause a deadlock.

We may need to do the same in analyze_rel.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#6)
Re: CIC and deadlocks

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

Yes, it may work. Do we need to take some extra care because
proc-xmin is set while holding SHARED lock on proc array ?

Good point. I'm envisioning a procarray.c function along the
lines of
bool TransactionHasSnapshot(xid)
which returns true if the xid is currently listed in PGPROC
and has a nonzero xmin. CIC's cleanup wait loop would check
this and ignore the xid if it returns false. Your point means
that this function would have to take exclusive not shared lock
while scanning the procarray, which is kind of annoying, but
it seems not fatal since CIC isn't done all that frequently.

regards, tom lane

#8Pavan Deolasee
pavan.deolasee@enterprisedb.com
In reply to: Tom Lane (#7)
Re: CIC and deadlocks

Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

Good point. I'm envisioning a procarray.c function along the
lines of
bool TransactionHasSnapshot(xid)
which returns true if the xid is currently listed in PGPROC
and has a nonzero xmin. CIC's cleanup wait loop would check
this and ignore the xid if it returns false. Your point means
that this function would have to take exclusive not shared lock
while scanning the procarray, which is kind of annoying, but
it seems not fatal since CIC isn't done all that frequently.

Tom,

If you haven't finished this yet, would you like me to work
on this ? If I do it, I would mostly follow the path you
suggested above, unless I run into something else.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#8)
Re: CIC and deadlocks

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

If you haven't finished this yet, would you like me to work
on this ? If I do it, I would mostly follow the path you
suggested above, unless I run into something else.

I'm not intending to work on it.

regards, tom lane

#10Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#7)
1 attachment(s)
Re: [HACKERS] CIC and deadlocks

On 4/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Good point. I'm envisioning a procarray.c function along the
lines of
bool TransactionHasSnapshot(xid)
which returns true if the xid is currently listed in PGPROC
and has a nonzero xmin. CIC's cleanup wait loop would check
this and ignore the xid if it returns false. Your point means
that this function would have to take exclusive not shared lock
while scanning the procarray, which is kind of annoying, but
it seems not fatal since CIC isn't done all that frequently.

When I looked at the code, it occurred to me that possibly we are
OK with just taking shared lock on the procarray. That means that
some other transaction can concurrently set its serializable snapshot
while we are scanning the procarray. But that should not harm us:
if we see the snapshot set, we wait for the transaction. A transaction
which is setting its serializable snapshot NOW, can not see the
tuples that we did not index, isn't it ?

A patch based on the discussion is attached.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Attachments:

CIC_deadlock.patchapplication/octet-stream; name=CIC_deadlock.patchDownload
? GNUmakefile
? config.log
? config.status
? core.7509
? cscope.files
? cscope.out
? install
? contrib/pgbench/pgbench
? src/Makefile.global
? src/backend/postgres
? src/backend/catalog/postgres.bki
? src/backend/catalog/postgres.description
? src/backend/catalog/postgres.shdescription
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/libascii_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/libcyrillic_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/libeuc_cn_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/libeuc_jis_2004_and_shift_jis_2004.so.0.0
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/libeuc_jp_and_sjis.so.0.0
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/libeuc_kr_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/libeuc_tw_and_big5.so.0.0
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/liblatin2_and_win1250.so.0.0
? src/backend/utils/mb/conversion_procs/latin_and_mic/liblatin_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/libutf8_and_ascii.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_big5/libutf8_and_big5.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/libutf8_and_cyrillic.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/libutf8_and_euc_cn.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jis_2004/libutf8_and_euc_jis_2004.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/libutf8_and_euc_jp.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/libutf8_and_euc_kr.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/libutf8_and_euc_tw.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_gb18030/libutf8_and_gb18030.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_gbk/libutf8_and_gbk.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859/libutf8_and_iso8859.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/libutf8_and_iso8859_1.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_johab/libutf8_and_johab.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/libutf8_and_shift_jis_2004.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_sjis/libutf8_and_sjis.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_uhc/libutf8_and_uhc.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_win/libutf8_and_win.so.0.0
? src/bin/initdb/initdb
? src/bin/ipcclean/ipcclean
? src/bin/pg_config/pg_config
? src/bin/pg_controldata/pg_controldata
? src/bin/pg_ctl/pg_ctl
? src/bin/pg_dump/pg_dump
? src/bin/pg_dump/pg_dumpall
? src/bin/pg_dump/pg_restore
? src/bin/pg_resetxlog/pg_resetxlog
? src/bin/psql/psql
? src/bin/scripts/clusterdb
? src/bin/scripts/createdb
? src/bin/scripts/createlang
? src/bin/scripts/createuser
? src/bin/scripts/dropdb
? src/bin/scripts/droplang
? src/bin/scripts/dropuser
? src/bin/scripts/reindexdb
? src/bin/scripts/vacuumdb
? src/include/pg_config.h
? src/include/stamp-h
? src/interfaces/ecpg/compatlib/libecpg_compat.so.2.3
? src/interfaces/ecpg/ecpglib/libecpg.so.5.3
? src/interfaces/ecpg/include/ecpg_config.h
? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.2.3
? src/interfaces/ecpg/preproc/ecpg
? src/interfaces/libpq/exports.list
? src/interfaces/libpq/libpq.so.5.1
? src/pl/plpgsql/src/libplpgsql.so.1.0
? src/port/pg_config_paths.h
? src/test/regress/libregress.so.0.0
? src/test/regress/pg_regress
? src/test/regress/results
? src/test/regress/testtablespace
? src/test/regress/expected/constraints.out
? src/test/regress/expected/copy.out
? src/test/regress/expected/create_function_1.out
? src/test/regress/expected/create_function_2.out
? src/test/regress/expected/largeobject.out
? src/test/regress/expected/largeobject_1.out
? src/test/regress/expected/misc.out
? src/test/regress/expected/tablespace.out
? src/test/regress/sql/constraints.sql
? src/test/regress/sql/copy.sql
? src/test/regress/sql/create_function_1.sql
? src/test/regress/sql/create_function_2.sql
? src/test/regress/sql/largeobject.sql
? src/test/regress/sql/misc.sql
? src/test/regress/sql/tablespace.sql
? src/timezone/zic
Index: src/backend/commands/analyze.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.104
diff -c -r1.104 analyze.c
*** src/backend/commands/analyze.c	6 Apr 2007 04:21:42 -0000	1.104
--- src/backend/commands/analyze.c	11 Apr 2007 06:10:12 -0000
***************
*** 139,144 ****
--- 139,148 ----
  	if (!onerel)
  		return;
  
+ 	/* set the snapshot if not already done so */
+ 	if (ActiveSnapshot == NULL)
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 	
  	/*
  	 * Check permissions --- this should match vacuum's check!
  	 */
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.159
diff -c -r1.159 cluster.c
*** src/backend/commands/cluster.c	8 Apr 2007 01:26:28 -0000	1.159
--- src/backend/commands/cluster.c	11 Apr 2007 06:10:12 -0000
***************
*** 205,212 ****
  
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
! 			/* functions in indexes may want a snapshot set */
! 			ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
  		}
--- 205,215 ----
  
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
! 			/* 
! 			 * functions in indexes may want a snapshot set, but we don't
! 			 * set it until the relation is locked succesfully. This helps
! 			 * us avoid certain deadlocks with CREATE INDEX CONCURRENTLY
! 			 */
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
  		}
***************
*** 253,258 ****
--- 256,264 ----
  	if (!OldHeap)
  		return;
  
+ 	if (ActiveSnapshot == NULL)
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 
  	/*
  	 * Since we may open a new transaction for each relation, we have to check
  	 * that the relation still is what we think it is.
Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c	13 Mar 2007 00:33:39 -0000	1.157
--- src/backend/commands/indexcmds.c	11 Apr 2007 06:10:12 -0000
***************
*** 512,518 ****
  	 * need not check for that.
  	 */
  	for (ixcnt = 0; ixcnt < snapshot->xcnt; ixcnt++)
! 		XactLockTableWait(snapshot->xip[ixcnt]);
  
  	/* Index can now be marked valid -- update its pg_index entry */
  	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
--- 512,533 ----
  	 * need not check for that.
  	 */
  	for (ixcnt = 0; ixcnt < snapshot->xcnt; ixcnt++)
! 	{
! 		/*
! 		 * Another backend doing VACUUM/ANALYZE/CLUSTER may deadlock with us
! 		 * because we might wait for the other backend to finish while it waits
! 		 * for the lock on the relation. We handle this by waiting for a
! 		 * transaction only if it has already set its serializable
! 		 * snapshot. A backend doing VACUUM/ANALYZE/CLUSTER always grabs lock
! 		 * on the relation first and then set its serializable snapshot.
! 		 *
! 		 * A transaction which hasn't yet set its serializable
! 		 * snapshot can not see any of the tuples that we did not index, so
! 		 * its OK if we don't wait for it.
! 		 */
! 		if (TransactionHasSnapshot(snapshot->xip[ixcnt]))
! 			XactLockTableWait(snapshot->xip[ixcnt]);
! 	}
  
  	/* Index can now be marked valid -- update its pg_index entry */
  	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c	14 Mar 2007 18:48:55 -0000	1.349
--- src/backend/commands/vacuum.c	11 Apr 2007 06:10:12 -0000
***************
*** 338,344 ****
  	 *
  	 * For ANALYZE (no VACUUM): if inside a transaction block, we cannot
  	 * start/commit our own transactions.  Also, there's no need to do so if
! 	 * only processing one relation.  For multiple relations when not within a
  	 * transaction block, use own transactions so we can release locks sooner.
  	 */
  	if (vacstmt->vacuum)
--- 338,347 ----
  	 *
  	 * For ANALYZE (no VACUUM): if inside a transaction block, we cannot
  	 * start/commit our own transactions.  Also, there's no need to do so if
! 	 * only processing one relation.  But we still do that because that helps
! 	 * us to set serializable snapshot only after acquiring appropriate lock
! 	 * on the relation. This avoids certain deadlock conditions with CREATE
! 	 * INDEX CONCURRENTLY. For multiple relations when not within a
  	 * transaction block, use own transactions so we can release locks sooner.
  	 */
  	if (vacstmt->vacuum)
***************
*** 348,357 ****
  		Assert(vacstmt->analyze);
  		if (in_outer_xact)
  			use_own_xacts = false;
! 		else if (list_length(relations) > 1)
  			use_own_xacts = true;
- 		else
- 			use_own_xacts = false;
  	}
  
  	/*
--- 351,358 ----
  		Assert(vacstmt->analyze);
  		if (in_outer_xact)
  			use_own_xacts = false;
! 		else 
  			use_own_xacts = true;
  	}
  
  	/*
***************
*** 411,418 ****
  				if (use_own_xacts)
  				{
  					StartTransactionCommand();
! 					/* functions in indexes may want a snapshot set */
! 					ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  				}
  				else
  					old_context = MemoryContextSwitchTo(anl_context);
--- 412,422 ----
  				if (use_own_xacts)
  				{
  					StartTransactionCommand();
! 					/*
! 					 * functions in indexes may want a snapshot set, but
! 					 * we shall set the snapshot only after acquiring lock on
! 					 * the relation
! 					 */
  				}
  				else
  					old_context = MemoryContextSwitchTo(anl_context);
***************
*** 961,972 ****
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	if (vacstmt->full)
! 	{
! 		/* functions in indexes may want a snapshot set */
! 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
! 	}
! 	else
  	{
  		/*
  		 * During a lazy VACUUM we do not run any user-supplied functions, and
--- 965,971 ----
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	if (!vacstmt->full)
  	{
  		/*
  		 * During a lazy VACUUM we do not run any user-supplied functions, and
***************
*** 1017,1022 ****
--- 1016,1027 ----
  		return;
  	}
  
+ 	if (ActiveSnapshot == NULL)
+ 	{
+ 		/* functions in indexes may want a snapshot set */
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 	}
+ 
  	/*
  	 * Check permissions.
  	 *
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.24
diff -c -r1.24 procarray.c
*** src/backend/storage/ipc/procarray.c	3 Apr 2007 16:34:36 -0000	1.24
--- src/backend/storage/ipc/procarray.c	11 Apr 2007 06:10:12 -0000
***************
*** 1079,1084 ****
--- 1079,1121 ----
  	LWLockRelease(ProcArrayLock);
  }
  
+ /*
+  * Returns true if snapshot is set for the given transaction.
+  *
+  * Note: We acquire LW_SHARED on procArray below, which means that
+  * a transaction might be setting its snapshot while we are
+  * checking for it.
+  */
+ bool
+ TransactionHasSnapshot(TransactionId xid)
+ {
+ 	ProcArrayStruct *arrayP = procArray;
+ 	int			index;
+ 	bool		has_snapshot = false;
+ 	
+ 	LWLockAcquire(ProcArrayLock, LW_SHARED);
+ 
+ 	for (index = 0; index < arrayP->numProcs; index++)
+ 	{
+ 		PGPROC	   *proc = arrayP->procs[index];
+ 
+ 		if (proc->pid == 0)
+ 			continue;			/* do not count prepared xacts */
+ 		
+ 		if (proc->xid != xid)
+ 			continue;
+ 
+ 		if (TransactionIdIsValid(proc->xmin))
+ 			has_snapshot = true;
+ 
+ 		break;
+ 	}
+ 
+ 	LWLockRelease(ProcArrayLock);
+ 
+ 	return has_snapshot;
+ }
+ 
  #ifdef XIDCACHE_DEBUG
  
  /*
Index: src/include/storage/procarray.h
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/include/storage/procarray.h,v
retrieving revision 1.13
diff -c -r1.13 procarray.h
*** src/include/storage/procarray.h	3 Apr 2007 16:34:36 -0000	1.13
--- src/include/storage/procarray.h	11 Apr 2007 06:10:12 -0000
***************
*** 40,44 ****
--- 40,45 ----
  
  extern void XidCacheRemoveRunningXids(TransactionId xid,
  						  int nxids, TransactionId *xids);
+ extern bool	TransactionHasSnapshot(TransactionId xid);
  
  #endif   /* PROCARRAY_H */
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#10)
Re: [HACKERS] CIC and deadlocks

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

When I looked at the code, it occurred to me that possibly we are
OK with just taking shared lock on the procarray. That means that
some other transaction can concurrently set its serializable snapshot
while we are scanning the procarray. But that should not harm us:
if we see the snapshot set, we wait for the transaction. A transaction
which is setting its serializable snapshot NOW, can not see the
tuples that we did not index, isn't it ?

[ itch... ] The problem is with time-extended execution of
GetSnapshotData; what happens if the other guy lost the CPU for a good
long time while in the middle of GetSnapshotData? He might set his
xmin based on info you saw as long gone.

You might be correct that it's safe, but the argument would have to
hinge on the OldestXmin process being unable to commit because of
someone holding shared ProcArrayLock; a point you are definitely not
making above. (Study the comments in GetSnapshotData for awhile,
also those in xact.c's commit-related code.)

I'm about to head to bed and am certainly in no condition to carry the
proof through. Have at it ...

regards, tom lane

#12Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#11)
1 attachment(s)
Re: [HACKERS] CIC and deadlocks

On 4/11/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ itch... ] The problem is with time-extended execution of
GetSnapshotData; what happens if the other guy lost the CPU for a good
long time while in the middle of GetSnapshotData? He might set his
xmin based on info you saw as long gone.

You might be correct that it's safe, but the argument would have to
hinge on the OldestXmin process being unable to commit because of
someone holding shared ProcArrayLock; a point you are definitely not
making above. (Study the comments in GetSnapshotData for awhile,
also those in xact.c's commit-related code.)

My argument was based on what you said above, but I obviously did not
state it well :)

Anyways, I think its better to be safe and we agree that its not such a
bad thing to take exclusive lock on procarray because CIC is not something
that happens very often. Attached is a revised patch which takes exclusive
lock on the procarray, rest remaining the same.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Attachments:

CIC_deadlock_v2.patchapplication/octet-stream; name=CIC_deadlock_v2.patchDownload
? GNUmakefile
? config.log
? config.status
? core.7509
? cscope.files
? cscope.out
? install
? contrib/pgbench/pgbench
? src/Makefile.global
? src/backend/postgres
? src/backend/catalog/postgres.bki
? src/backend/catalog/postgres.description
? src/backend/catalog/postgres.shdescription
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/libascii_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/libcyrillic_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/libeuc_cn_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/libeuc_jis_2004_and_shift_jis_2004.so.0.0
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/libeuc_jp_and_sjis.so.0.0
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/libeuc_kr_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/libeuc_tw_and_big5.so.0.0
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/liblatin2_and_win1250.so.0.0
? src/backend/utils/mb/conversion_procs/latin_and_mic/liblatin_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/libutf8_and_ascii.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_big5/libutf8_and_big5.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/libutf8_and_cyrillic.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/libutf8_and_euc_cn.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jis_2004/libutf8_and_euc_jis_2004.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/libutf8_and_euc_jp.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/libutf8_and_euc_kr.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/libutf8_and_euc_tw.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_gb18030/libutf8_and_gb18030.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_gbk/libutf8_and_gbk.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859/libutf8_and_iso8859.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/libutf8_and_iso8859_1.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_johab/libutf8_and_johab.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/libutf8_and_shift_jis_2004.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_sjis/libutf8_and_sjis.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_uhc/libutf8_and_uhc.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_win/libutf8_and_win.so.0.0
? src/bin/initdb/initdb
? src/bin/ipcclean/ipcclean
? src/bin/pg_config/pg_config
? src/bin/pg_controldata/pg_controldata
? src/bin/pg_ctl/pg_ctl
? src/bin/pg_dump/pg_dump
? src/bin/pg_dump/pg_dumpall
? src/bin/pg_dump/pg_restore
? src/bin/pg_resetxlog/pg_resetxlog
? src/bin/psql/psql
? src/bin/scripts/clusterdb
? src/bin/scripts/createdb
? src/bin/scripts/createlang
? src/bin/scripts/createuser
? src/bin/scripts/dropdb
? src/bin/scripts/droplang
? src/bin/scripts/dropuser
? src/bin/scripts/reindexdb
? src/bin/scripts/vacuumdb
? src/include/pg_config.h
? src/include/stamp-h
? src/interfaces/ecpg/compatlib/libecpg_compat.so.2.3
? src/interfaces/ecpg/ecpglib/libecpg.so.5.3
? src/interfaces/ecpg/include/ecpg_config.h
? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.2.3
? src/interfaces/ecpg/preproc/ecpg
? src/interfaces/libpq/exports.list
? src/interfaces/libpq/libpq.so.5.1
? src/pl/plpgsql/src/libplpgsql.so.1.0
? src/port/pg_config_paths.h
? src/test/regress/libregress.so.0.0
? src/test/regress/pg_regress
? src/test/regress/results
? src/test/regress/testtablespace
? src/test/regress/expected/constraints.out
? src/test/regress/expected/copy.out
? src/test/regress/expected/create_function_1.out
? src/test/regress/expected/create_function_2.out
? src/test/regress/expected/largeobject.out
? src/test/regress/expected/largeobject_1.out
? src/test/regress/expected/misc.out
? src/test/regress/expected/tablespace.out
? src/test/regress/sql/constraints.sql
? src/test/regress/sql/copy.sql
? src/test/regress/sql/create_function_1.sql
? src/test/regress/sql/create_function_2.sql
? src/test/regress/sql/largeobject.sql
? src/test/regress/sql/misc.sql
? src/test/regress/sql/tablespace.sql
? src/timezone/zic
Index: src/backend/commands/analyze.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.104
diff -c -r1.104 analyze.c
*** src/backend/commands/analyze.c	6 Apr 2007 04:21:42 -0000	1.104
--- src/backend/commands/analyze.c	11 Apr 2007 07:42:38 -0000
***************
*** 139,144 ****
--- 139,148 ----
  	if (!onerel)
  		return;
  
+ 	/* set the snapshot if not already done so */
+ 	if (ActiveSnapshot == NULL)
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 	
  	/*
  	 * Check permissions --- this should match vacuum's check!
  	 */
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.159
diff -c -r1.159 cluster.c
*** src/backend/commands/cluster.c	8 Apr 2007 01:26:28 -0000	1.159
--- src/backend/commands/cluster.c	11 Apr 2007 07:42:38 -0000
***************
*** 205,212 ****
  
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
! 			/* functions in indexes may want a snapshot set */
! 			ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
  		}
--- 205,215 ----
  
  			/* Start a new transaction for each relation. */
  			StartTransactionCommand();
! 			/* 
! 			 * functions in indexes may want a snapshot set, but we don't
! 			 * set it until the relation is locked succesfully. This helps
! 			 * us avoid certain deadlocks with CREATE INDEX CONCURRENTLY
! 			 */
  			cluster_rel(rvtc, true);
  			CommitTransactionCommand();
  		}
***************
*** 253,258 ****
--- 256,264 ----
  	if (!OldHeap)
  		return;
  
+ 	if (ActiveSnapshot == NULL)
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 
  	/*
  	 * Since we may open a new transaction for each relation, we have to check
  	 * that the relation still is what we think it is.
Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c	13 Mar 2007 00:33:39 -0000	1.157
--- src/backend/commands/indexcmds.c	11 Apr 2007 07:42:38 -0000
***************
*** 512,518 ****
  	 * need not check for that.
  	 */
  	for (ixcnt = 0; ixcnt < snapshot->xcnt; ixcnt++)
! 		XactLockTableWait(snapshot->xip[ixcnt]);
  
  	/* Index can now be marked valid -- update its pg_index entry */
  	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
--- 512,533 ----
  	 * need not check for that.
  	 */
  	for (ixcnt = 0; ixcnt < snapshot->xcnt; ixcnt++)
! 	{
! 		/*
! 		 * Another backend doing VACUUM/ANALYZE/CLUSTER may deadlock with us
! 		 * because we might wait for the other backend to finish while it waits
! 		 * for the lock on the relation. We handle this by waiting for a
! 		 * transaction only if it has already set its serializable
! 		 * snapshot. A backend doing VACUUM/ANALYZE/CLUSTER always grabs lock
! 		 * on the relation first and then set its serializable snapshot.
! 		 *
! 		 * A transaction which hasn't yet set its serializable
! 		 * snapshot can not see any of the tuples that we did not index, so
! 		 * its OK if we don't wait for it.
! 		 */
! 		if (TransactionHasSnapshot(snapshot->xip[ixcnt]))
! 			XactLockTableWait(snapshot->xip[ixcnt]);
! 	}
  
  	/* Index can now be marked valid -- update its pg_index entry */
  	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c	14 Mar 2007 18:48:55 -0000	1.349
--- src/backend/commands/vacuum.c	11 Apr 2007 07:42:38 -0000
***************
*** 338,344 ****
  	 *
  	 * For ANALYZE (no VACUUM): if inside a transaction block, we cannot
  	 * start/commit our own transactions.  Also, there's no need to do so if
! 	 * only processing one relation.  For multiple relations when not within a
  	 * transaction block, use own transactions so we can release locks sooner.
  	 */
  	if (vacstmt->vacuum)
--- 338,347 ----
  	 *
  	 * For ANALYZE (no VACUUM): if inside a transaction block, we cannot
  	 * start/commit our own transactions.  Also, there's no need to do so if
! 	 * only processing one relation.  But we still do that because that helps
! 	 * us to set serializable snapshot only after acquiring appropriate lock
! 	 * on the relation. This avoids certain deadlock conditions with CREATE
! 	 * INDEX CONCURRENTLY. For multiple relations when not within a
  	 * transaction block, use own transactions so we can release locks sooner.
  	 */
  	if (vacstmt->vacuum)
***************
*** 348,357 ****
  		Assert(vacstmt->analyze);
  		if (in_outer_xact)
  			use_own_xacts = false;
! 		else if (list_length(relations) > 1)
  			use_own_xacts = true;
- 		else
- 			use_own_xacts = false;
  	}
  
  	/*
--- 351,358 ----
  		Assert(vacstmt->analyze);
  		if (in_outer_xact)
  			use_own_xacts = false;
! 		else 
  			use_own_xacts = true;
  	}
  
  	/*
***************
*** 411,418 ****
  				if (use_own_xacts)
  				{
  					StartTransactionCommand();
! 					/* functions in indexes may want a snapshot set */
! 					ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  				}
  				else
  					old_context = MemoryContextSwitchTo(anl_context);
--- 412,422 ----
  				if (use_own_xacts)
  				{
  					StartTransactionCommand();
! 					/*
! 					 * functions in indexes may want a snapshot set, but
! 					 * we shall set the snapshot only after acquiring lock on
! 					 * the relation
! 					 */
  				}
  				else
  					old_context = MemoryContextSwitchTo(anl_context);
***************
*** 961,972 ****
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	if (vacstmt->full)
! 	{
! 		/* functions in indexes may want a snapshot set */
! 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
! 	}
! 	else
  	{
  		/*
  		 * During a lazy VACUUM we do not run any user-supplied functions, and
--- 965,971 ----
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	if (!vacstmt->full)
  	{
  		/*
  		 * During a lazy VACUUM we do not run any user-supplied functions, and
***************
*** 1017,1022 ****
--- 1016,1027 ----
  		return;
  	}
  
+ 	if (ActiveSnapshot == NULL)
+ 	{
+ 		/* functions in indexes may want a snapshot set */
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 	}
+ 
  	/*
  	 * Check permissions.
  	 *
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.24
diff -c -r1.24 procarray.c
*** src/backend/storage/ipc/procarray.c	3 Apr 2007 16:34:36 -0000	1.24
--- src/backend/storage/ipc/procarray.c	11 Apr 2007 07:42:38 -0000
***************
*** 1079,1084 ****
--- 1079,1117 ----
  	LWLockRelease(ProcArrayLock);
  }
  
+ /*
+  * Returns true if snapshot is set for the given transaction.
+  */
+ bool
+ TransactionHasSnapshot(TransactionId xid)
+ {
+ 	ProcArrayStruct *arrayP = procArray;
+ 	int			index;
+ 	bool		has_snapshot = false;
+ 	
+ 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ 
+ 	for (index = 0; index < arrayP->numProcs; index++)
+ 	{
+ 		PGPROC	   *proc = arrayP->procs[index];
+ 
+ 		if (proc->pid == 0)
+ 			continue;			/* do not count prepared xacts */
+ 		
+ 		if (proc->xid != xid)
+ 			continue;
+ 
+ 		if (TransactionIdIsValid(proc->xmin))
+ 			has_snapshot = true;
+ 
+ 		break;
+ 	}
+ 
+ 	LWLockRelease(ProcArrayLock);
+ 
+ 	return has_snapshot;
+ }
+ 
  #ifdef XIDCACHE_DEBUG
  
  /*
Index: src/include/storage/procarray.h
===================================================================
RCS file: /home/cvs/postgres/cvs/pgsql/src/include/storage/procarray.h,v
retrieving revision 1.13
diff -c -r1.13 procarray.h
*** src/include/storage/procarray.h	3 Apr 2007 16:34:36 -0000	1.13
--- src/include/storage/procarray.h	11 Apr 2007 07:42:39 -0000
***************
*** 40,44 ****
--- 40,45 ----
  
  extern void XidCacheRemoveRunningXids(TransactionId xid,
  						  int nxids, TransactionId *xids);
+ extern bool	TransactionHasSnapshot(TransactionId xid);
  
  #endif   /* PROCARRAY_H */
#13Bruce Momjian
bruce@momjian.us
In reply to: Pavan Deolasee (#12)
Re: [HACKERS] CIC and deadlocks

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Pavan Deolasee wrote:

On 4/11/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ itch... ] The problem is with time-extended execution of
GetSnapshotData; what happens if the other guy lost the CPU for a good
long time while in the middle of GetSnapshotData? He might set his
xmin based on info you saw as long gone.

You might be correct that it's safe, but the argument would have to
hinge on the OldestXmin process being unable to commit because of
someone holding shared ProcArrayLock; a point you are definitely not
making above. (Study the comments in GetSnapshotData for awhile,
also those in xact.c's commit-related code.)

My argument was based on what you said above, but I obviously did not
state it well :)

Anyways, I think its better to be safe and we agree that its not such a
bad thing to take exclusive lock on procarray because CIC is not something
that happens very often. Attached is a revised patch which takes exclusive
lock on the procarray, rest remaining the same.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#12)
Re: [PATCHES] CIC and deadlocks

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

[ patch to reduce probability of deadlock of CREATE INDEX CONCURRENTLY
with other things ]

This patch no longer applies because of the VirtualXid changes.
Looking at it again, I'm fairly dissatisfied with it anyway;
I really don't like moving the GetTransactionSnapshot calls around
like that, because it opens a risk that GetTransactionSnapshot won't
get called at all.

Since the autovacuum case is already dealt with separately, I'm
thinking there is no problem here that we actually need to solve.
C.I.C. can never be guaranteed free of deadlock risk, so I don't
see a lot of value in making it free of deadlock risk against
just CLUSTER and VACUUM FULL.

regards, tom lane