remove spurious CREATE INDEX CONCURRENTLY wait

Started by Alvaro Herreraover 5 years ago64 messages
#1Alvaro Herrera
alvherre@2ndquadrant.com
1 attachment(s)

I previously[1]/messages/by-id/20200805021109.GA9079@alvherre.pgsql posted a patch to have multiple CREATE INDEX CONCURRENTLY
not wait for the slowest of them. This is an update of that, with minor
conflicts fixed and a fresh thread.

To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots. We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes. With this, CIC on small
tables don't have to wait for CIC on large tables to complete.

[1]: /messages/by-id/20200805021109.GA9079@alvherre.pgsql

--
�lvaro Herrera http://www.linkedin.com/in/alvherre
"Escucha y olvidar�s; ve y recordar�s; haz y entender�s" (Confucio)

Attachments:

0001-Flag-CREATE-INDEX-CONCURRENTLY-to-avoid-spurious-wai.patchtext/x-diff; charset=us-asciiDownload
From 2596c3033aacceb021463f58b50e2c4eed8a5ab2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH] Flag CREATE INDEX CONCURRENTLY to avoid spurious waiting

---
 src/backend/commands/indexcmds.c | 13 +++++++++++--
 src/include/storage/proc.h       |  4 +++-
 src/include/storage/procarray.h  |  4 ++++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a3cb3cd47f..241c8a337e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -393,7 +393,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_IN_CIC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -413,7 +413,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_IN_CIC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -1420,6 +1420,9 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Set flag for other concurrent index builds to ignore us */
+	MyPgXact->vacuumFlags |= PROC_IN_CIC;
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1482,6 +1485,9 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Set flag for other concurrent index builds to ignore us */
+	MyPgXact->vacuumFlags |= PROC_IN_CIC;
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1541,6 +1547,9 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Set flag for other concurrent index builds to ignore us */
+	MyPgXact->vacuumFlags |= PROC_IN_CIC;
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyPgXact->xmin == InvalidTransactionId);
 
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 5ceb2494ba..b030dcde6c 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -52,6 +52,8 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_CIC			0x04	/* currently running CREATE INDEX
+										   CONCURRENTLY */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
@@ -59,7 +61,7 @@ struct XidCache
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_CIC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 01040d76e1..c6edeb36e0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -29,17 +29,21 @@
  */
 #define		PROCARRAY_VACUUM_FLAG			0x02	/* currently running lazy
 													 * vacuum */
+#define		PROCARRAY_CIC_FLAG				0x04	/* currently running CREATE INDEX
+													 * CONCURRENTLY */
 #define		PROCARRAY_LOGICAL_DECODING_FLAG 0x10	/* currently doing logical
 													 * decoding outside xact */
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
  * have no corresponding PROC flag equivalent.
  */
 #define		PROCARRAY_PROC_FLAGS_MASK	(PROCARRAY_VACUUM_FLAG | \
+										 PROCARRAY_CIC_FLAG | \
 										 PROCARRAY_LOGICAL_DECODING_FLAG)
 
 /* Use the following flags as an input "flags" to GetOldestXmin function */
-- 
2.20.1

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

+ James Coleman

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots. We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes. With this, CIC on small
tables don't have to wait for CIC on large tables to complete.

Hm. +1 for improving this, if we can, but ...

It seems clearly unsafe to ignore a CIC that is in active index-building;
a snapshot held for that purpose is just as real as any other. It *might*
be all right to ignore a CIC that is just waiting, but you haven't made
any argument in the patch comments as to why that's safe either.
(Moreover, at the points where we're just waiting, I don't think we have
a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
anyway.)

Actually, it doesn't look like you've touched the comments at all.
WaitForOlderSnapshots' header comment has a long explanation of why
it's safe to ignore certain processes. That certainly needs to be
updated by any patch that's going to change the rules.

BTW, what about REINDEX CONCURRENTLY?

regards, tom lane

#4James Coleman
jtc331@gmail.com
In reply to: Tom Lane (#3)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Aug 10, 2020 at 8:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots. We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes. With this, CIC on small
tables don't have to wait for CIC on large tables to complete.

Hm. +1 for improving this, if we can, but ...

It seems clearly unsafe to ignore a CIC that is in active index-building;
a snapshot held for that purpose is just as real as any other. It *might*
be all right to ignore a CIC that is just waiting, but you haven't made
any argument in the patch comments as to why that's safe either.
(Moreover, at the points where we're just waiting, I don't think we have
a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
anyway.)

Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).

Actually, it doesn't look like you've touched the comments at all.
WaitForOlderSnapshots' header comment has a long explanation of why
it's safe to ignore certain processes. That certainly needs to be
updated by any patch that's going to change the rules.

Agreed that the comment needs to be updated to discuss the
(im)possibility of arbitrary operations within a snapshot held by CIC.

James

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: James Coleman (#4)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

James Coleman <jtc331@gmail.com> writes:

Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).

Expression indexes that call user-defined functions seem like a
pretty serious risk factor for that argument. Those are exactly
the same expressions that ANALYZE will evaluate, as a result of
which we judge it unsafe to ignore. Why would CIC be different?

regards, tom lane

#6James Coleman
jtc331@gmail.com
In reply to: Tom Lane (#5)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Tue, Aug 11, 2020 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:

Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).

Expression indexes that call user-defined functions seem like a
pretty serious risk factor for that argument. Those are exactly
the same expressions that ANALYZE will evaluate, as a result of
which we judge it unsafe to ignore. Why would CIC be different?

The comments for WaitForOlderSnapshots() don't discuss that problem at
all; as far as ANALYZE goes they only say:

* Manual ANALYZEs, however, can't be excluded because they
* might be within transactions that are going to do arbitrary operations
* later.

But nonetheless over in the thread Álvaro linked to we'd discussed
these issues already. Andres in [1] and [2] believed that:

For the snapshot waits we can add a procarray flag
(alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is
doing. Which WaitForOlderSnapshots() can then use to ignore those CICs,
which is safe, because those transactions definitely don't insert into
relations targeted by CIC. The change to WaitForOlderSnapshots() would
just be to pass the new flag to GetCurrentVirtualXIDs, I think.

and

What I was thinking of was a new flag, with a distinct value from
PROC_IN_VACUUM. It'd currently just be specified in the
GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid
needing to wait for other CICs on different relations. Since CIC is not
permitted on system tables, and CIC doesn't do DML on normal tables, it
seems fairly obviously correct to exclude other CICs.

In [3] I'd brought up that a function index can do arbitrary
operations (including, terribly, a query of another table), and Andres
(in [4]) noted that:

Well, even if we consider this an actual problem, we could still use
PROC_IN_CIC for non-expression non-partial indexes (index operator
themselves better ensure this isn't a problem, or they're ridiculously
broken already - they can get called during vacuum).

But went on to describe how this is a problem with all expression
indexes (even if those expressions don't do dangerous things) because
of syscache lookups, but that ideally for expression indexes we'd have
a way to use a different (or more frequently taken) snapshot for the
purpose of computing those expressions. That's a significantly more
involved patch though.

So from what I understand, everything that I'd claimed in my previous
message still holds true for non-expression/non-partial indexes. Is
there something else I'm missing?

James

1: /messages/by-id/20200325191935.jjhdg2zy5k7ath5v@alap3.anarazel.de
2: /messages/by-id/20200325195841.gq4hpl25t6pxv3gl@alap3.anarazel.de
3: /messages/by-id/CAAaqYe_fveT_tvKonVt1imujOURUUMrydGeaBGahqLLy9D-REw@mail.gmail.com
4: /messages/by-id/20200416221207.wmnzbxvjqqpazeob@alap3.anarazel.de

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: James Coleman (#6)
1 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Aug-11, James Coleman wrote:

In [3] I'd brought up that a function index can do arbitrary
operations (including, terribly, a query of another table), and Andres
(in [4]) noted that:

Well, even if we consider this an actual problem, we could still use
PROC_IN_CIC for non-expression non-partial indexes (index operator
themselves better ensure this isn't a problem, or they're ridiculously
broken already - they can get called during vacuum).

But went on to describe how this is a problem with all expression
indexes (even if those expressions don't do dangerous things) because
of syscache lookups, but that ideally for expression indexes we'd have
a way to use a different (or more frequently taken) snapshot for the
purpose of computing those expressions. That's a significantly more
involved patch though.

So the easy first patch here is to add the flag as PROC_IN_SAFE_CIC,
which is set only for CIC when it's used for indexes that are neither
on expressions nor partial. Then ignore those in WaitForOlderSnapshots.
The attached patch does it that way. (Also updated per recent
conflicts).

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Also, per [1]/messages/by-id/20191101203310.GA12239@alvherre.pgsql, ISTM this flag could be used to tell lazy VACUUM to
ignore the Xmin of this process too, which the previous formulation
(where all CICs were so marked) could not. This patch doesn't do that
yet, but it seems the natural next step to take.

[1]: /messages/by-id/20191101203310.GA12239@alvherre.pgsql

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Avoid-spurious-CREATE-INDEX-CONCURRENTLY-waits.patchtext/x-diff; charset=us-asciiDownload
From 2a5088dfa35cbc800a87dc2154b6ebfa22837a66 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v2] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 50 ++++++++++++++++++++++++++++++--
 src/include/storage/proc.h       |  6 +++-
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 254dbcdce5..459f6fa5db 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -372,7 +372,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -393,7 +396,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+										  | PROC_IN_SAFE_CIC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -413,7 +417,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+													| PROC_IN_SAFE_CIC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -506,6 +511,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1033,6 +1039,17 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/*
+	 * When doing concurrent index builds, we can set a PGPROC flag to tell
+	 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+	 * to ignore us when waiting for concurrent snapshots.  That can only be
+	 * done for indexes that don't execute any expressions.  Determine that.
+	 * (The flag is reset automatically at transaction end, so it must be
+	 * set for each transaction.)
+	 */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1419,6 +1436,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_CIC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1478,6 +1504,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_CIC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1534,6 +1569,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_CIC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 9c9a50ae45..d91e199a60 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -53,13 +53,17 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_SAFE_CIC	0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
+										 * CONCURRENTLY on non-expressional,
+										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_SAFE_CIC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.20.1

#8Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#7)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.

Also, per [1], ISTM this flag could be used to tell lazy VACUUM to
ignore the Xmin of this process too, which the previous formulation
(where all CICs were so marked) could not. This patch doesn't do that
yet, but it seems the natural next step to take.

[1] /messages/by-id/20191101203310.GA12239@alvherre.pgsql

Could we consider renaming vacuumFlags? With more flags associated to
a PGPROC entry that are not related to vacuum, the current naming
makes things confusing. Something like statusFlags could fit better
in the picture?
--
Michael

#9Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#8)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.

Hi,

After looking through the thread and reading the patch it seems good,
and there are only few minor questions:

* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
fact it's already mentioned in the commentaries as done, which a bit
confusing.

* Naming, to be more precise what suggested Michael:

Could we consider renaming vacuumFlags? With more flags associated to
a PGPROC entry that are not related to vacuum, the current naming
makes things confusing. Something like statusFlags could fit better
in the picture?

which sounds reasonable, and similar one about flag name
PROC_IN_SAFE_CIC - if it covers both CREATE INDEX/REINDEX CONCURRENTLY
maybe just PROC_IN_SAFE_IC?

Any plans about those questions? I can imagine that are the only missing
parts.

#10Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#9)
1 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:

On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.

Hi,

After looking through the thread and reading the patch it seems good,
and there are only few minor questions:

* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
fact it's already mentioned in the commentaries as done, which a bit
confusing.

Just to give it a shot, would the attached change be enough?

Attachments:

v3-0001-Avoid-spurious-CREATE-INDEX-CONCURRENTLY-waits.patchtext/x-diff; charset=us-asciiDownload
From d30d8acf91679985970334069ab7f1f8f7fc3ec5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v3] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 122 ++++++++++++++++++++++++++++++-
 src/include/storage/proc.h       |   6 +-
 2 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75552c64ed..5019397d50 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -406,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+										  | PROC_IN_SAFE_IC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -426,7 +430,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+													| PROC_IN_SAFE_IC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -519,6 +524,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1045,6 +1051,17 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/*
+	 * When doing concurrent index builds, we can set a PGPROC flag to tell
+	 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+	 * to ignore us when waiting for concurrent snapshots.  That can only be
+	 * done for indexes that don't execute any expressions.  Determine that.
+	 * (The flag is reset automatically at transaction end, so it must be
+	 * set for each transaction.)
+	 */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1431,6 +1448,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1490,6 +1516,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1546,6 +1581,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
@@ -3021,6 +3065,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
 	};
 	int64		progress_vals[4];
+	bool		safe_index = true;
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -3324,6 +3369,23 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/*
+		 * When doing concurrent reindex, we can set a PGPROC flag to tell
+		 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX
+		 * CONCURRENTLY to ignore us when waiting for concurrent snapshots.
+		 * That can only be done for indexes that don't execute any
+		 * expressions. Determine that for all involved indexes together. (The
+		 * flag is reset automatically at transaction end, so it must be set
+		 * for each transaction.)
+		 */
+		if (safe_index)
+		{
+			IndexInfo  *newIndexInfo = BuildIndexInfo(newIndexRel);
+
+			safe_index = newIndexInfo->ii_Expressions == NIL &&
+				newIndexInfo->ii_Predicate == NIL;
+		}
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
@@ -3393,6 +3455,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3456,6 +3527,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	}
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3559,6 +3639,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	forboth(lc, indexIds, lc2, newIndexIds)
 	{
 		char	   *oldName;
@@ -3609,6 +3698,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3641,6 +3739,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3692,6 +3799,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	/* Start a new transaction to finish process properly */
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/* Log what we did */
 	if (options & REINDEXOPT_VERBOSE)
 	{
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 9c9a50ae45..50ce5c8cf2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -53,13 +53,17 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
+										 * CONCURRENTLY on non-expressional,
+										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.21.0

#11Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#10)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Nov 09, 2020 at 04:47:43PM +0100, Dmitry Dolgov wrote:

On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:

On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved. The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though. Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.

Hi,

After looking through the thread and reading the patch it seems good,
and there are only few minor questions:

* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
fact it's already mentioned in the commentaries as done, which a bit
confusing.

Just to give it a shot, would the attached change be enough?

Could it be possible to rename vacuumFlags to statusFlags first? I
did not see any objection to do this suggestion.

+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);

I can't help noticing that you are repeating the same code pattern
eight times. I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.
--
Michael

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#11)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Michael Paquier <michael@paquier.xyz> writes:

+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);

I can't help noticing that you are repeating the same code pattern
eight times. I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.

Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.

regards, tom lane

#13Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#12)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:

Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.

Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock. That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.
--
Michael

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#13)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:

Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.

Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock. That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.

Not sure I believe that it doesn't matter much in practice. If there's
a steady stream of shared ProcArrayLock acquisitions (for snapshot
acquisition) then somebody wanting exclusive lock will create a big
hiccup, whether they hold it for a short time or not.

regards, tom lane

#15Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#14)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-09, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:

Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.

Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock. That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.

Not sure I believe that it doesn't matter much in practice. If there's
a steady stream of shared ProcArrayLock acquisitions (for snapshot
acquisition) then somebody wanting exclusive lock will create a big
hiccup, whether they hold it for a short time or not.

Yeah ... it would be much better if we can make it use atomics instead.
Currently it's an uint8, and in PGPROC itself it's probably not a big
deal to enlarge that, but I fear that quadrupling the size of the
mirroring array in PROC_HDR might be bad for performance. However,
maybe if we use atomics to access it, then we don't need to mirror it
anymore? That would need some benchmarking of GetSnapshotData.

#16Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#15)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Nov 09, 2020 at 11:31:15PM -0300, Alvaro Herrera wrote:

Yeah ... it would be much better if we can make it use atomics instead.
Currently it's an uint8, and in PGPROC itself it's probably not a big
deal to enlarge that, but I fear that quadrupling the size of the
mirroring array in PROC_HDR might be bad for performance. However,
maybe if we use atomics to access it, then we don't need to mirror it
anymore? That would need some benchmarking of GetSnapshotData.

Hmm. If you worry about the performance impact here, it is possible
to do a small performance test without this patch. vacuum_rel() sets
the flag for a non-full VACUUM, so with one backend running a manual
VACUUM in loop on an empty relation could apply some pressure on any
benchmark, even a simple pgbench.
--
Michael

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#15)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Yeah ... it would be much better if we can make it use atomics instead.

I was thinking more like "do we need any locking at all".

Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about. On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes. And even if they don't see it, at worst
we lose the optimization being proposed.

There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"

regards, tom lane

#18Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Tom Lane (#17)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Nov 09, 2020 at 10:02:27PM -0500, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Yeah ... it would be much better if we can make it use atomics instead.

I was thinking more like "do we need any locking at all".

Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about. On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes. And even if they don't see it, at worst
we lose the optimization being proposed.

There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"

Sounds right, but after reading the thread about GetSnapshotData
scalability more thoroughly there seem to be an assumption that those
copies have to be updated at the same time under the same lock, and
claims that in some cases justification for correctness around not
taking ProcArrayLock is too complicated, at least for now.

Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.

#19Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#18)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:

Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.

Thanks for planning some benchmarking for this specific patch. I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.
--
Michael

#20Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#19)
2 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Fri, Nov 13, 2020 at 09:25:40AM +0900, Michael Paquier wrote:
On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:

Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.

Thanks for planning some benchmarking for this specific patch. I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.

I've tried first to test scenarios where GetSnapshotData produces
significant lock contention and "reindex concurrently" implementation
with locks interferes with it. The idea I had is to create a test
function that constantly calls GetSnapshotData (perf indeed shows
significant portion of time spent on contended lock), and clash it with
a stream of "reindex concurrently" of an empty relation (which still
reaches safe_index check). I guess it could be considered as an
artificial extreme case. Measuring GetSnapshotData (or rather the
surrounding wrapper, to distinguish calls from the test function from
everything else) latency without reindex, with reindex and locks, with
reindex without locks should produce different "modes" and comparing
them we can make some conclusions.

Latency histograms without reindex (nanoseconds):

nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 10001209 |****************************************|
2048 -> 4095 : 76936 | |
4096 -> 8191 : 1468 | |
8192 -> 16383 : 98 | |
16384 -> 32767 : 39 | |
32768 -> 65535 : 6 | |

The same with reindex without locks:

nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 111345 | |
2048 -> 4095 : 6997627 |****************************************|
4096 -> 8191 : 18575 | |
8192 -> 16383 : 586 | |
16384 -> 32767 : 312 | |
32768 -> 65535 : 18 | |

The same with reindex with locks:

nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 59438 | |
2048 -> 4095 : 6901187 |****************************************|
4096 -> 8191 : 18584 | |
8192 -> 16383 : 581 | |
16384 -> 32767 : 280 | |
32768 -> 65535 : 84 | |

Looks like with reindex without locks is indeed faster (there are mode
samples in lower time section), but not particularly significant to the
whole distribution, especially taking into account extremity of the
test.

I'll take a look at benchmarking of switching vacuumFlags to use
atomics, but as it's probably a bit off topic I'm going to attach
another version of the patch with locks and suggested changes. To which
I have one question:

Michael Paquier <michael@paquier.xyz> writes:

I think that this should be in its own routine, and that we had better
document that this should be called just after starting a transaction,
with an assertion enforcing that.

I'm not sure which exactly assertion condition do you mean?

Attachments:

v4-0001-Rename-vaccumFlags-to-statusFlags.patchtext/x-diff; charset=us-asciiDownload
From 07c4705a22e6cdd5717df46a974ce00a69fc901f Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Wed, 11 Nov 2020 15:19:48 +0100
Subject: [PATCH v4 1/2] Rename vaccumFlags to statusFlags

With more flags associated to a PGPROC entry that are not related to
vacuum (currently existing or planned), the name "statusFlags" describes
its purpose better.
---
 src/backend/access/transam/twophase.c     |  2 +-
 src/backend/commands/vacuum.c             |  6 +--
 src/backend/postmaster/autovacuum.c       |  6 +--
 src/backend/replication/logical/logical.c |  4 +-
 src/backend/replication/slot.c            |  4 +-
 src/backend/storage/ipc/procarray.c       | 54 +++++++++++------------
 src/backend/storage/lmgr/deadlock.c       |  4 +-
 src/backend/storage/lmgr/proc.c           | 20 ++++-----
 src/include/storage/proc.h                | 12 ++---
 9 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060443..873bf9bad9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -464,7 +464,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	proc->xid = xid;
 	Assert(proc->xmin == InvalidTransactionId);
 	proc->delayChkpt = false;
-	proc->vacuumFlags = 0;
+	proc->statusFlags = 0;
 	proc->pid = 0;
 	proc->backendId = InvalidBackendId;
 	proc->databaseId = databaseid;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 1b6717f727..d89ba3031f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1742,10 +1742,10 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		 * might appear to go backwards, which is probably Not Good.
 		 */
 		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-		MyProc->vacuumFlags |= PROC_IN_VACUUM;
+		MyProc->statusFlags |= PROC_IN_VACUUM;
 		if (params->is_wraparound)
-			MyProc->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
-		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+			MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
+		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
 	}
 
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2cef56f115..aa5b97fbac 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2509,7 +2509,7 @@ do_autovacuum(void)
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			EmitErrorReport();
 
-			/* this resets ProcGlobal->vacuumFlags[i] too */
+			/* this resets ProcGlobal->statusFlags[i] too */
 			AbortOutOfAnyTransaction();
 			FlushErrorState();
 			MemoryContextResetAndDeleteChildren(PortalContext);
@@ -2525,7 +2525,7 @@ do_autovacuum(void)
 
 		did_vacuum = true;
 
-		/* ProcGlobal->vacuumFlags[i] are reset at the next end of xact */
+		/* ProcGlobal->statusFlags[i] are reset at the next end of xact */
 
 		/* be tidy */
 deleted:
@@ -2702,7 +2702,7 @@ perform_work_item(AutoVacuumWorkItem *workitem)
 				   cur_datname, cur_nspname, cur_relname);
 		EmitErrorReport();
 
-		/* this resets ProcGlobal->vacuumFlags[i] too */
+		/* this resets ProcGlobal->statusFlags[i] too */
 		AbortOutOfAnyTransaction();
 		FlushErrorState();
 		MemoryContextResetAndDeleteChildren(PortalContext);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d5cfbeaa4a..f1f4df7d70 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -182,8 +182,8 @@ StartupDecodingContext(List *output_plugin_options,
 	if (!IsTransactionOrTransactionBlock())
 	{
 		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-		MyProc->vacuumFlags |= PROC_IN_LOGICAL_DECODING;
-		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
+		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
 	}
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 09be1d8c48..5ed955ba0b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -528,8 +528,8 @@ ReplicationSlotRelease(void)
 
 	/* might not have been set when we've been a plain slot */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-	MyProc->vacuumFlags &= ~PROC_IN_LOGICAL_DECODING;
-	ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
+	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
 }
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 05661e379e..3a28da0e80 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -488,13 +488,13 @@ ProcArrayAdd(PGPROC *proc)
 			(arrayP->numProcs - index) * sizeof(*ProcGlobal->xids));
 	memmove(&ProcGlobal->subxidStates[index + 1], &ProcGlobal->subxidStates[index],
 			(arrayP->numProcs - index) * sizeof(*ProcGlobal->subxidStates));
-	memmove(&ProcGlobal->vacuumFlags[index + 1], &ProcGlobal->vacuumFlags[index],
-			(arrayP->numProcs - index) * sizeof(*ProcGlobal->vacuumFlags));
+	memmove(&ProcGlobal->statusFlags[index + 1], &ProcGlobal->statusFlags[index],
+			(arrayP->numProcs - index) * sizeof(*ProcGlobal->statusFlags));
 
 	arrayP->pgprocnos[index] = proc->pgprocno;
 	ProcGlobal->xids[index] = proc->xid;
 	ProcGlobal->subxidStates[index] = proc->subxidStatus;
-	ProcGlobal->vacuumFlags[index] = proc->vacuumFlags;
+	ProcGlobal->statusFlags[index] = proc->statusFlags;
 
 	arrayP->numProcs++;
 
@@ -562,7 +562,7 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 	Assert(TransactionIdIsValid(ProcGlobal->xids[proc->pgxactoff] == 0));
 	Assert(TransactionIdIsValid(ProcGlobal->subxidStates[proc->pgxactoff].count == 0));
 	Assert(TransactionIdIsValid(ProcGlobal->subxidStates[proc->pgxactoff].overflowed == false));
-	ProcGlobal->vacuumFlags[proc->pgxactoff] = 0;
+	ProcGlobal->statusFlags[proc->pgxactoff] = 0;
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
@@ -575,8 +575,8 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 					(arrayP->numProcs - index - 1) * sizeof(*ProcGlobal->xids));
 			memmove(&ProcGlobal->subxidStates[index], &ProcGlobal->subxidStates[index + 1],
 					(arrayP->numProcs - index - 1) * sizeof(*ProcGlobal->subxidStates));
-			memmove(&ProcGlobal->vacuumFlags[index], &ProcGlobal->vacuumFlags[index + 1],
-					(arrayP->numProcs - index - 1) * sizeof(*ProcGlobal->vacuumFlags));
+			memmove(&ProcGlobal->statusFlags[index], &ProcGlobal->statusFlags[index + 1],
+					(arrayP->numProcs - index - 1) * sizeof(*ProcGlobal->statusFlags));
 
 			arrayP->pgprocnos[arrayP->numProcs - 1] = -1;	/* for debugging */
 			arrayP->numProcs--;
@@ -660,13 +660,13 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 
 		/* must be cleared with xid/xmin: */
 		/* avoid unnecessarily dirtying shared cachelines */
-		if (proc->vacuumFlags & PROC_VACUUM_STATE_MASK)
+		if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
 		{
 			Assert(!LWLockHeldByMe(ProcArrayLock));
 			LWLockAcquire(ProcArrayLock, LW_SHARED);
-			Assert(proc->vacuumFlags == ProcGlobal->vacuumFlags[proc->pgxactoff]);
-			proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-			ProcGlobal->vacuumFlags[proc->pgxactoff] = proc->vacuumFlags;
+			Assert(proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]);
+			proc->statusFlags &= ~PROC_VACUUM_STATE_MASK;
+			ProcGlobal->statusFlags[proc->pgxactoff] = proc->statusFlags;
 			LWLockRelease(ProcArrayLock);
 		}
 	}
@@ -695,10 +695,10 @@ ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
 
 	/* must be cleared with xid/xmin: */
 	/* avoid unnecessarily dirtying shared cachelines */
-	if (proc->vacuumFlags & PROC_VACUUM_STATE_MASK)
+	if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
 	{
-		proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
-		ProcGlobal->vacuumFlags[proc->pgxactoff] = proc->vacuumFlags;
+		proc->statusFlags &= ~PROC_VACUUM_STATE_MASK;
+		ProcGlobal->statusFlags[proc->pgxactoff] = proc->statusFlags;
 	}
 
 	/* Clear the subtransaction-XID cache too while holding the lock */
@@ -875,7 +875,7 @@ ProcArrayClearTransaction(PGPROC *proc)
 	proc->xmin = InvalidTransactionId;
 	proc->recoveryConflictPending = false;
 
-	Assert(!(proc->vacuumFlags & PROC_VACUUM_STATE_MASK));
+	Assert(!(proc->statusFlags & PROC_VACUUM_STATE_MASK));
 	Assert(!proc->delayChkpt);
 
 	/*
@@ -1710,7 +1710,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		PGPROC	   *proc = &allProcs[pgprocno];
-		int8		vacuumFlags = ProcGlobal->vacuumFlags[index];
+		int8		statusFlags = ProcGlobal->statusFlags[index];
 		TransactionId xid;
 		TransactionId xmin;
 
@@ -1745,7 +1745,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * removed, as long as pg_subtrans is not truncated) or doing logical
 		 * decoding (which manages xmin separately, check below).
 		 */
-		if (vacuumFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING))
+		if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING))
 			continue;
 
 		/* shared tables need to take backends in all database into account */
@@ -2194,7 +2194,7 @@ GetSnapshotData(Snapshot snapshot)
 		TransactionId *xip = snapshot->xip;
 		int		   *pgprocnos = arrayP->pgprocnos;
 		XidCacheStatus *subxidStates = ProcGlobal->subxidStates;
-		uint8	   *allVacuumFlags = ProcGlobal->vacuumFlags;
+		uint8	   *allStatusFlags = ProcGlobal->statusFlags;
 
 		/*
 		 * First collect set of pgxactoff/xids that need to be included in the
@@ -2204,7 +2204,7 @@ GetSnapshotData(Snapshot snapshot)
 		{
 			/* Fetch xid just once - see GetNewTransactionId */
 			TransactionId xid = UINT32_ACCESS_ONCE(other_xids[pgxactoff]);
-			uint8		vacuumFlags;
+			uint8		statusFlags;
 
 			Assert(allProcs[arrayP->pgprocnos[pgxactoff]].pgxactoff == pgxactoff);
 
@@ -2243,8 +2243,8 @@ GetSnapshotData(Snapshot snapshot)
 			 * Skip over backends doing logical decoding which manages xmin
 			 * separately (check below) and ones running LAZY VACUUM.
 			 */
-			vacuumFlags = allVacuumFlags[pgxactoff];
-			if (vacuumFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
+			statusFlags = allStatusFlags[pgxactoff];
+			if (statusFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
 				continue;
 
 			if (NormalTransactionIdPrecedes(xid, xmin))
@@ -2483,11 +2483,11 @@ ProcArrayInstallImportedXmin(TransactionId xmin,
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		PGPROC	   *proc = &allProcs[pgprocno];
-		int			vacuumFlags = ProcGlobal->vacuumFlags[index];
+		int			statusFlags = ProcGlobal->statusFlags[index];
 		TransactionId xid;
 
 		/* Ignore procs running LAZY VACUUM */
-		if (vacuumFlags & PROC_IN_VACUUM)
+		if (statusFlags & PROC_IN_VACUUM)
 			continue;
 
 		/* We are only interested in the specific virtual transaction. */
@@ -3142,7 +3142,7 @@ IsBackendPid(int pid)
  *	If excludeXmin0 is true, skip processes with xmin = 0.
  *	If allDbs is false, skip processes attached to other databases.
  *	If excludeVacuum isn't zero, skip processes for which
- *		(vacuumFlags & excludeVacuum) is not zero.
+ *		(statusFlags & excludeVacuum) is not zero.
  *
  * Note: the purpose of the limitXmin and excludeXmin0 parameters is to
  * allow skipping backends whose oldest live snapshot is no older than
@@ -3176,12 +3176,12 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		PGPROC	   *proc = &allProcs[pgprocno];
-		uint8		vacuumFlags = ProcGlobal->vacuumFlags[index];
+		uint8		statusFlags = ProcGlobal->statusFlags[index];
 
 		if (proc == MyProc)
 			continue;
 
-		if (excludeVacuum & vacuumFlags)
+		if (excludeVacuum & statusFlags)
 			continue;
 
 		if (allDbs || proc->databaseId == MyDatabaseId)
@@ -3596,7 +3596,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 		{
 			int			pgprocno = arrayP->pgprocnos[index];
 			PGPROC	   *proc = &allProcs[pgprocno];
-			uint8		vacuumFlags = ProcGlobal->vacuumFlags[index];
+			uint8		statusFlags = ProcGlobal->statusFlags[index];
 
 			if (proc->databaseId != databaseId)
 				continue;
@@ -3610,7 +3610,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 			else
 			{
 				(*nbackends)++;
-				if ((vacuumFlags & PROC_IS_AUTOVACUUM) &&
+				if ((statusFlags & PROC_IS_AUTOVACUUM) &&
 					nautovacs < MAXAUTOVACPIDS)
 					autovac_pids[nautovacs++] = proc->pid;
 			}
diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index e1246b8a4d..cb7a8f0fd2 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -618,7 +618,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 					 * that an autovacuum won't be canceled with less than
 					 * deadlock_timeout grace period.
 					 *
-					 * Note we read vacuumFlags without any locking.  This is
+					 * Note we read statusFlags without any locking.  This is
 					 * OK only for checking the PROC_IS_AUTOVACUUM flag,
 					 * because that flag is set at process start and never
 					 * reset.  There is logic elsewhere to avoid canceling an
@@ -628,7 +628,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 					 * ProcArrayLock.
 					 */
 					if (checkProc == MyProc &&
-						proc->vacuumFlags & PROC_IS_AUTOVACUUM)
+						proc->statusFlags & PROC_IS_AUTOVACUUM)
 						blocking_autovacuum_proc = proc;
 
 					/* We're done looking at this proclock */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 88566bd9fa..d1738c65f5 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -111,7 +111,7 @@ ProcGlobalShmemSize(void)
 
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->xids)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
-	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->vacuumFlags)));
+	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
 	return size;
 }
@@ -210,8 +210,8 @@ InitProcGlobal(void)
 	MemSet(ProcGlobal->xids, 0, TotalProcs * sizeof(*ProcGlobal->xids));
 	ProcGlobal->subxidStates = (XidCacheStatus *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->subxidStates));
 	MemSet(ProcGlobal->subxidStates, 0, TotalProcs * sizeof(*ProcGlobal->subxidStates));
-	ProcGlobal->vacuumFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->vacuumFlags));
-	MemSet(ProcGlobal->vacuumFlags, 0, TotalProcs * sizeof(*ProcGlobal->vacuumFlags));
+	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
+	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
 	for (i = 0; i < TotalProcs; i++)
 	{
@@ -393,10 +393,10 @@ InitProcess(void)
 	MyProc->tempNamespaceId = InvalidOid;
 	MyProc->isBackgroundWorker = IsBackgroundWorker;
 	MyProc->delayChkpt = false;
-	MyProc->vacuumFlags = 0;
+	MyProc->statusFlags = 0;
 	/* NB -- autovac launcher intentionally does not set IS_AUTOVACUUM */
 	if (IsAutoVacuumWorkerProcess())
-		MyProc->vacuumFlags |= PROC_IS_AUTOVACUUM;
+		MyProc->statusFlags |= PROC_IS_AUTOVACUUM;
 	MyProc->lwWaiting = false;
 	MyProc->lwWaitMode = 0;
 	MyProc->waitLock = NULL;
@@ -574,7 +574,7 @@ InitAuxiliaryProcess(void)
 	MyProc->tempNamespaceId = InvalidOid;
 	MyProc->isBackgroundWorker = IsBackgroundWorker;
 	MyProc->delayChkpt = false;
-	MyProc->vacuumFlags = 0;
+	MyProc->statusFlags = 0;
 	MyProc->lwWaiting = false;
 	MyProc->lwWaitMode = 0;
 	MyProc->waitLock = NULL;
@@ -1310,7 +1310,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM && allow_autovacuum_cancel)
 		{
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
-			uint8		vacuumFlags;
+			uint8		statusFlags;
 
 			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 
@@ -1318,9 +1318,9 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			 * Only do it if the worker is not working to protect against Xid
 			 * wraparound.
 			 */
-			vacuumFlags = ProcGlobal->vacuumFlags[autovac->pgxactoff];
-			if ((vacuumFlags & PROC_IS_AUTOVACUUM) &&
-				!(vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
+			statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
+			if ((statusFlags & PROC_IS_AUTOVACUUM) &&
+				!(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
 			{
 				int			pid = autovac->pid;
 				StringInfoData locktagbuf;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 9c9a50ae45..63aea0e253 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -49,7 +49,7 @@ struct XidCache
 };
 
 /*
- * Flags for ProcGlobal->vacuumFlags[]
+ * Flags for ProcGlobal->statusFlags[]
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
@@ -175,9 +175,9 @@ struct PGPROC
 
 	bool		delayChkpt;		/* true if this proc delays checkpoint start */
 
-	uint8		vacuumFlags;    /* this backend's vacuum flags, see PROC_*
+	uint8		statusFlags;    /* this backend's status flags, see PROC_*
 								 * above. mirrored in
-								 * ProcGlobal->vacuumFlags[pgxactoff] */
+								 * ProcGlobal->statusFlags[pgxactoff] */
 	/*
 	 * Info to allow us to wait for synchronous replication, if needed.
 	 * waitLSN is InvalidXLogRecPtr if not waiting; set only by user backend.
@@ -273,7 +273,7 @@ extern PGDLLIMPORT PGPROC *MyProc;
  * allow for as tight loops accessing the data as possible. Second, to prevent
  * updates of frequently changing data (e.g. xmin) from invalidating
  * cachelines also containing less frequently changing data (e.g. xid,
- * vacuumFlags). Third to condense frequently accessed data into as few
+ * statusFlags). Third to condense frequently accessed data into as few
  * cachelines as possible.
  *
  * There are two main reasons to have the data mirrored between these dense
@@ -315,10 +315,10 @@ typedef struct PROC_HDR
 	XidCacheStatus *subxidStates;
 
 	/*
-	 * Array mirroring PGPROC.vacuumFlags for each PGPROC currently in the
+	 * Array mirroring PGPROC.statusFlags for each PGPROC currently in the
 	 * procarray.
 	 */
-	uint8	   *vacuumFlags;
+	uint8	   *statusFlags;
 
 	/* Length of allProcs array */
 	uint32		allProcCount;
-- 
2.21.0

v4-0002-Avoid-spurious-CREATE-INDEX-CONCURRENTLY-waits.patchtext/x-diff; charset=us-asciiDownload
From 0474461b71420d7331d0d59b84ae497ffb20f0d1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v4 2/2] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 92 ++++++++++++++++++++++++++++++--
 src/include/storage/proc.h       |  6 ++-
 2 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75552c64ed..4abb60ea44 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -94,6 +94,7 @@ static void ReindexMultipleInternal(List *relids, int options);
 static void reindex_error_callback(void *args);
 static void update_relispartition(Oid relationId, bool newval);
 static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
+static void set_safe_index_flag(void);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
@@ -385,7 +386,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -406,7 +410,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+										  | PROC_IN_SAFE_IC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -426,7 +431,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+													| PROC_IN_SAFE_IC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -519,6 +525,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1045,6 +1052,17 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/*
+	 * When doing concurrent index builds, we can set a PGPROC flag to tell
+	 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+	 * to ignore us when waiting for concurrent snapshots.  That can only be
+	 * done for indexes that don't execute any expressions.  Determine that.
+	 * (The flag is reset automatically at transaction end, so it must be
+	 * set for each transaction.)
+	 */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1431,6 +1449,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1490,6 +1512,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1546,6 +1572,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
@@ -3021,6 +3051,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
 	};
 	int64		progress_vals[4];
+	bool		safe_index = true;
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -3324,6 +3355,23 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/*
+		 * When doing concurrent reindex, we can set a PGPROC flag to tell
+		 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX
+		 * CONCURRENTLY to ignore us when waiting for concurrent snapshots.
+		 * That can only be done for indexes that don't execute any
+		 * expressions. Determine that for all involved indexes together. (The
+		 * flag is reset automatically at transaction end, so it must be set
+		 * for each transaction.)
+		 */
+		if (safe_index)
+		{
+			IndexInfo  *newIndexInfo = BuildIndexInfo(newIndexRel);
+
+			safe_index = newIndexInfo->ii_Expressions == NIL &&
+				newIndexInfo->ii_Predicate == NIL;
+		}
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
@@ -3393,6 +3441,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3456,6 +3508,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	}
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3559,6 +3615,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	forboth(lc, indexIds, lc2, newIndexIds)
 	{
 		char	   *oldName;
@@ -3609,6 +3669,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3641,6 +3705,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3692,6 +3760,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	/* Start a new transaction to finish process properly */
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/* Log what we did */
 	if (options & REINDEXOPT_VERBOSE)
 	{
@@ -3896,3 +3968,17 @@ update_relispartition(Oid relationId, bool newval)
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
+
+/*
+ * Set a PGPROC flag to tell concurrent VACUUM, CREATE INDEX CONCURRENTLY and
+ * REINDEX CONCURRENTLY to ignore us when waiting for concurrent snapshots.
+ * Should be called just after starting a transaction.
+ */
+static void
+set_safe_index_flag()
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	MyProc->statusFlags |= PROC_IN_SAFE_IC;
+	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+	LWLockRelease(ProcArrayLock);
+}
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 63aea0e253..6664803e2a 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -53,13 +53,17 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
+										 * CONCURRENTLY on non-expressional,
+										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.21.0

#21Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#17)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Tue, 10 Nov 2020 at 03:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Yeah ... it would be much better if we can make it use atomics instead.

I was thinking more like "do we need any locking at all".

Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about. On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes. And even if they don't see it, at worst
we lose the optimization being proposed.

There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"

Agreed to all of the above, but I think the issue is miniscule because
ProcArrayLock is acquired in LW_EXCLUSIVE mode at transaction commit.
So it doesn't seem like there is much need to optimize this particular
aspect of the patch.

--
Simon Riggs http://www.EnterpriseDB.com/

#22Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dmitry Dolgov (#20)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-16, Dmitry Dolgov wrote:

The same with reindex without locks:

nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 111345 | |
2048 -> 4095 : 6997627 |****************************************|
4096 -> 8191 : 18575 | |
8192 -> 16383 : 586 | |
16384 -> 32767 : 312 | |
32768 -> 65535 : 18 | |

The same with reindex with locks:

nsecs : count distribution
512 -> 1023 : 0 | |
1024 -> 2047 : 59438 | |
2048 -> 4095 : 6901187 |****************************************|
4096 -> 8191 : 18584 | |
8192 -> 16383 : 581 | |
16384 -> 32767 : 280 | |
32768 -> 65535 : 84 | |

Looks like with reindex without locks is indeed faster (there are mode
samples in lower time section), but not particularly significant to the
whole distribution, especially taking into account extremity of the
test.

I didn't analyze these numbers super carefully, but yeah it doesn't look
significant.

I'm looking at these patches now, with intention to push.

#23Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#12)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-09, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);

I can't help noticing that you are repeating the same code pattern
eight times. I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.

Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.

BTW I now know that the reason for taking ProcArrayLock is not the
vacuumFlags itself, but rather MyProc->pgxactoff, which can move.

On the other hand, if we stopped mirroring the flags in ProcGlobal, it
would mean we would have to access all procs' PGPROC entries in
GetSnapshotData, which is undesirable for performance reason (per commit
5788e258bb26).

#24Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dmitry Dolgov (#20)
1 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

I am really unsure about the REINDEX CONCURRENTLY part of this, for two
reasons:

1. It is not as good when reindexing multiple indexes, because we can
only apply the flag if *all* indexes are "safe". Any unsafe index means
we step down from it for the whole thing. This is probably not worth
worrying much about, but still.

2. In some of the waiting transactions, we actually do more things than
what we do in CREATE INDEX CONCURRENTLY transactions --- some catalog
updates, but we also do the whole index validation phase. Is that OK?
It's not as clear to me that it is safe to set the flag in all those
places.

I moved the comments to the new function and made it inline. I also
changed the way we determine how the function is safe; there's no reason
to build an IndexInfo if we can simply look at rel->rd_indexprs and
rel->indpred.

I've been wondering if it would be sane/safe to do the WaitForFoo stuff
outside of any transaction.

I'll have a look again tomorrow.

Attachments:

v5-0001-Avoid-spurious-CREATE-INDEX-CONCURRENTLY-waits.patchtext/x-diff; charset=us-asciiDownload
From 1e5560d4a3f3e3b4cb83803f90985701fb4dee8c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v5] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 85 +++++++++++++++++++++++++++++---
 src/include/storage/proc.h       |  6 ++-
 2 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 35696f9f75..44ea84c54d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 
 /* non-export function prototypes */
+static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
 							  Oid *typeOidP,
@@ -87,13 +88,12 @@ static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 											Oid relId, Oid oldRelId, void *arg);
-static bool ReindexRelationConcurrently(Oid relationOid, int options);
-
+static void reindex_error_callback(void *args);
 static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
-static void reindex_error_callback(void *args);
+static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
-static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
+static inline void set_safe_index_flag(void);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
@@ -385,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -406,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+										  | PROC_IN_SAFE_IC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -426,7 +430,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+													| PROC_IN_SAFE_IC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -519,6 +524,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1045,6 +1051,10 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/* Determine whether we can call set_safe_index_flag */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1431,6 +1441,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1490,6 +1504,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1546,6 +1564,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
@@ -3021,6 +3043,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
 	};
 	int64		progress_vals[4];
+	bool		all_indexes_safe = true;
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -3325,6 +3348,12 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/* consider safety of this index for set_safe_index_flag */
+		if (all_indexes_safe &&
+			(newIndexRel->rd_indexprs != NIL ||
+			 newIndexRel->rd_indpred != NIL))
+			all_indexes_safe = false;
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
@@ -3394,6 +3423,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3457,6 +3490,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	}
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3610,6 +3647,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3642,6 +3683,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3897,3 +3942,29 @@ update_relispartition(Oid relationId, bool newval)
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
+
+/*
+ * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.
+ *
+ * When doing concurrent index builds, we can set this flag
+ * to tell other processes concurrently running VACUUM, CREATE
+ * INDEX CONCURRENTLY and REINDEX CONCURRENTLY to ignore us when
+ * doing their waits for concurrent snapshots.  On one hand it
+ * avoids pointlessly waiting for a process that's not interesting
+ * anyway, but more importantly it avoids deadlocks in some cases.
+ *
+ * This can only be done for indexes that don't execute any expressions.
+ * Caller is responsible for only calling this routine when that
+ * assumption holds true.
+ *
+ * (The flag is reset automatically at transaction end, so it must be
+ * set for each transaction.)
+ */
+static inline void
+set_safe_index_flag(void)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	MyProc->statusFlags |= PROC_IN_SAFE_IC;
+	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+	LWLockRelease(ProcArrayLock);
+}
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1067f58f51..b2347ffd79 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -53,13 +53,17 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
+										 * CONCURRENTLY on non-expressional,
+										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.20.1

#25Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#24)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Nov 16, 2020 at 09:23:41PM -0300, Alvaro Herrera wrote:

I've been wondering if it would be sane/safe to do the WaitForFoo stuff
outside of any transaction.

This needs to remain within a transaction as CIC holds a session lock
on the parent table, which would not be cleaned up without a
transaction context.
--
Michael

#26Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#23)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-16, Alvaro Herrera wrote:

On 2020-Nov-09, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);

I can't help noticing that you are repeating the same code pattern
eight times. I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.

Do we really need exclusive lock on the ProcArray to make this flag
change? That seems pretty bad from a concurrency standpoint.

BTW I now know that the reason for taking ProcArrayLock is not the
vacuumFlags itself, but rather MyProc->pgxactoff, which can move.

... ah, but I realize now that this means that we can use shared lock
here, not exclusive, which is already an enormous improvement. That's
because ->pgxactoff can only be changed with exclusive lock held; so as
long as we hold shared, the array item cannot move.

#27Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#24)
4 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

So I made the change to set the statusFlags with only LW_SHARED -- both
in existing uses (see 0002) and in the new CIC code (0003). I don't
think 0002 is going to have a tremendous performance impact, because it
changes operations that are very infrequent. But even so, it would be
weird to leave code around that uses exclusive lock when we're going to
add new code that uses shared lock for the same thing.

I still left the REINDEX CONCURRENTLY support in split out in 0004; I
intend to get the first three patches pushed now, and look into 0004
again later.

Attachments:

v6-0001-indexcmds.c-reorder-function-prototypes.patchtext/x-diff; charset=us-asciiDownload
From 5e2af4e082df8f839bf257d933d70bb9645f6cfe Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 11:37:36 -0300
Subject: [PATCH v6 1/4] indexcmds.c: reorder function prototypes

... out of an overabundance of neatnikism, perhaps.
---
 src/backend/commands/indexcmds.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 35696f9f75..02c7a0c7e1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 
 /* non-export function prototypes */
+static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
 							  Oid *typeOidP,
@@ -87,13 +88,11 @@ static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 											Oid relId, Oid oldRelId, void *arg);
-static bool ReindexRelationConcurrently(Oid relationOid, int options);
-
+static void reindex_error_callback(void *args);
 static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
-static void reindex_error_callback(void *args);
+static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
-static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
-- 
2.20.1

v6-0002-relax-lock-level-for-setting-statusFlags.patchtext/x-diff; charset=us-asciiDownload
From 3e2778d553448c67e956bd385c4a94039933d3c6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 13:48:16 -0300
Subject: [PATCH v6 2/4] relax lock level for setting statusFlags

---
 src/backend/commands/vacuum.c             | 2 +-
 src/backend/replication/logical/logical.c | 2 +-
 src/backend/replication/slot.c            | 2 +-
 src/backend/storage/ipc/procarray.c       | 2 ++
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d89ba3031f..395e75f768 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1741,7 +1741,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		 * MyProc->xid/xmin, otherwise GetOldestNonRemovableTransactionId()
 		 * might appear to go backwards, which is probably Not Good.
 		 */
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 		MyProc->statusFlags |= PROC_IN_VACUUM;
 		if (params->is_wraparound)
 			MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1f4df7d70..4324e32656 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
 	 */
 	if (!IsTransactionOrTransactionBlock())
 	{
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 		MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
 		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 5ed955ba0b..9c7cf13d4d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -527,7 +527,7 @@ ReplicationSlotRelease(void)
 	MyReplicationSlot = NULL;
 
 	/* might not have been set when we've been a plain slot */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e2cb6e990d..ebe91907d6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -662,6 +662,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		/* avoid unnecessarily dirtying shared cachelines */
 		if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
 		{
+			/* Only safe to change my own flags with only share lock */
+			Assert(proc == MyProc);
 			Assert(!LWLockHeldByMe(ProcArrayLock));
 			LWLockAcquire(ProcArrayLock, LW_SHARED);
 			Assert(proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]);
-- 
2.20.1

v6-0003-CREATE-INDEX-CONCURRENTLY-don-t-wait-for-its-kin.patchtext/x-diff; charset=iso-8859-1Download
From 5578f8824e9e9baccd9a3e1ad0b970bb120e4793 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 11:41:19 -0300
Subject: [PATCH v6 3/4] CREATE INDEX CONCURRENTLY: don't wait for its kin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the various waiting phases of CREATE INDEX CONCURRENTLY, we wait
for other processes to release their snapshots; in general terms this is
necessary for correctness.  However, processes doing CIC for other
tables cannot possibly affect CIC done on "this" table, so we don't need
to wait for *those*.  This commit adds a flag in MyProc->statusFlags to
indicate that the current process is doing CIC, so that other processes
doing CIC can ignore it when waiting.

This flag can potentially also be used by processes doing REINDEX
CONCURRENTLY also to avoid waiting, and by VACUUM to ignore the process
in CIC for the purposes of computing an Xmin.  That's left for future
commits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Dimitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql
---
 src/backend/commands/indexcmds.c | 55 ++++++++++++++++++++++++++++++--
 src/include/storage/proc.h       |  5 ++-
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 02c7a0c7e1..e4acdf1e1f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -93,6 +93,7 @@ static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
 static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
+static inline void set_safe_index_flag(void);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
@@ -384,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -405,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+										  | PROC_IN_SAFE_IC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -425,7 +430,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+													| PROC_IN_SAFE_IC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -518,6 +524,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1044,6 +1051,10 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/* Determine whether we can call set_safe_index_flag */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1430,6 +1441,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1489,6 +1504,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1545,6 +1564,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
@@ -3896,3 +3919,29 @@ update_relispartition(Oid relationId, bool newval)
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
+
+/*
+ * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.
+ *
+ * When doing concurrent index builds, we can set this flag
+ * to tell other processes concurrently running CREATE
+ * INDEX CONCURRENTLY to ignore us when
+ * doing their waits for concurrent snapshots.  On one hand it
+ * avoids pointlessly waiting for a process that's not interesting
+ * anyway, but more importantly it avoids deadlocks in some cases.
+ *
+ * This can only be done for indexes that don't execute any expressions.
+ * Caller is responsible for only calling this routine when that
+ * assumption holds true.
+ *
+ * (The flag is reset automatically at transaction end, so it must be
+ * set for each transaction.)
+ */
+static inline void
+set_safe_index_flag(void)
+{
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	MyProc->statusFlags |= PROC_IN_SAFE_IC;
+	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+	LWLockRelease(ProcArrayLock);
+}
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1067f58f51..2673ac4c37 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -53,13 +53,16 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY on non-expressional,
+										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.20.1

v6-0004-Support-safe-flag-in-REINDEX-CONCURRENTLY.patchtext/x-diff; charset=us-asciiDownload
From 14e9e62716e71dc446b53b9dcfa954df3f0ec600 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 11:42:22 -0300
Subject: [PATCH v6 4/4] Support safe flag in REINDEX CONCURRENTLY

---
 src/backend/commands/indexcmds.c | 27 +++++++++++++++++++++++++--
 src/include/storage/proc.h       |  1 +
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index e4acdf1e1f..7bf641a2bd 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -3043,6 +3043,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
 	};
 	int64		progress_vals[4];
+	bool		all_indexes_safe = true;
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -3347,6 +3348,12 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/* consider safety of this index for set_safe_index_flag */
+		if (all_indexes_safe &&
+			(newIndexRel->rd_indexprs != NIL ||
+			 newIndexRel->rd_indpred != NIL))
+			all_indexes_safe = false;
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
@@ -3416,6 +3423,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3479,6 +3490,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	}
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3632,6 +3647,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3664,6 +3683,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_safe_index_flag();
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3925,7 +3948,7 @@ update_relispartition(Oid relationId, bool newval)
  *
  * When doing concurrent index builds, we can set this flag
  * to tell other processes concurrently running CREATE
- * INDEX CONCURRENTLY to ignore us when
+ * INDEX CONCURRENTLY and REINDEX CONCURRENTLY to ignore us when
  * doing their waits for concurrent snapshots.  On one hand it
  * avoids pointlessly waiting for a process that's not interesting
  * anyway, but more importantly it avoids deadlocks in some cases.
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 2673ac4c37..b2347ffd79 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -54,6 +54,7 @@ struct XidCache
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
 #define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
 										 * CONCURRENTLY on non-expressional,
 										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
-- 
2.20.1

#28Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#27)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote:

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1f4df7d70..4324e32656 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
*/
if (!IsTransactionOrTransactionBlock())
{
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);

We don't really document that it is safe to update statusFlags while
holding a shared lock on ProcArrayLock, right? Could this be
clarified at least in proc.h?

+	/* Determine whether we can call set_safe_index_flag */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;

This should tell why we check after expressions and predicates, in
short giving an explanation about the snapshot issues with build and
validation when executing those expressions and/or predicates.

+ * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.
+ *
+ * When doing concurrent index builds, we can set this flag
+ * to tell other processes concurrently running CREATE
+ * INDEX CONCURRENTLY to ignore us when
+ * doing their waits for concurrent snapshots.  On one hand it
+ * avoids pointlessly waiting for a process that's not interesting
+ * anyway, but more importantly it avoids deadlocks in some cases.
+ *
+ * This can only be done for indexes that don't execute any expressions.
+ * Caller is responsible for only calling this routine when that
+ * assumption holds true.
+ *
+ * (The flag is reset automatically at transaction end, so it must be
+ * set for each transaction.)

Would it be better to document that this function had better be called
after beginning a transaction? I am wondering if we should not
enforce some conditions here, say this one or similar:
Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);

+ */
+static inline void
+set_safe_index_flag(void)

This routine name is rather close to index_set_state_flags(), could it
be possible to finish with a better name?
--
Michael

#29Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#28)
2 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-18, Michael Paquier wrote:

On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote:

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1f4df7d70..4324e32656 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
*/
if (!IsTransactionOrTransactionBlock())
{
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);

We don't really document that it is safe to update statusFlags while
holding a shared lock on ProcArrayLock, right? Could this be
clarified at least in proc.h?

Pushed that part with a comment addition. This stuff is completely
undocumented ...

+	/* Determine whether we can call set_safe_index_flag */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;

This should tell why we check after expressions and predicates, in
short giving an explanation about the snapshot issues with build and
validation when executing those expressions and/or predicates.

Fair. It seems good to add a comment to the new function, and reference
that in each place where we set the flag.

+ * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.

Would it be better to document that this function had better be called
after beginning a transaction? I am wondering if we should not
enforce some conditions here, say this one or similar:
Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);

I went with checking MyProc->xid and MyProc->xmin, which is the same in
practice but notionally closer to what we're doing.

+static inline void
+set_safe_index_flag(void)

This routine name is rather close to index_set_state_flags(), could it
be possible to finish with a better name?

I went with set_indexsafe_procflags(). Ugly ...

Attachments:

v7-0001-CREATE-INDEX-CONCURRENTLY-don-t-wait-for-its-kin.patchtext/x-diff; charset=iso-8859-1Download
From a1c9fb94a71f0bf9ec492e1715809315e38084b3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 11:41:19 -0300
Subject: [PATCH v7 1/2] CREATE INDEX CONCURRENTLY: don't wait for its kin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the various waiting phases of CREATE INDEX CONCURRENTLY, we wait
for other processes to release their snapshots; in general terms this is
necessary for correctness.  However, processes doing CIC for other
tables cannot possibly affect CIC done on "this" table, so we don't need
to wait for *those*.  This commit adds a flag in MyProc->statusFlags to
indicate that the current process is doing CIC, so that other processes
doing CIC can ignore it when waiting.

This flag can potentially also be used by processes doing REINDEX
CONCURRENTLY also to avoid waiting, and by VACUUM to ignore the process
in CIC for the purposes of computing an Xmin.  That's left for future
commits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Dimitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql
---
 src/backend/commands/indexcmds.c | 63 ++++++++++++++++++++++++++++++--
 src/include/storage/proc.h       |  5 ++-
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 02c7a0c7e1..9efb6b5420 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -93,6 +93,7 @@ static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
 static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
+static inline void set_indexsafe_procflags(void);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
@@ -384,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -405,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+										  | PROC_IN_SAFE_IC,
 										  &n_old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -425,7 +430,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 													true, false,
-													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+													| PROC_IN_SAFE_IC,
 													&n_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -518,6 +524,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1044,6 +1051,10 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/* Is index safe for others to ignore?  See set_indexsafe_procflags() */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1430,6 +1441,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1489,6 +1504,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1545,6 +1564,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
@@ -3896,3 +3919,37 @@ update_relispartition(Oid relationId, bool newval)
 	heap_freetuple(tup);
 	table_close(classRel, RowExclusiveLock);
 }
+
+/*
+ * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.
+ *
+ * When doing concurrent index builds, we can set this flag
+ * to tell other processes concurrently running CREATE
+ * INDEX CONCURRENTLY to ignore us when
+ * doing their waits for concurrent snapshots.  On one hand it
+ * avoids pointlessly waiting for a process that's not interesting
+ * anyway; but more importantly it avoids deadlocks in some cases.
+ *
+ * This can only be done safely for indexes that don't execute any
+ * expressions that could access other tables, so index must not be
+ * expressional nor partial.  Caller is responsible for only calling
+ * this routine when that assumption holds true.
+ *
+ * (The flag is reset automatically at transaction end, so it must be
+ * set for each transaction.)
+ */
+static inline void
+set_indexsafe_procflags(void)
+{
+	/*
+	 * This should only be called before installing xid or xmin in MyProc;
+	 * otherwise, concurrent processes could see an Xmin that moves backwards.
+	 */
+	Assert(MyProc->xid == InvalidTransactionId &&
+		   MyProc->xmin == InvalidTransactionId);
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	MyProc->statusFlags |= PROC_IN_SAFE_IC;
+	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+	LWLockRelease(ProcArrayLock);
+}
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 00bb244340..e75f6e8178 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -53,13 +53,16 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY on non-expressional,
+										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
 												 * decoding outside xact */
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
-- 
2.20.1

v7-0002-Support-safe-flag-in-REINDEX-CONCURRENTLY.patchtext/x-diff; charset=us-asciiDownload
From 127f6fd2b6c9a631da402531459bc06c86edad22 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 11:42:22 -0300
Subject: [PATCH v7 2/2] Support safe flag in REINDEX CONCURRENTLY

---
 src/backend/commands/indexcmds.c | 27 +++++++++++++++++++++++++--
 src/include/storage/proc.h       |  1 +
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9efb6b5420..98ea2f477e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -3043,6 +3043,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
 	};
 	int64		progress_vals[4];
+	bool		all_indexes_safe = true;
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -3347,6 +3348,12 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/* consider safety of this index for set_indexsafe_procflags */
+		if (all_indexes_safe &&
+			(newIndexRel->rd_indexprs != NIL ||
+			 newIndexRel->rd_indpred != NIL))
+			all_indexes_safe = false;
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
@@ -3416,6 +3423,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_indexsafe_procflags();
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3479,6 +3490,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	}
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_indexsafe_procflags();
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3632,6 +3647,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_indexsafe_procflags();
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3664,6 +3683,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (all_indexes_safe)
+		set_indexsafe_procflags();
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3925,7 +3948,7 @@ update_relispartition(Oid relationId, bool newval)
  *
  * When doing concurrent index builds, we can set this flag
  * to tell other processes concurrently running CREATE
- * INDEX CONCURRENTLY to ignore us when
+ * INDEX CONCURRENTLY and REINDEX CONCURRENTLY to ignore us when
  * doing their waits for concurrent snapshots.  On one hand it
  * avoids pointlessly waiting for a process that's not interesting
  * anyway; but more importantly it avoids deadlocks in some cases.
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index e75f6e8178..532493bccc 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -54,6 +54,7 @@ struct XidCache
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
 #define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
 										 * CONCURRENTLY on non-expressional,
 										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
-- 
2.20.1

#30Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#26)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Hi,

On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote:

... ah, but I realize now that this means that we can use shared lock
here, not exclusive, which is already an enormous improvement. That's
because ->pgxactoff can only be changed with exclusive lock held; so as
long as we hold shared, the array item cannot move.

Uh, wait a second. The acquisition of this lock hasn't been affected by
the snapshot scalability changes, and therefore are unrelated to
->pgxactoff changing or not.

In 13 this is:
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
if (params->is_wraparound)
MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
LWLockRelease(ProcArrayLock);

Lowering this to a shared lock doesn't seem right, at least without a
detailed comment explaining why it's safe. Because GetSnapshotData() etc
look at all procs with just an LW_SHARED ProcArrayLock, changing
vacuumFlags without a lock means that two concurrent horizon
computations could come to a different result.

I'm not saying it's definitely wrong to relax things here, but I'm not
sure we've evaluated it sufficiently.

Greetings,

Andres Freund

#31Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#30)
1 attachment(s)
"as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote:

... ah, but I realize now that this means that we can use shared lock
here, not exclusive, which is already an enormous improvement. That's
because ->pgxactoff can only be changed with exclusive lock held; so as
long as we hold shared, the array item cannot move.

Uh, wait a second. The acquisition of this lock hasn't been affected by
the snapshot scalability changes, and therefore are unrelated to
->pgxactoff changing or not.

I'm writing a response trying to thoroughly analyze this, but I also
wanted to report that ProcSleep is being a bit generous with what it
calls "as quickly as possible" here:

LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

/*
* Only do it if the worker is not working to protect against Xid
* wraparound.
*/
statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
if ((statusFlags & PROC_IS_AUTOVACUUM) &&
!(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
{
int pid = autovac->pid;
StringInfoData locktagbuf;
StringInfoData logbuf; /* errdetail for server log */

initStringInfo(&locktagbuf);
initStringInfo(&logbuf);
DescribeLockTag(&locktagbuf, &lock->tag);
appendStringInfo(&logbuf,
_("Process %d waits for %s on %s."),
MyProcPid,
GetLockmodeName(lock->tag.locktag_lockmethodid,
lockmode),
locktagbuf.data);

/* release lock as quickly as possible */
LWLockRelease(ProcArrayLock);

The amount of stuff that this is doing with ProcArrayLock held
exclusively seems a bit excessive; it sounds like we could copy the
values we need first, release the lock, and *then* do all that memory
allocation and string printing -- it's a lock of code for such a
contended lock. Anytime a process sees itself as blocked by autovacuum
and wants to signal it, there's a ProcArrayLock hiccup (I didn't
actually measure it, but it's at least five function calls). We could
make this more concurrent by copying lock->tag to a local variable,
releasing the lock, then doing all the string formatting and printing.
See attached quickly.patch.

Now, when this code was written (d7318d43d, 2012), this was a LOG
message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it
would be fair to ... remove the message? Or go back to Simon's original
formulation from commit acac68b2bca, which had this message as DEBUG2
without any string formatting.

Thoughts?

Attachments:

quickly.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d1738c65f5..caefff41e2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1325,20 +1325,22 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				int			pid = autovac->pid;
 				StringInfoData locktagbuf;
 				StringInfoData logbuf;	/* errdetail for server log */
+				LOCKTAG		locktag_copy;
+
+				/* release lock as quickly as possible */
+				locktag_copy = lock->tag;
+				LWLockRelease(ProcArrayLock);
 
 				initStringInfo(&locktagbuf);
 				initStringInfo(&logbuf);
-				DescribeLockTag(&locktagbuf, &lock->tag);
+				DescribeLockTag(&locktagbuf, &locktag_copy);
 				appendStringInfo(&logbuf,
 								 _("Process %d waits for %s on %s."),
 								 MyProcPid,
-								 GetLockmodeName(lock->tag.locktag_lockmethodid,
+								 GetLockmodeName(locktag_copy.locktag_lockmethodid,
 												 lockmode),
 								 locktagbuf.data);
 
-				/* release lock as quickly as possible */
-				LWLockRelease(ProcArrayLock);
-
 				/* send the autovacuum worker Back to Old Kent Road */
 				ereport(DEBUG1,
 						(errmsg("sending cancel to blocking autovacuum PID %d",
#32Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#31)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Hi,

On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote:

The amount of stuff that this is doing with ProcArrayLock held
exclusively seems a bit excessive; it sounds like we could copy the
values we need first, release the lock, and *then* do all that memory
allocation and string printing -- it's a lock of code for such a
contended lock.

Yea, that's a good point.

Anytime a process sees itself as blocked by autovacuum
and wants to signal it, there's a ProcArrayLock hiccup (I didn't
actually measure it, but it's at least five function calls).

I'm a bit doubtful it's that important - it's limited in frequency
by deadlock_timeout. But worth improving anyway.

We could make this more concurrent by copying lock->tag to a local
variable, releasing the lock, then doing all the string formatting and
printing. See attached quickly.patch.

Sounds like a plan.

Now, when this code was written (d7318d43d, 2012), this was a LOG
message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it
would be fair to ... remove the message? Or go back to Simon's original
formulation from commit acac68b2bca, which had this message as DEBUG2
without any string formatting.

I don't really have an opinion on this.

Greetings,

Andres Freund

#33Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#30)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Wed, Nov 18, 2020 at 11:09:28AM -0800, Andres Freund wrote:

Uh, wait a second. The acquisition of this lock hasn't been affected by
the snapshot scalability changes, and therefore are unrelated to
->pgxactoff changing or not.

In 13 this is:
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
if (params->is_wraparound)
MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
LWLockRelease(ProcArrayLock);

Lowering this to a shared lock doesn't seem right, at least without a
detailed comment explaining why it's safe. Because GetSnapshotData() etc
look at all procs with just an LW_SHARED ProcArrayLock, changing
vacuumFlags without a lock means that two concurrent horizon
computations could come to a different result.

I'm not saying it's definitely wrong to relax things here, but I'm not
sure we've evaluated it sufficiently.

Yeah. While I do like the new assertion that 27838981 has added in
ProcArrayEndTransactionInternal(), this commit feels a bit rushed to
me. Echoing with my comment from upthread, I am not sure that we
still document enough why a shared lock would be completely fine in
the case of statusFlags. We have no hints that this could be fine
before 2783898, and 2783898 does not make that look better. FWIW, I
think that just using LW_EXCLUSIVE for the CIC change would have been
fine enough first, after evaluating the performance impact. We could
evaluate if it is possible to lower the ProcArrayLock acquisition in
those code paths on a separate thread.
--
Michael

#34Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#32)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On Wed, Nov 18, 2020 at 02:48:40PM -0800, Andres Freund wrote:

On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote:

We could make this more concurrent by copying lock->tag to a local
variable, releasing the lock, then doing all the string formatting and
printing. See attached quickly.patch.

Sounds like a plan.

+1.

Now, when this code was written (d7318d43d, 2012), this was a LOG
message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it
would be fair to ... remove the message? Or go back to Simon's original
formulation from commit acac68b2bca, which had this message as DEBUG2
without any string formatting.

I don't really have an opinion on this.

That still looks useful for debugging, so DEBUG1 sounds fine to me.
--
Michael

#35Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#34)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote:

That still looks useful for debugging, so DEBUG1 sounds fine to me.

By the way, it strikes me that you could just do nothing as long as
(log_min_messages > DEBUG1), so you could encapsulate most of the
logic that plays with the lock tag using that.
--
Michael

#36Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#30)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-18, Andres Freund wrote:

In 13 this is:
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
if (params->is_wraparound)
MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
LWLockRelease(ProcArrayLock);

Lowering this to a shared lock doesn't seem right, at least without a
detailed comment explaining why it's safe. Because GetSnapshotData() etc
look at all procs with just an LW_SHARED ProcArrayLock, changing
vacuumFlags without a lock means that two concurrent horizon
computations could come to a different result.

I'm not saying it's definitely wrong to relax things here, but I'm not
sure we've evaluated it sufficiently.

True. Let's evaluate it.

I changed three spots where statusFlags bits are written:

a) StartupDecodingContext: sets PROC_IN_LOGICAL_DECODING
b) ReplicationSlotRelease: removes PROC_IN_LOGICAL_DECODING
c) vacuum_rel: sets PROC_IN_VACUUM and potentially
PROC_VACUUM_FOR_WRAPAROUND

Who reads these flags?

PROC_IN_LOGICAL_DECODING is read by:
* ComputeXidHorizons
* GetSnapshotData

PROC_IN_VACUUM is read by:
* GetCurrentVirtualXIDs
* ComputeXidHorizons
* GetSnapshotData
* ProcArrayInstallImportedXmin

PROC_VACUUM_FOR_WRAPAROUND is read by:
* ProcSleep

ProcSleep: no bug here.
The flags are checked to see if we should kill() the process (used when
autovac blocks some other process). The lock here appears to be
inconsequential, since we release it before we do kill(); so strictly
speaking, there is still a race condition where the autovac process
could reset the flag after we read it and before we get a chance to
signal it. The lock level autovac uses to set the flag is not relevant
either.

ProcArrayInstallImportedXmin:
This one is just searching for a matching backend; not affected by the
flags.

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons, will now be included in the computation. Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests. I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that reads a later will remove less rows.) But
in GetSnapshotData it is just not correct to have the Xmin go backwards.

Therefore it seems to me that this code has a bug independently of the
lock level used.

GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is that the code computes some xmin (or
equivalent computation) based on a set of transactions that exclude
those marked with the flags. The behavior we want is that if some
transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
one*. In other words, if there's no Xid/Xmin, then the flag is not
important. So if we can ensure that the flag is set first, and the
Xid/xmin is installed later, that's sufficient correctness and we don't
need to hold exclusive lock. But if we can't ensure that, then we must
use exclusive lock, because otherwise we risk another process seeing our
Xid first and not our flag, which would be bad.

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.

GetSnapshotData has an additional difficulty -- we do the
UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit.
So it's not valid to set the bit without locking out GSD, regardless of
any barriers we had; if we want this to be safe, we'd have to change
this so that the flag is read first, and we read the xid only
afterwards, with a read barrier.

I *think* we could relax the lock if we had a write barrier in
between: set the flag first, issue a write barrier, set the Xid.
(I have to admit I'm confused about what needs to happen in the read
case: read the bit first, potentially skip the PGPROC entry; but can we
read the Xid ahead of reading the flag, and if we do read the xid
afterwards, do we need a read barrier in between?)
Given this uncertainty, I'm not proposing to relax the lock from
exclusive to shared.

I didn't get around to reading ComputeXidHorizons, but it seems like
it'd have the same problem as GSD.

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#36)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons, will now be included in the computation. Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests. I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that reads a later will remove less rows.) But
in GetSnapshotData it is just not correct to have the Xmin go backwards.

Therefore it seems to me that this code has a bug independently of the
lock level used.

That is only a bug if the flags are *cleared* in a way that's not
atomic with clearing the transaction's xid/xmin, no? I agree that
once set, the flag had better stay set till transaction end, but
that's not what's at stake here.

GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is that the code computes some xmin (or
equivalent computation) based on a set of transactions that exclude
those marked with the flags. The behavior we want is that if some
transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
one*. In other words, if there's no Xid/Xmin, then the flag is not
important. So if we can ensure that the flag is set first, and the
Xid/xmin is installed later, that's sufficient correctness and we don't
need to hold exclusive lock. But if we can't ensure that, then we must
use exclusive lock, because otherwise we risk another process seeing our
Xid first and not our flag, which would be bad.

I don't buy this either. You get the same result if someone looks just
before you take the ProcArrayLock to set the flag. So if there's a
problem, it's inherent in the way that the flags are defined or used;
the strength of lock used in this stanza won't affect it.

regards, tom lane

#38Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#35)
1 attachment(s)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-19, Michael Paquier wrote:

On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote:

That still looks useful for debugging, so DEBUG1 sounds fine to me.

By the way, it strikes me that you could just do nothing as long as
(log_min_messages > DEBUG1), so you could encapsulate most of the
logic that plays with the lock tag using that.

Good idea, done.

I also noticed that if we're going to accept a race (which BTW already
exists) we may as well simplify the code about it.

I think the attached is the final form of this.

Attachments:

0001-Don-t-hold-ProcArrayLock-longer-than-needed-in-rare-.patchtext/x-diff; charset=us-asciiDownload
From cab95d8813cd79adb7b2a1b5501714c2dd87bec2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 17 Nov 2020 21:09:22 -0300
Subject: [PATCH] Don't hold ProcArrayLock longer than needed in rare cases

While cancelling an autovacuum worker, we hold ProcArrayLock while
formatting a debugging log string.  We can make this shorter by saving
the data we need to produce the message and doing the formatting outside
the locked region.

This isn't terribly critical, as it only occurs pretty rarely: when a
backend runs deadlock detection and it happens to be blocked by a
autovacuum running autovacuum.  Still, there's no need to cause a hiccup
in ProcArrayLock processing, which can be very high-traffic in some
cases.

While at it, rework code so that we only print the string when it is
really going to be used, as suggested by Michael Paquier.

Discussion: https://postgr.es/m/20201118214127.GA3179@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
---
 src/backend/storage/lmgr/proc.c | 63 ++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 25 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d1738c65f5..dbfdd7b3c7 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1311,40 +1311,58 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		{
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
 			uint8		statusFlags;
+			uint8		lockmethod_copy;
+			LOCKTAG		locktag_copy;
 
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			/*
+			 * Grab info we need, then release lock immediately.  Note this
+			 * coding means that there is a tiny chance that the process
+			 * terminates its current transaction and starts a different one
+			 * before we have a change to send the signal; the worst possible
+			 * consequence is that a for-wraparound vacuum is cancelled.  But
+			 * that could happen in any case unless we were to do kill() with
+			 * the lock held, which is much more undesirable.
+			 */
+			LWLockAcquire(ProcArrayLock, LW_SHARED);
+			statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
+			lockmethod_copy = lock->tag.locktag_lockmethodid;
+			locktag_copy = lock->tag;
+			LWLockRelease(ProcArrayLock);
 
 			/*
 			 * Only do it if the worker is not working to protect against Xid
 			 * wraparound.
 			 */
-			statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
 			if ((statusFlags & PROC_IS_AUTOVACUUM) &&
 				!(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
 			{
 				int			pid = autovac->pid;
-				StringInfoData locktagbuf;
-				StringInfoData logbuf;	/* errdetail for server log */
 
-				initStringInfo(&locktagbuf);
-				initStringInfo(&logbuf);
-				DescribeLockTag(&locktagbuf, &lock->tag);
-				appendStringInfo(&logbuf,
-								 _("Process %d waits for %s on %s."),
-								 MyProcPid,
-								 GetLockmodeName(lock->tag.locktag_lockmethodid,
-												 lockmode),
-								 locktagbuf.data);
+				/* report the case, if configured to do so */
+				if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+				{
+					StringInfoData locktagbuf;
+					StringInfoData logbuf;	/* errdetail for server log */
 
-				/* release lock as quickly as possible */
-				LWLockRelease(ProcArrayLock);
+					initStringInfo(&locktagbuf);
+					initStringInfo(&logbuf);
+					DescribeLockTag(&locktagbuf, &locktag_copy);
+					appendStringInfo(&logbuf,
+									 _("Process %d waits for %s on %s."),
+									 MyProcPid,
+									 GetLockmodeName(lockmethod_copy, lockmode),
+									 locktagbuf.data);
+
+					ereport(DEBUG1,
+							(errmsg("sending cancel to blocking autovacuum PID %d",
+									pid),
+							 errdetail_log("%s", logbuf.data)));
+
+					pfree(logbuf.data);
+					pfree(locktagbuf.data);
+				}
 
 				/* send the autovacuum worker Back to Old Kent Road */
-				ereport(DEBUG1,
-						(errmsg("sending cancel to blocking autovacuum PID %d",
-								pid),
-						 errdetail_log("%s", logbuf.data)));
-
 				if (kill(pid, SIGINT) < 0)
 				{
 					/*
@@ -1362,12 +1380,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 								(errmsg("could not send signal to process %d: %m",
 										pid)));
 				}
-
-				pfree(logbuf.data);
-				pfree(locktagbuf.data);
 			}
-			else
-				LWLockRelease(ProcArrayLock);
 
 			/* prevent signal from being sent again more than once */
 			allow_autovacuum_cancel = false;
-- 
2.20.1

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#38)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2020-Nov-19, Michael Paquier wrote:

By the way, it strikes me that you could just do nothing as long as
(log_min_messages > DEBUG1), so you could encapsulate most of the
logic that plays with the lock tag using that.

Good idea, done.

I'm less sure that that's a good idea. It embeds knowledge here that
should not exist outside elog.c; moreover, I'm not entirely sure that
it's even correct, given the nonlinear ranking of log_min_messages.

Maybe it'd be a good idea to have elog.c expose a new function
along the lines of "bool message_level_is_interesting(int elevel)"
to support this and similar future optimizations in a less fragile way.

regards, tom lane

#40Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#39)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-23, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2020-Nov-19, Michael Paquier wrote:

By the way, it strikes me that you could just do nothing as long as
(log_min_messages > DEBUG1), so you could encapsulate most of the
logic that plays with the lock tag using that.

Good idea, done.

I'm less sure that that's a good idea. It embeds knowledge here that
should not exist outside elog.c; moreover, I'm not entirely sure that
it's even correct, given the nonlinear ranking of log_min_messages.

Well, we already do this in a number of places. But I can get behind
this:

Show quoted text

Maybe it'd be a good idea to have elog.c expose a new function
along the lines of "bool message_level_is_interesting(int elevel)"
to support this and similar future optimizations in a less fragile way.

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#40)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Well, we already do this in a number of places. But I can get behind
this:

Maybe it'd be a good idea to have elog.c expose a new function
along the lines of "bool message_level_is_interesting(int elevel)"
to support this and similar future optimizations in a less fragile way.

I'll see about a patch for that.

regards, tom lane

#42Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#41)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-23, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Well, we already do this in a number of places. But I can get behind
this:

Maybe it'd be a good idea to have elog.c expose a new function
along the lines of "bool message_level_is_interesting(int elevel)"
to support this and similar future optimizations in a less fragile way.

I'll see about a patch for that.

Looking at that now ...

#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#42)
1 attachment(s)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Here's a draft patch.

regards, tom lane

Attachments:

hide-elog-level-knowledge.patchtext/x-diff; charset=us-ascii; name=hide-elog-level-knowledge.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..9cd0b7c11b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5344,7 +5344,7 @@ static void
 ShowTransactionState(const char *str)
 {
 	/* skip work if message will definitely not be printed */
-	if (log_min_messages <= DEBUG5 || client_min_messages <= DEBUG5)
+	if (message_level_is_interesting(DEBUG5))
 		ShowTransactionStateRec(str, CurrentTransactionState);
 }
 
@@ -5371,7 +5371,6 @@ ShowTransactionStateRec(const char *str, TransactionState s)
 	if (s->parent)
 		ShowTransactionStateRec(str, s->parent);
 
-	/* use ereport to suppress computation if msg will not be printed */
 	ereport(DEBUG5,
 			(errmsg_internal("%s(%d) name: %s; blockState: %s; state: %s, xid/subid/cid: %u/%u/%u%s%s",
 							 str, s->nestingLevel,
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7e915bcadf..32a3099c1f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -105,7 +105,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 * tracing of the cause (note the elog context mechanism will tell us
 	 * something about the XLOG record that generated the reference).
 	 */
-	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+	if (message_level_is_interesting(DEBUG1))
 		report_invalid_page(DEBUG1, node, forkno, blkno, present);
 
 	if (invalid_page_tab == NULL)
@@ -159,7 +159,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 			hentry->key.forkno == forkno &&
 			hentry->key.blkno >= minblkno)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 				char	   *path = relpathperm(hentry->key.node, forkno);
 
@@ -192,7 +192,7 @@ forget_invalid_pages_db(Oid dbid)
 	{
 		if (hentry->key.node.dbNode == dbid)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 				char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..245c2f4fc8 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1146,15 +1146,9 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
 	 * If no error is to be thrown, and the msglevel is too low to be shown to
 	 * either client or server log, there's no need to do any of the rest of
 	 * the work.
-	 *
-	 * Note: this code doesn't know all there is to be known about elog
-	 * levels, but it works for NOTICE and DEBUG2, which are the only values
-	 * msglevel can currently have.  We also assume we are running in a normal
-	 * operating environment.
 	 */
 	if (behavior == DROP_CASCADE &&
-		msglevel < client_min_messages &&
-		(msglevel < log_min_messages || log_min_messages == LOG))
+		!message_level_is_interesting(msglevel))
 		return;
 
 	/*
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index babee386c4..87c3ea450e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1215,7 +1215,7 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(&walrcv->mutex);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *sendtime;
 		char	   *receipttime;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5d1b1a16be..2eb19ad293 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1900,7 +1900,7 @@ ProcessStandbyReplyMessage(void)
 	replyTime = pq_getmsgint64(&reply_message);
 	replyRequested = pq_getmsgbyte(&reply_message);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
@@ -2082,7 +2082,7 @@ ProcessStandbyHSFeedbackMessage(void)
 	feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
 	feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..04c2245338 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -600,6 +600,42 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	CHECK_FOR_INTERRUPTS();
 }
 
+/*
+ * message_level_is_interesting --- would ereport/elog do anything?
+ *
+ * Returns true if ereport/elog with this elevel will not be a no-op.
+ * This is useful to short-circuit any expensive preparatory work that
+ * might be needed for a logging message.  There is no point in
+ * prepending this to a bare ereport/elog call, however.
+ */
+bool
+message_level_is_interesting(int elevel)
+{
+	bool		output_to_server;
+	bool		output_to_client;
+
+	/*
+	 * Keep this in sync with the decision-making in errstart().
+	 */
+	if (elevel >= ERROR)
+		return true;
+
+	output_to_server = is_log_level_output(elevel, log_min_messages);
+	if (output_to_server)
+		return true;
+
+	if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY &&
+		!ClientAuthInProgress)
+	{
+		output_to_client = (elevel >= client_min_messages ||
+							elevel == INFO);
+		if (output_to_client)
+			return true;
+	}
+
+	return false;
+}
+
 
 /*
  * errcode --- add SQLSTATE error code to the current error
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 1e09ee0541..de5bba0a4a 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -149,6 +149,8 @@
 extern bool errstart(int elevel, const char *domain);
 extern void errfinish(const char *filename, int lineno, const char *funcname);
 
+extern bool message_level_is_interesting(int elevel);
+
 extern int	errcode(int sqlerrcode);
 
 extern int	errcode_for_file_access(void);
#44Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#43)
1 attachment(s)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-23, Tom Lane wrote:

Here's a draft patch.

Here's another of my own. Outside of elog.c it seems identical.

Attachments:

interesting-msg.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..a4ab9090f9 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5344,7 +5344,7 @@ static void
 ShowTransactionState(const char *str)
 {
 	/* skip work if message will definitely not be printed */
-	if (log_min_messages <= DEBUG5 || client_min_messages <= DEBUG5)
+	if (message_level_is_interesting(DEBUG5))
 		ShowTransactionStateRec(str, CurrentTransactionState);
 }
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7e915bcadf..32a3099c1f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -105,7 +105,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 * tracing of the cause (note the elog context mechanism will tell us
 	 * something about the XLOG record that generated the reference).
 	 */
-	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+	if (message_level_is_interesting(DEBUG1))
 		report_invalid_page(DEBUG1, node, forkno, blkno, present);
 
 	if (invalid_page_tab == NULL)
@@ -159,7 +159,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 			hentry->key.forkno == forkno &&
 			hentry->key.blkno >= minblkno)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 				char	   *path = relpathperm(hentry->key.node, forkno);
 
@@ -192,7 +192,7 @@ forget_invalid_pages_db(Oid dbid)
 	{
 		if (hentry->key.node.dbNode == dbid)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 				char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..0ad647ed9c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1146,15 +1146,8 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
 	 * If no error is to be thrown, and the msglevel is too low to be shown to
 	 * either client or server log, there's no need to do any of the rest of
 	 * the work.
-	 *
-	 * Note: this code doesn't know all there is to be known about elog
-	 * levels, but it works for NOTICE and DEBUG2, which are the only values
-	 * msglevel can currently have.  We also assume we are running in a normal
-	 * operating environment.
 	 */
-	if (behavior == DROP_CASCADE &&
-		msglevel < client_min_messages &&
-		(msglevel < log_min_messages || log_min_messages == LOG))
+	if (behavior == DROP_CASCADE && !message_level_is_interesting(msglevel))
 		return;
 
 	/*
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index babee386c4..87c3ea450e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1215,7 +1215,7 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(&walrcv->mutex);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *sendtime;
 		char	   *receipttime;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5d1b1a16be..2eb19ad293 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1900,7 +1900,7 @@ ProcessStandbyReplyMessage(void)
 	replyTime = pq_getmsgint64(&reply_message);
 	replyRequested = pq_getmsgbyte(&reply_message);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
@@ -2082,7 +2082,7 @@ ProcessStandbyHSFeedbackMessage(void)
 	feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
 	feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 40eac704dc..eb2309ed76 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1341,22 +1341,25 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				StringInfoData logbuf;	/* errdetail for server log */
 
 				/* report the case, if configured to do so */
-				initStringInfo(&locktagbuf);
-				initStringInfo(&logbuf);
-				DescribeLockTag(&locktagbuf, &locktag_copy);
-				appendStringInfo(&logbuf,
-								 _("Process %d waits for %s on %s."),
-								 MyProcPid,
-								 GetLockmodeName(lockmethod_copy, lockmode),
-								 locktagbuf.data);
+				if (message_level_is_interesting(DEBUG1))
+				{
+					initStringInfo(&locktagbuf);
+					initStringInfo(&logbuf);
+					DescribeLockTag(&locktagbuf, &locktag_copy);
+					appendStringInfo(&logbuf,
+									 _("Process %d waits for %s on %s."),
+									 MyProcPid,
+									 GetLockmodeName(lockmethod_copy, lockmode),
+									 locktagbuf.data);
 
-				ereport(DEBUG1,
-						(errmsg("sending cancel to blocking autovacuum PID %d",
-								pid),
-						 errdetail_log("%s", logbuf.data)));
+					ereport(DEBUG1,
+							(errmsg("sending cancel to blocking autovacuum PID %d",
+									pid),
+							 errdetail_log("%s", logbuf.data)));
 
-				pfree(logbuf.data);
-				pfree(locktagbuf.data);
+					pfree(logbuf.data);
+					pfree(locktagbuf.data);
+				}
 
 				/* send the autovacuum worker Back to Old Kent Road */
 				if (kill(pid, SIGINT) < 0)
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..aab27aa5aa 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -185,6 +185,7 @@ static void send_message_to_frontend(ErrorData *edata);
 static const char *error_severity(int elevel);
 static void append_with_tabs(StringInfo buf, const char *str);
 static bool is_log_level_output(int elevel, int log_min_level);
+static bool is_log_level_output_client(int elevel, int client_min_level);
 
 
 /*
@@ -293,20 +294,7 @@ errstart(int elevel, const char *domain)
 	output_to_server = is_log_level_output(elevel, log_min_messages);
 
 	/* Determine whether message is enabled for client output */
-	if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY)
-	{
-		/*
-		 * client_min_messages is honored only after we complete the
-		 * authentication handshake.  This is required both for security
-		 * reasons and because many clients can't handle NOTICE messages
-		 * during authentication.
-		 */
-		if (ClientAuthInProgress)
-			output_to_client = (elevel >= ERROR);
-		else
-			output_to_client = (elevel >= client_min_messages ||
-								elevel == INFO);
-	}
+	output_to_client = is_log_level_output_client(elevel, client_min_messages);
 
 	/* Skip processing effort if non-error message will not be output */
 	if (elevel < ERROR && !output_to_server && !output_to_client)
@@ -3521,6 +3509,42 @@ is_log_level_output(int elevel, int log_min_level)
 	return false;
 }
 
+/*
+ * is_log_level_output_client -- is elevel logical >= client_min_level?
+ */
+static bool
+is_log_level_output_client(int elevel, int client_min_level)
+{
+	if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY)
+	{
+		/*
+		 * client_min_level (client_min_messages) is honored only after we
+		 * complete the authentication handshake.  This is required both for
+		 * security reasons and because many clients can't handle NOTICE
+		 * messages during authentication.
+		 */
+		if (ClientAuthInProgress)
+			return elevel >= ERROR;
+		else
+			return elevel >= client_min_level || elevel == INFO;
+	}
+
+	return false;
+}
+
+/*
+ * Return whether a message of the given elevel would be printed.
+ *
+ * This can be used to skip formatting effort outside of ereport() calls.
+ */
+bool
+message_level_is_interesting(int elevel)
+{
+	return
+		is_log_level_output(elevel, log_min_messages) ||
+		is_log_level_output_client(elevel, client_min_messages);
+}
+
 /*
  * Adjust the level of a recovery-related message per trace_recovery_messages.
  *
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 1e09ee0541..4049456eab 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -393,6 +393,9 @@ extern void pg_re_throw(void) pg_attribute_noreturn();
 
 extern char *GetErrorContextStack(void);
 
+/* allow skipping formatting messages that won't be emitted */
+extern bool message_level_is_interesting(int elevel);
+
 /* Hook for intercepting messages before they are sent to the server log */
 typedef void (*emit_log_hook_type) (ErrorData *edata);
 extern PGDLLIMPORT emit_log_hook_type emit_log_hook;
#45Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#44)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-23, Alvaro Herrera wrote:

On 2020-Nov-23, Tom Lane wrote:

Here's a draft patch.

Here's another of my own. Outside of elog.c it seems identical.

Your version has the advantage that errstart() doesn't get a new
function call. I'm +1 for going with that ... we could avoid the
duplicate code with some additional contortions but this changes so
rarely that it's probably not worth the trouble.

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#44)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2020-Nov-23, Tom Lane wrote:

Here's a draft patch.

Here's another of my own. Outside of elog.c it seems identical.

Ah, I see I didn't cover the case in ProcSleep that you were originally on
about ... I'd just looked for existing references to log_min_messages
and client_min_messages.

I think it's important to have the explicit check for elevel >= ERROR.
I'm not too fussed about whether we invent is_log_level_output_client,
although that name doesn't seem well-chosen compared to
is_log_level_output.

Shall I press forward with this, or do you want to?

regards, tom lane

#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#45)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Your version has the advantage that errstart() doesn't get a new
function call. I'm +1 for going with that ... we could avoid the
duplicate code with some additional contortions but this changes so
rarely that it's probably not worth the trouble.

I was considering adding that factorization, but marking the function
inline to avoid adding overhead. Most of elog.c predates our use of
inline, so it wasn't considered when this code was written.

regards, tom lane

#48Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#46)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-23, Tom Lane wrote:

Ah, I see I didn't cover the case in ProcSleep that you were originally on
about ... I'd just looked for existing references to log_min_messages
and client_min_messages.

Yeah, it seemed bad form to add that when you had just argued against it
:-)

I think it's important to have the explicit check for elevel >= ERROR.
I'm not too fussed about whether we invent is_log_level_output_client,
although that name doesn't seem well-chosen compared to
is_log_level_output.

Just replacing "log" for "client" in that seemed strictly worse, and I
didn't (don't) have any other ideas.

Shall I press forward with this, or do you want to?

Please feel free to go ahead, including the change to ProcSleep.

#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#48)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2020-Nov-23, Tom Lane wrote:

I'm not too fussed about whether we invent is_log_level_output_client,
although that name doesn't seem well-chosen compared to
is_log_level_output.

Just replacing "log" for "client" in that seemed strictly worse, and I
didn't (don't) have any other ideas.

I never cared that much for "is_log_level_output" either. Thinking
about renaming it to "should_output_to_log()", and then the new function
would be "should_output_to_client()".

Shall I press forward with this, or do you want to?

Please feel free to go ahead, including the change to ProcSleep.

Will do.

regards, tom lane

#50Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#36)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Hi,

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

ProcSleep: no bug here.
The flags are checked to see if we should kill() the process (used when
autovac blocks some other process). The lock here appears to be
inconsequential, since we release it before we do kill(); so strictly
speaking, there is still a race condition where the autovac process
could reset the flag after we read it and before we get a chance to
signal it. The lock level autovac uses to set the flag is not relevant
either.

Yea. Even before the recent changes building the lock message under the
lock didn't make sense. Now that the flags are mirrored in pgproc, we
probably should just make this use READ_ONCE(autovac->statusFlags),
without any other use of ProcArrayLock. As you say the race condition
is between the flag test, the kill, and the signal being processed,
which wasn't previously protected either.

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons, will now be included in the computation. Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests. I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that reads a later will remove less rows.) But
in GetSnapshotData it is just not correct to have the Xmin go backwards.

I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
set when outside a transaction block, i.e. walsender. In which case
there shouldn't be an xid/xmin, I think? Or did you gate your assert on
PROC_IN_LOGICAL_DECODING being set?

GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is that the code computes some xmin (or
equivalent computation) based on a set of transactions that exclude
those marked with the flags. The behavior we want is that if some
transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
one*. In other words, if there's no Xid/Xmin, then the flag is not
important. So if we can ensure that the flag is set first, and the
Xid/xmin is installed later, that's sufficient correctness and we don't
need to hold exclusive lock.

That'd require at least memory barriers in GetSnapshotData()'s loop,
which I'd really like to avoid. Otherwise the order in which memory gets
written in one process doesn't guarantee the order of visibility in
another process...

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.

Cool.

GetSnapshotData has an additional difficulty -- we do the
UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit.
So it's not valid to set the bit without locking out GSD, regardless of
any barriers we had; if we want this to be safe, we'd have to change
this so that the flag is read first, and we read the xid only
afterwards, with a read barrier.

Which we don't want, because that'd mean slowing down the *extremely*
common case of the majority of backends neither having an xid assigned
nor doing logical decoding / vacuum. We'd be reading two cachelines
instead of one.

Greetings,

Andres Freund

#51Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#49)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On 2020-Nov-23, Tom Lane wrote:

I never cared that much for "is_log_level_output" either. Thinking
about renaming it to "should_output_to_log()", and then the new function
would be "should_output_to_client()".

Ah, that sounds nicely symmetric and grammatical.

Thanks!

#52Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#49)
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

On Mon, Nov 23, 2020 at 06:13:17PM -0500, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Please feel free to go ahead, including the change to ProcSleep.

Will do.

Thank you both for 450c823 and 789b938.
--
Michael

#53Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#50)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-23, Andres Freund wrote:

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons, will now be included in the computation. Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests. I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that reads a later will remove less rows.) But
in GetSnapshotData it is just not correct to have the Xmin go backwards.

I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
set when outside a transaction block, i.e. walsender. In which case
there shouldn't be an xid/xmin, I think? Or did you gate your assert on
PROC_IN_LOGICAL_DECODING being set?

Ah, you're right about this one -- I missed the significance of setting
the flag only "when outside of a transaction block" at the time we call
StartupDecodingContext.

#54Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#37)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-23, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is that the code computes some xmin (or
equivalent computation) based on a set of transactions that exclude
those marked with the flags. The behavior we want is that if some
transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
one*. In other words, if there's no Xid/Xmin, then the flag is not
important. So if we can ensure that the flag is set first, and the
Xid/xmin is installed later, that's sufficient correctness and we don't
need to hold exclusive lock. But if we can't ensure that, then we must
use exclusive lock, because otherwise we risk another process seeing our
Xid first and not our flag, which would be bad.

I don't buy this either. You get the same result if someone looks just
before you take the ProcArrayLock to set the flag. So if there's a
problem, it's inherent in the way that the flags are defined or used;
the strength of lock used in this stanza won't affect it.

The problem is that the writes could be reordered in a way that makes
the Xid appear set to an onlooker before PROC_IN_VACUUM appears set.
Vacuum always sets the bit first, and *then* the xid. If the reader
always reads it like that then it's not a problem. But in order to
guarantee that, we would have to have a read barrier for each pass
through the loop.

With the LW_EXCLUSIVE lock, we block the readers so that the bit is
known set by the time they examine it. As I understand, getting the
lock is itself a barrier, so there's no danger that we'll set the bit
and they won't see it.

... at least, that how I *imagine* the argument to be. In practice,
vacuum_rel() calls GetSnapshotData() before installing the
PROC_IN_VACUUM bit, and therefore there *is* a risk that reader 1 will
get MyProc->xmin included in their snapshot (because bit not yet set),
and reader 2 won't. If my understanding is correct, then we should move
the PushActiveSnapshot(GetTransactionSnapshot()) call to after we have
the PROC_IN_VACUUM bit set.

#55Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#50)
1 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-23, Andres Freund wrote:

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.

Cool.

Here's a patch.

Note it also moves the computation of vacuum's Xmin (per
GetTransactionSnapshot) to *after* the bit has been set in statusFlags.

Attachments:

0001-Restore-lock-level-to-update-statusFlags.patchtext/x-diff; charset=us-asciiDownload
From b813c67a4abe2127b8bd13db7e920f958db15d59 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 24 Nov 2020 18:10:42 -0300
Subject: [PATCH] Restore lock level to update statusFlags

Reverts 27838981be9d (some comments are kept).  Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the trasaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

Discussion: https://postgr.es/m/20201118190928.vnztes7c2sldu43a@alap3.anarazel.de
---
 src/backend/commands/vacuum.c             | 20 +++++++++++---------
 src/backend/replication/logical/logical.c |  2 +-
 src/backend/storage/ipc/procarray.c       |  4 +---
 src/include/storage/proc.h                |  6 +++---
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 395e75f768..f1112111de 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1712,13 +1712,6 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/* Begin a transaction for vacuuming this relation */
 	StartTransactionCommand();
 
-	/*
-	 * Need to acquire a snapshot to prevent pg_subtrans from being truncated,
-	 * cutoff xids in local memory wrapping around, and to have updated xmin
-	 * horizons.
-	 */
-	PushActiveSnapshot(GetTransactionSnapshot());
-
 	if (!(params->options & VACOPT_FULL))
 	{
 		/*
@@ -1739,9 +1732,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		 * Note: these flags remain set until CommitTransaction or
 		 * AbortTransaction.  We don't want to clear them until we reset
 		 * MyProc->xid/xmin, otherwise GetOldestNonRemovableTransactionId()
-		 * might appear to go backwards, which is probably Not Good.
+		 * might appear to go backwards, which is probably Not Good.  (We also
+		 * set PROC_IN_VACUUM *before* taking our own snapshot, so that our
+		 * xmin doesn't become visible ahead of setting the flag.)
 		 */
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 		MyProc->statusFlags |= PROC_IN_VACUUM;
 		if (params->is_wraparound)
 			MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
@@ -1749,6 +1744,13 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		LWLockRelease(ProcArrayLock);
 	}
 
+	/*
+	 * Need to acquire a snapshot to prevent pg_subtrans from being truncated,
+	 * cutoff xids in local memory wrapping around, and to have updated xmin
+	 * horizons.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Check for user-requested abort.  Note we want this to be inside a
 	 * transaction, so xact.c doesn't issue useless WARNING.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4324e32656..f1f4df7d70 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
 	 */
 	if (!IsTransactionOrTransactionBlock())
 	{
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 		MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
 		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 94edb24b22..c7848c0b69 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -662,10 +662,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		/* avoid unnecessarily dirtying shared cachelines */
 		if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
 		{
-			/* Only safe to change my own flags with just share lock */
-			Assert(proc == MyProc);
 			Assert(!LWLockHeldByMe(ProcArrayLock));
-			LWLockAcquire(ProcArrayLock, LW_SHARED);
+			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 			Assert(proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]);
 			proc->statusFlags &= ~PROC_VACUUM_STATE_MASK;
 			ProcGlobal->statusFlags[proc->pgxactoff] = proc->statusFlags;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 00bb244340..22046e4e36 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -99,9 +99,9 @@ typedef enum
  * but its myProcLocks[] lists are valid.
  *
  * We allow many fields of this struct to be accessed without locks, such as
- * statusFlags or delayChkpt. However, keep in mind that writing mirrored ones
- * (see below) requires holding ProcArrayLock or XidGenLock in at least shared
- * mode, so that pgxactoff does not change concurrently.
+ * delayChkpt and isBackgroundWorker. However, keep in mind that writing
+ * mirrored ones (see below) requires holding ProcArrayLock or XidGenLock in
+ * at least shared mode, so that pgxactoff does not change concurrently.
  *
  * Mirrored fields:
  *
-- 
2.20.1

#56Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#55)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-25, Alvaro Herrera wrote:

On 2020-Nov-23, Andres Freund wrote:

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.

Cool.

Here's a patch.

Pushed, thanks.

#57Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dmitry Dolgov (#10)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

So let's discuss the next step in this series: what to do about REINDEX
CONCURRENTLY.

I started with Dmitry's patch (an updated version of which I already
posted in [1]/messages/by-id/20201118175804.GA23027@alvherre.pgsql). However, Dmitry missed (and I hadn't noticed) that some
of the per-index loops are starting transactions also, and that we need
to set the flag in those. And what's more, in a couple of the
function-scope transactions we do set the flag pointlessly: the
transactions there do not acquire a snapshot, so there's no reason to
set the flag at all, because WaitForOlderSnapshots ignores sessions
whose Xmin is 0.

There are also transactions that wait first, without setting a snapshot,
and then do some catalog manipulations. I think it's prett much useless
to set the flag for those, because they're going to be very short
anyway. (There's also one case of this in CREATE INDEX CONCURRENTLY.)

But there's a more interesting point also. In Dmitry's patch, we
determine safety for *all* indexes being processed as a set, and then
apply the flag only if they're all deemed safe. But we can optimize
this, and set the flag for each index' transaction individually, and
only skip it for those specific indexes that are unsafe. So I propose
to change the data structure used in ReindexRelationConcurrently from
the current list of OIDs to a list of (oid,boolean) pairs, to be able to
track setting the flag individually.

There's one more useful observation: in the function-scope transactions
(outside the per-index loops), we don't touch the contents of any
indexes; we just wait or do some catalog manipulation. So we can set
the flag *regardless of the safety of any indexes*. We only need to
care about the safety of the indexes in the phases where we build the
indexes and when we validate them.

[1]: /messages/by-id/20201118175804.GA23027@alvherre.pgsql

#58Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#57)
1 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2020-Nov-26, Alvaro Herrera wrote:

So let's discuss the next step in this series: what to do about REINDEX
CONCURRENTLY.

[...]

... as in the attached.

Attachments:

reindex-concurrently.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ca24620fd0..9c1c0aad3e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -1564,9 +1564,11 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
-	/* Tell concurrent index builds to ignore us, if index qualifies */
-	if (safe_index)
-		set_indexsafe_procflags();
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after the
+	 * wait is over.
+	 */
 
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
@@ -3020,6 +3022,13 @@ ReindexMultipleInternal(List *relids, int options)
  * concerns, so there's no need for the more complicated concurrent build,
  * anyway, and a non-concurrent reindex is more efficient.
  */
+
+typedef struct reindex_idx
+{
+	Oid		indexId;
+	bool	safe;
+} reindex_idx;
+
 static bool
 ReindexRelationConcurrently(Oid relationOid, int options)
 {
@@ -3132,10 +3141,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 										get_rel_name(cellOid))));
 					else
 					{
+						reindex_idx	   *idx;
+
 						/* Save the list of relation OIDs in private context */
 						oldcontext = MemoryContextSwitchTo(private_context);
 
-						indexIds = lappend_oid(indexIds, cellOid);
+						idx = palloc(sizeof(reindex_idx));
+						idx->indexId = cellOid;
+
+						indexIds = lappend(indexIds, idx);
 
 						MemoryContextSwitchTo(oldcontext);
 					}
@@ -3172,13 +3186,17 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 											get_rel_name(cellOid))));
 						else
 						{
+							reindex_idx	   *idx;
+
 							/*
 							 * Save the list of relation OIDs in private
 							 * context
 							 */
 							oldcontext = MemoryContextSwitchTo(private_context);
 
-							indexIds = lappend_oid(indexIds, cellOid);
+							idx = palloc(sizeof(reindex_idx));
+							idx->indexId = cellOid;
+							indexIds = lappend(indexIds, idx);
 
 							MemoryContextSwitchTo(oldcontext);
 						}
@@ -3197,6 +3215,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 				Oid			heapId = IndexGetRelation(relationOid,
 													  (options & REINDEXOPT_MISSING_OK) != 0);
 				Relation	heapRelation;
+				reindex_idx *idx;
 
 				/* if relation is missing, leave */
 				if (!OidIsValid(heapId))
@@ -3247,7 +3266,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 				 * Save the list of relation OIDs in private context.  Note
 				 * that invalid indexes are allowed here.
 				 */
-				indexIds = lappend_oid(indexIds, relationOid);
+				idx = palloc(sizeof(reindex_idx));
+				idx->indexId = relationOid;
+				indexIds = lappend(indexIds, idx);
 
 				MemoryContextSwitchTo(oldcontext);
 				break;
@@ -3306,8 +3327,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
-		Oid			indexId = lfirst_oid(lc);
+		reindex_idx *index = lfirst(lc);
+		Oid			indexId = index->indexId;
 		Oid			newIndexId;
+		reindex_idx *newidx;
 		Relation	indexRel;
 		Relation	heapRel;
 		Relation	newIndexRel;
@@ -3347,12 +3370,20 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/* determine safety of this index for set_indexsafe_procflags */
+		index->safe = (newIndexRel->rd_indexprs == NIL &&
+					   newIndexRel->rd_indpred == NIL);
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
 		oldcontext = MemoryContextSwitchTo(private_context);
 
-		newIndexIds = lappend_oid(newIndexIds, newIndexId);
+		newidx = palloc(sizeof(reindex_idx));
+		newidx->indexId = newIndexId;
+		newidx->safe = index->safe;
+
+		newIndexIds = lappend(newIndexIds, newidx);
 
 		/*
 		 * Save lockrelid to protect each relation from drop then close
@@ -3416,6 +3447,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no
+	 * need to set the PROC_IN_SAFE_IC flag here.
+	 */
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3434,7 +3470,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	foreach(lc, newIndexIds)
 	{
 		Relation	newIndexRel;
-		Oid			newIndexId = lfirst_oid(lc);
+		reindex_idx *idx = lfirst(lc);
+		Oid			newIndexId = idx->indexId;
 		Oid			heapId;
 		Oid			indexam;
 
@@ -3448,6 +3485,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (idx->safe)
+			set_indexsafe_procflags();
+
 		/* Set ActiveSnapshot since functions in the indexes may need it */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -3477,8 +3518,14 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no
+	 * need to set the PROC_IN_SAFE_IC flag here.
+	 */
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3494,7 +3541,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	foreach(lc, newIndexIds)
 	{
-		Oid			newIndexId = lfirst_oid(lc);
+		reindex_idx *idx = lfirst(lc);
+		Oid			newIndexId = idx->indexId;
 		Oid			heapId;
 		TransactionId limitXmin;
 		Snapshot	snapshot;
@@ -3510,6 +3558,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (idx->safe)
+			set_indexsafe_procflags();
+
 		/*
 		 * Take the "reference snapshot" that will be used by validate_index()
 		 * to filter candidate tuples.
@@ -3552,6 +3604,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 * To ensure no deadlocks, we must commit and start yet another
 		 * transaction, and do our wait before any snapshot has been taken in
 		 * it.
+		 *
+		 * Because we don't take a snapshot in this transaction, there's no
+		 * need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		CommitTransactionCommand();
 		StartTransactionCommand();
@@ -3582,11 +3637,20 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	StartTransactionCommand();
 
+	/*
+	 * Because this transaction only does catalog manipulations and doesn't
+	 * do any index operations, we can set the PROC_IN_SAFE_IC flag here
+	 * unconditionally.
+	 */
+	set_indexsafe_procflags();
+
 	forboth(lc, indexIds, lc2, newIndexIds)
 	{
 		char	   *oldName;
-		Oid			oldIndexId = lfirst_oid(lc);
-		Oid			newIndexId = lfirst_oid(lc2);
+		reindex_idx *oldidx = lfirst(lc);
+		reindex_idx *newidx = lfirst(lc2);
+		Oid			oldIndexId = oldidx->indexId;
+		Oid			newIndexId = newidx->indexId;
 		Oid			heapId;
 
 		/*
@@ -3632,6 +3696,12 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after the
+	 * wait is over.
+	 */
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3646,7 +3716,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	foreach(lc, indexIds)
 	{
-		Oid			oldIndexId = lfirst_oid(lc);
+		reindex_idx *idx = lfirst(lc);
+		Oid			oldIndexId = idx->indexId;
 		Oid			heapId;
 
 		/*
@@ -3664,6 +3735,12 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after all
+	 * the waiting has been completed.
+	 */
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3681,7 +3758,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 		foreach(lc, indexIds)
 		{
-			Oid			oldIndexId = lfirst_oid(lc);
+			reindex_idx *idx = lfirst(lc);
+			Oid			oldIndexId = idx->indexId;
 			ObjectAddress object;
 
 			object.classId = RelationRelationId;
@@ -3728,7 +3806,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		{
 			foreach(lc, newIndexIds)
 			{
-				Oid			indOid = lfirst_oid(lc);
+				reindex_idx *idx = lfirst(lc);
+				Oid			indOid = idx->indexId;
 
 				ereport(INFO,
 						(errmsg("index \"%s.%s\" was reindexed",
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index e77f76ae8a..e1a6bc5170 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -54,6 +54,7 @@ struct XidCache
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
 #define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
 										 * CONCURRENTLY on non-expressional,
 										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
#59Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#58)
1 attachment(s)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Actually, I noticed two things. The first of them, addressed in this
new version of the patch, is that REINDEX CONCURRENTLY is doing a lot of
repetitive work by reopening each index and table in the build/validate
loops, so that they can report progress. This is easy to remedy by
adding a couple more members to the new struct (which I also renamed to
ReindexIndexInfo), for tableId and amId. The code seems a bit simpler
this way.

The other thing is that ReindexRelationConcurrenty seems to always be
called with the relations already locked by RangeVarGetRelidExtended.
So claiming to acquire locks on the relations over and over is
pointless. (I only noticed this because there was an obvious deadlock
hazard in one of the loops, that locked index before table.) I think we
should reduce all those to NoLock. My patch does not do that.

Attachments:

reindex-concurrently-2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ca24620fd0..a52cb16b5e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -114,6 +114,17 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+/*
+ * Index to process in ReindexRelationConcurrently
+ */
+typedef struct ReindexIndexInfo
+{
+	Oid			indexId;
+	Oid			tableId;
+	Oid			amId;
+	bool		safe;			/* for set_indexsafe_procflags */
+} ReindexIndexInfo;
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -385,7 +396,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -1564,9 +1575,11 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
-	/* Tell concurrent index builds to ignore us, if index qualifies */
-	if (safe_index)
-		set_indexsafe_procflags();
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after the
+	 * wait is over.
+	 */
 
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
@@ -3132,10 +3145,16 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 										get_rel_name(cellOid))));
 					else
 					{
+						ReindexIndexInfo *idx;
+
 						/* Save the list of relation OIDs in private context */
 						oldcontext = MemoryContextSwitchTo(private_context);
 
-						indexIds = lappend_oid(indexIds, cellOid);
+						idx = palloc(sizeof(ReindexIndexInfo));
+						idx->indexId = cellOid;
+						/* other fields set later */
+
+						indexIds = lappend(indexIds, idx);
 
 						MemoryContextSwitchTo(oldcontext);
 					}
@@ -3172,13 +3191,18 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 											get_rel_name(cellOid))));
 						else
 						{
+							ReindexIndexInfo *idx;
+
 							/*
 							 * Save the list of relation OIDs in private
 							 * context
 							 */
 							oldcontext = MemoryContextSwitchTo(private_context);
 
-							indexIds = lappend_oid(indexIds, cellOid);
+							idx = palloc(sizeof(ReindexIndexInfo));
+							idx->indexId = cellOid;
+							indexIds = lappend(indexIds, idx);
+							/* other fields set later */
 
 							MemoryContextSwitchTo(oldcontext);
 						}
@@ -3197,6 +3221,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 				Oid			heapId = IndexGetRelation(relationOid,
 													  (options & REINDEXOPT_MISSING_OK) != 0);
 				Relation	heapRelation;
+				ReindexIndexInfo *idx;
 
 				/* if relation is missing, leave */
 				if (!OidIsValid(heapId))
@@ -3247,7 +3272,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 				 * Save the list of relation OIDs in private context.  Note
 				 * that invalid indexes are allowed here.
 				 */
-				indexIds = lappend_oid(indexIds, relationOid);
+				idx = palloc(sizeof(ReindexIndexInfo));
+				idx->indexId = relationOid;
+				indexIds = lappend(indexIds, idx);
+				/* other fields set later */
 
 				MemoryContextSwitchTo(oldcontext);
 				break;
@@ -3306,31 +3334,40 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
-		Oid			indexId = lfirst_oid(lc);
+		ReindexIndexInfo *idx = lfirst(lc);
+		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Relation	newIndexRel;
 		LockRelId  *lockrelid;
 
-		indexRel = index_open(indexId, ShareUpdateExclusiveLock);
+		/* XXX pointless lock acquisition */
+		indexRel = index_open(idx->indexId, ShareUpdateExclusiveLock);
 		heapRel = table_open(indexRel->rd_index->indrelid,
 							 ShareUpdateExclusiveLock);
 
+		/* determine safety of this index for set_indexsafe_procflags */
+		idx->safe = (indexRel->rd_indexprs == NIL &&
+					 indexRel->rd_indpred == NIL);
+		idx->tableId = RelationGetRelid(heapRel);
+		idx->amId = indexRel->rd_rel->relam;
+
 		/* This function shouldn't be called for temporary relations. */
 		if (indexRel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
 			elog(ERROR, "cannot reindex a temporary table concurrently");
 
 		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
-									  RelationGetRelid(heapRel));
+									  idx->tableId);
+
 		progress_vals[0] = PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY;
 		progress_vals[1] = 0;	/* initializing */
-		progress_vals[2] = indexId;
-		progress_vals[3] = indexRel->rd_rel->relam;
+		progress_vals[2] = idx->indexId;
+		progress_vals[3] = idx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
 		/* Choose a temporary relation name for the new index */
-		concurrentName = ChooseRelationName(get_rel_name(indexId),
+		concurrentName = ChooseRelationName(get_rel_name(idx->indexId),
 											NULL,
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
@@ -3338,7 +3375,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 		/* Create new index definition based on given index */
 		newIndexId = index_concurrently_create_copy(heapRel,
-													indexId,
+													idx->indexId,
 													concurrentName);
 
 		/*
@@ -3352,7 +3389,13 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		oldcontext = MemoryContextSwitchTo(private_context);
 
-		newIndexIds = lappend_oid(newIndexIds, newIndexId);
+		newidx = palloc(sizeof(ReindexIndexInfo));
+		newidx->indexId = newIndexId;
+		newidx->safe = idx->safe;
+		newidx->tableId = idx->tableId;
+		newidx->amId = idx->amId;
+
+		newIndexIds = lappend(newIndexIds, newidx);
 
 		/*
 		 * Save lockrelid to protect each relation from drop then close
@@ -3380,6 +3423,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	 */
 	foreach(lc, heapRelationIds)
 	{
+		/* XXX pointless lock acquisition */
 		Relation	heapRelation = table_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
 		LockRelId  *lockrelid;
 		LOCKTAG    *heaplocktag;
@@ -3416,6 +3460,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
+	 */
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3433,10 +3482,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	foreach(lc, newIndexIds)
 	{
-		Relation	newIndexRel;
-		Oid			newIndexId = lfirst_oid(lc);
-		Oid			heapId;
-		Oid			indexam;
+		ReindexIndexInfo *newidx = lfirst(lc);
 
 		/* Start new transaction for this index's concurrent build */
 		StartTransactionCommand();
@@ -3448,37 +3494,38 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
 		/* Set ActiveSnapshot since functions in the indexes may need it */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		/*
-		 * Index relation has been closed by previous commit, so reopen it to
-		 * get its information.
-		 */
-		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
-		heapId = newIndexRel->rd_index->indrelid;
-		indexam = newIndexRel->rd_rel->relam;
-		index_close(newIndexRel, NoLock);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
 		 */
-		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);
+		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, newidx->tableId);
 		progress_vals[0] = PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY;
 		progress_vals[1] = PROGRESS_CREATEIDX_PHASE_BUILD;
-		progress_vals[2] = newIndexId;
-		progress_vals[3] = indexam;
+		progress_vals[2] = newidx->indexId;
+		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
 		/* Perform concurrent build of new index */
-		index_concurrently_build(heapId, newIndexId);
+		index_concurrently_build(newidx->tableId, newidx->indexId);
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
+	 */
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3494,12 +3541,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	foreach(lc, newIndexIds)
 	{
-		Oid			newIndexId = lfirst_oid(lc);
-		Oid			heapId;
+		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
 		Snapshot	snapshot;
-		Relation	newIndexRel;
-		Oid			indexam;
 
 		StartTransactionCommand();
 
@@ -3510,6 +3554,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
 		/*
 		 * Take the "reference snapshot" that will be used by validate_index()
 		 * to filter candidate tuples.
@@ -3517,27 +3565,19 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		snapshot = RegisterSnapshot(GetTransactionSnapshot());
 		PushActiveSnapshot(snapshot);
 
-		/*
-		 * Index relation has been closed by previous commit, so reopen it to
-		 * get its information.
-		 */
-		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
-		heapId = newIndexRel->rd_index->indrelid;
-		indexam = newIndexRel->rd_rel->relam;
-		index_close(newIndexRel, NoLock);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
 		 */
-		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);
+		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
+									  newidx->tableId);
 		progress_vals[0] = PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY;
 		progress_vals[1] = PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN;
-		progress_vals[2] = newIndexId;
-		progress_vals[3] = indexam;
+		progress_vals[2] = newidx->indexId;
+		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(heapId, newIndexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -3552,6 +3592,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 * To ensure no deadlocks, we must commit and start yet another
 		 * transaction, and do our wait before any snapshot has been taken in
 		 * it.
+		 *
+		 * Because we don't take a snapshot in this transaction, there's no
+		 * need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		CommitTransactionCommand();
 		StartTransactionCommand();
@@ -3582,12 +3625,18 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	StartTransactionCommand();
 
+	/*
+	 * Because this transaction only does catalog manipulations and doesn't do
+	 * any index operations, we can set the PROC_IN_SAFE_IC flag here
+	 * unconditionally.
+	 */
+	set_indexsafe_procflags();
+
 	forboth(lc, indexIds, lc2, newIndexIds)
 	{
+		ReindexIndexInfo *oldidx = lfirst(lc);
+		ReindexIndexInfo *newidx = lfirst(lc2);
 		char	   *oldName;
-		Oid			oldIndexId = lfirst_oid(lc);
-		Oid			newIndexId = lfirst_oid(lc2);
-		Oid			heapId;
 
 		/*
 		 * Check for user-requested abort.  This is inside a transaction so as
@@ -3596,27 +3645,25 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
-		heapId = IndexGetRelation(oldIndexId, false);
-
 		/* Choose a relation name for old index */
-		oldName = ChooseRelationName(get_rel_name(oldIndexId),
+		oldName = ChooseRelationName(get_rel_name(oldidx->indexId),
 									 NULL,
 									 "ccold",
-									 get_rel_namespace(heapId),
+									 get_rel_namespace(oldidx->tableId),
 									 false);
 
 		/*
 		 * Swap old index with the new one.  This also marks the new one as
 		 * valid and the old one as not valid.
 		 */
-		index_concurrently_swap(newIndexId, oldIndexId, oldName);
+		index_concurrently_swap(newidx->indexId, oldidx->indexId, oldName);
 
 		/*
 		 * Invalidate the relcache for the table, so that after this commit
 		 * all sessions will refresh any cached plans that might reference the
 		 * index.
 		 */
-		CacheInvalidateRelcacheByRelid(heapId);
+		CacheInvalidateRelcacheByRelid(oldidx->tableId);
 
 		/*
 		 * CCI here so that subsequent iterations see the oldName in the
@@ -3632,6 +3679,12 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after the
+	 * wait is over.
+	 */
+
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
@@ -3646,8 +3699,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 	foreach(lc, indexIds)
 	{
-		Oid			oldIndexId = lfirst_oid(lc);
-		Oid			heapId;
+		ReindexIndexInfo *oldidx = lfirst(lc);
 
 		/*
 		 * Check for user-requested abort.  This is inside a transaction so as
@@ -3656,14 +3708,19 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
-		heapId = IndexGetRelation(oldIndexId, false);
-		index_concurrently_set_dead(heapId, oldIndexId);
+		index_concurrently_set_dead(oldidx->tableId, oldidx->indexId);
 	}
 
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after all
+	 * the waiting has been completed.
+	 */
+
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
@@ -3681,11 +3738,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 		foreach(lc, indexIds)
 		{
-			Oid			oldIndexId = lfirst_oid(lc);
+			ReindexIndexInfo *idx = lfirst(lc);
 			ObjectAddress object;
 
 			object.classId = RelationRelationId;
-			object.objectId = oldIndexId;
+			object.objectId = idx->indexId;
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
@@ -3728,7 +3785,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		{
 			foreach(lc, newIndexIds)
 			{
-				Oid			indOid = lfirst_oid(lc);
+				ReindexIndexInfo *idx = lfirst(lc);
+				Oid			indOid = idx->indexId;
 
 				ereport(INFO,
 						(errmsg("index \"%s.%s\" was reindexed",
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index e77f76ae8a..e1a6bc5170 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -54,6 +54,7 @@ struct XidCache
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
 #define		PROC_IN_SAFE_IC		0x04	/* currently running CREATE INDEX
+										 * CONCURRENTLY or REINDEX
 										 * CONCURRENTLY on non-expressional,
 										 * non-partial index */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b8ca8cffd9..f2df682a9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2046,6 +2046,7 @@ Regis
 RegisNode
 RegisteredBgWorker
 ReindexErrorInfo
+ReindexIndexInfo
 ReindexObjectType
 ReindexStmt
 ReindexType
#60Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#59)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

In the interest of showing progress, I'm going to mark this CF item as
committed, and I'll submit the remaining pieces in a new thread.

Thanks!

#61Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#60)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On Mon, Nov 30, 2020 at 04:15:27PM -0300, Alvaro Herrera wrote:

In the interest of showing progress, I'm going to mark this CF item as
committed, and I'll submit the remaining pieces in a new thread.

Thanks for splitting!
--
Michael

#62Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#55)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

Hi,

On 2020-11-25 17:03:58 -0300, Alvaro Herrera wrote:

On 2020-Nov-23, Andres Freund wrote:

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.

Cool.

Here's a patch.

Note it also moves the computation of vacuum's Xmin (per
GetTransactionSnapshot) to *after* the bit has been set in statusFlags.

From b813c67a4abe2127b8bd13db7e920f958db15d59 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 24 Nov 2020 18:10:42 -0300
Subject: [PATCH] Restore lock level to update statusFlags

Reverts 27838981be9d (some comments are kept). Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the trasaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
level in ReplicationSlotRelease(). Was that intentional?

Greetings,

Andres Freund

#63Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#62)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2021-Nov-10, Andres Freund wrote:

Reverts 27838981be9d (some comments are kept). Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the trasaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
level in ReplicationSlotRelease(). Was that intentional?

Hmm, no, that seems to have been a mistake. I'll restore it.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)

#64Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#63)
Re: remove spurious CREATE INDEX CONCURRENTLY wait

On 2021-11-11 09:38:07 -0300, Alvaro Herrera wrote:

I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
level in ReplicationSlotRelease(). Was that intentional?

Hmm, no, that seems to have been a mistake. I'll restore it.

Thanks