Inadequate executor locking of indexes

Started by Tom Lane · about 7 years ago · 33 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

I discovered that it's possible to trigger relation_open's new assertion
about having a lock on the relation by the simple expedient of running
the core regression tests with plan_cache_mode = force_generic_plan.
(This doesn't give me a terribly warm feeling about how thoroughly we've
exercised the relevant code, but I don't know what to do about that.)

The reason for the crash is that nodeIndexscan.c and friends all open
their index-to-scan with code like

    /*
     * Open the index relation.
     *
     * If the parent table is one of the target relations of the query, then
     * InitPlan already opened and write-locked the index, so we can avoid
     * taking another lock here.  Otherwise we need a normal reader's lock.
     */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock : AccessShareLock);

Now, at the time this code was written, InitPlan did indeed ensure that
indexes of a plan's result relation were already opened and locked,
because it calls InitResultRelInfo which used to call ExecOpenIndices.
Nowadays, the ExecOpenIndices part is postponed to ExecInitModifyTable,
which figures it can optimize matters:

    /*
     * If there are indices on the result relation, open them and save
     * descriptors in the result relation info, so that we can add new
     * index entries for the tuples we add/update.  We need not do this
     * for a DELETE, however, since deletion doesn't affect indexes.  Also,
     * inside an EvalPlanQual operation, the indexes might be open
     * already, since we share the resultrel state with the original
     * query.
     */
    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        operation != CMD_DELETE &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

Therefore, in a plan consisting of a DELETE ModifyTable atop an indexscan
of the target table, we are opening the index without the executor
acquiring any lock on the index. The only thing that saves us from
triggering that Assert is that in most cases the planner has already
taken a lock on any index it considered (and not released that lock).
But using a prepared plan breaks that.

This problem is ancient; it's unrelated to the recent changes to reduce
executor locking, because that only concerned table locks not index locks.
I'm not certain how much real-world impact it has, since holding a lock
on the index's parent table is probably enough to prevent dire things
from happening in most cases. Still, it seems like a bug.
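
To make that concrete, here is a minimal sketch of the failure mode (assuming
an assert-enabled build, a hypothetical table "t" with an indexed column "k",
and that the cached generic plan uses an index scan on the target table):

SET plan_cache_mode = force_generic_plan;
PREPARE del(int) AS DELETE FROM t WHERE k = $1;
EXECUTE del(1);  -- builds the generic plan; the planner's own index lock
                 -- masks the problem for the rest of this transaction
EXECUTE del(1);  -- in a later transaction the cached plan is reused without
                 -- re-planning, so the index scan under the DELETE opens the
                 -- index with NoLock and trips relation_open's Assert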

There are several things we might do to fix this:

1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
in ExecInitModifyTable. We might be forced into that someday anyway if
we want to support non-heap-style tables, since most other peoples'
versions of indexes do want to know about deletions.

2. Drop the target-table optimization from nodeIndexscan.c and friends,
and just always open the scan index with AccessShareLock. (BTW, it's
a bit inconsistent that these nodes don't release their index locks
at the end; ExecCloseIndices does.)

3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
exception, so that they get the lock for themselves in that case. This
seems pretty ugly/fragile, but it's about the only option that won't end
in adding index lock-acquisition overhead in some cases. (But given the
planner's behavior, it's not clear to me how often that really matters.)

BTW, another thing that is bothering me about this is that, both with
the current code and options #1 and #3, there's an assumption that any
indexscan on a query's target table must be underneath the relevant
ModifyTable plan node. Given some other plan tree structure it might
be possible to initialize the indexscan node before the ModifyTable,
breaking the assumption that the latter already got index locks. I'm
not sure if that's actually possible at present, but it doesn't seem
like I'd want to bet on it always being so in the future. This might
be an argument for going with option #2, which eliminates all such
assumptions.

Another idea is to make things work more like we just made them work for
tables, which is to ensure that all index locks are taken before reaching
the executor, or at least as part of some other processing than the plan
tree walk. That would be a good deal more work than any of the options
listed above, though, and it would definitely not be back-patchable if we
decide we need a back-patched fix.

Thoughts?

regards, tom lane

#2 David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#1)
Re: Inadequate executor locking of indexes

On Thu, 8 Nov 2018 at 13:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I discovered that it's possible to trigger relation_open's new assertion
about having a lock on the relation by the simple expedient of running
the core regression tests with plan_cache_mode = force_generic_plan.

There are several things we might do to fix this:

1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
in ExecInitModifyTable. We might be forced into that someday anyway if
we want to support non-heap-style tables, since most other peoples'
versions of indexes do want to know about deletions.

2. Drop the target-table optimization from nodeIndexscan.c and friends,
and just always open the scan index with AccessShareLock. (BTW, it's
a bit inconsistent that these nodes don't release their index locks
at the end; ExecCloseIndices does.)

3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
exception, so that they get the lock for themselves in that case. This
seems pretty ugly/fragile, but it's about the only option that won't end
in adding index lock-acquisition overhead in some cases. (But given the
planner's behavior, it's not clear to me how often that really matters.)

Since I missed this and only discovered this was a problem when
someone else reported it today, and since I already did my own
analysis separately in [1], then my vote is for #2. For partitioned
table updates, there may be many result relations which can cause
ExecRelationIsTargetRelation() to become very slow, in such a case any
additional redundant lock would be cheap by comparison.

Ideally, the locking code would realise we already hold a stronger
lock and skip the lock, but I don't see how that's realistically
possible without probing the hash table for all stronger lock types
first, which would likely damage the performance of locking.

[1]: /messages/by-id/CAKJS1f-DyKTYyMf9oxn1PQ=WyEOOjfVcV-dCc6eB9eat_MYPeA@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: Inadequate executor locking of indexes

David Rowley <david.rowley@2ndquadrant.com> writes:

On Thu, 8 Nov 2018 at 13:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There are several things we might do to fix this:

1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
in ExecInitModifyTable. We might be forced into that someday anyway if
we want to support non-heap-style tables, since most other peoples'
versions of indexes do want to know about deletions.

2. Drop the target-table optimization from nodeIndexscan.c and friends,
and just always open the scan index with AccessShareLock. (BTW, it's
a bit inconsistent that these nodes don't release their index locks
at the end; ExecCloseIndices does.)

3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
exception, so that they get the lock for themselves in that case. This
seems pretty ugly/fragile, but it's about the only option that won't end
in adding index lock-acquisition overhead in some cases. (But given the
planner's behavior, it's not clear to me how often that really matters.)

Since I missed this and only discovered this was a problem when
someone else reported it today, and since I already did my own
analysis separately in [1], then my vote is for #2.

Thinking more about this, the problem I noted previously about two of
these solutions not working if the index scan node is not physically
underneath the ModifyTable node actually applies to all three :-(.
It's a slightly different issue for #2, namely that what we risk is
first taking AccessShareLock and then upgrading to RowExclusiveLock.
Since there are places (not many) that take ShareLock on indexes,
this would pose a deadlock risk.

Now, after more thought, I believe that it's probably impossible
to have a plan tree in which ExecRelationIsTargetRelation would
return true at an indexscan node that's not underneath the ModifyTable
node. What *is* possible is that we have such an indexscan on a
different RTE for the same table. I constructed this admittedly
artificial example in the regression database:

# explain with x1 as (select * from tenk1 t1 where unique1 = 42),
x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
select * from x1,x2 where x1.ten = x2.ten;
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.61..16.66 rows=1 width=488)
   Join Filter: (x1.ten = x2.ten)
   CTE x1
     ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
           Index Cond: (unique1 = 42)
   CTE x2
     ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
           ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
                 Index Cond: (unique2 = 11)
   ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
   ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
(11 rows)

Because the CTEs will be initialized in order, this presents a case
where the lock-upgrade hazard exists today: the x1 indexscan will
open tenk1_unique1 with AccessShareLock and then x2's ModifyTable
node will upgrade that to RowExclusiveLock. None of the proposed
fixes improve this.
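
One way to see the upgrade happen (a sketch against the regression database,
assuming the plan shape shown above; pg_locks reports one row per lock mode
held by the backend):

BEGIN;
WITH x1 AS (SELECT * FROM tenk1 t1 WHERE unique1 = 42),
     x2 AS (UPDATE tenk1 t2 SET two = 2 WHERE unique2 = 11 RETURNING *)
SELECT * FROM x1, x2 WHERE x1.ten = x2.ten;
-- tenk1_unique1 should appear under both AccessShareLock (taken by the x1
-- index scan) and RowExclusiveLock (taken by x2's ModifyTable node):
SELECT relation::regclass AS rel, mode
FROM pg_locks
WHERE pid = pg_backend_pid() AND locktype = 'relation';
ROLLBACK;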

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

For partitioned
table updates, there may be many result relations which can cause
ExecRelationIsTargetRelation() to become very slow, in such a case any
additional redundant lock would be cheap by comparison.

Yeah, it'd be nice to get rid of the need for that.

regards, tom lane

#4 David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Inadequate executor locking of indexes

On Sat, 24 Nov 2018 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Now, after more thought, I believe that it's probably impossible
to have a plan tree in which ExecRelationIsTargetRelation would
return true at an indexscan node that's not underneath the ModifyTable
node. What *is* possible is that we have such an indexscan on a
different RTE for the same table. I constructed this admittedly
artificial example in the regression database:

# explain with x1 as (select * from tenk1 t1 where unique1 = 42),
x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
select * from x1,x2 where x1.ten = x2.ten;

Well, that problem exists with more than just indexes. Relations could
be problematic too. An even simpler artificial example would be:

select * from t1 inner join t1 t2 on t1.a=t2.a for update of t2;

We could fix that in the executor by processing the rangetable in the
planner, first throwing the whole thing into a hash table by Oid and
finding the strongest lock level and applying that level to each
(non-dummy) matching RangeTblEntry's rellockmode. While we're there
we could add a new field for indexlockmode and do the same process on
that. However... there might not be much point as this does nothing
for the same problem that exists in parse analyze. That may be much
harder or even impossible to fix.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#3)
1 attachment(s)
Re: Inadequate executor locking of indexes

Sorry for jumping in late on this.

On 2018/11/24 1:25, Tom Lane wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:
Thinking more about this, the problem I noted previously about two of
these solutions not working if the index scan node is not physically
underneath the ModifyTable node actually applies to all three :-(.
It's a slightly different issue for #2, namely that what we risk is
first taking AccessShareLock and then upgrading to RowExclusiveLock.
Since there are places (not many) that take ShareLock on indexes,
this would pose a deadlock risk.

Now, after more thought, I believe that it's probably impossible
to have a plan tree in which ExecRelationIsTargetRelation would
return true at an indexscan node that's not underneath the ModifyTable
node. What *is* possible is that we have such an indexscan on a
different RTE for the same table. I constructed this admittedly
artificial example in the regression database:

# explain with x1 as (select * from tenk1 t1 where unique1 = 42),
x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
select * from x1,x2 where x1.ten = x2.ten;
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.61..16.66 rows=1 width=488)
   Join Filter: (x1.ten = x2.ten)
   CTE x1
     ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
           Index Cond: (unique1 = 42)
   CTE x2
     ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
           ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
                 Index Cond: (unique2 = 11)
   ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
   ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
(11 rows)

Because the CTEs will be initialized in order, this presents a case
where the lock-upgrade hazard exists today: the x1 indexscan will
open tenk1_unique1 with AccessShareLock and then x2's ModifyTable
node will upgrade that to RowExclusiveLock. None of the proposed
fixes improve this.

Provided we want to keep ExecRelationIsTargetRelation going forward, how
about modifying it such that we compare the scan relation's OID with that
of the result relations, not the RT index? Like in the attached.

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

+1

Also as already proposed, InitPlan should lock result relation indexes
even for DELETEs.

For partitioned
table updates, there may be many result relations which can cause
ExecRelationIsTargetRelation() to become very slow, in such a case any
additional redundant lock would be cheap by comparison.

Yeah, it'd be nice to get rid of the need for that.

As David mentioned elsewhere, can we add the ResultRelInfos to a hash
table if there are at least a certain number of result relations?
3f2393edefa did that for UPDATE tuple routing efficiency.

Thanks,
Amit

Attachments:

ExecRelationIsTargetRelation-compare-OID.patch (text/plain)
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 2a47abc02e..b9ee875267 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -664,12 +664,17 @@ bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
 	ResultRelInfo *resultRelInfos;
+	Oid			scanreloid;
 	int			i;
 
+	Assert(estate->es_range_table_array != NULL &&
+		   estate->es_range_table_array[scanrelid - 1] != NULL);
+	scanreloid = estate->es_range_table_array[scanrelid - 1]->relid;
+
 	resultRelInfos = estate->es_result_relations;
 	for (i = 0; i < estate->es_num_result_relations; i++)
 	{
-		if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
+		if (scanreloid == resultRelInfos[i].ri_RelationDesc->rd_id)
 			return true;
 	}
 	return false;
#6 David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#5)
Re: Inadequate executor locking of indexes

On Mon, 26 Nov 2018 at 17:37, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/11/24 1:25, Tom Lane wrote:

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

+1

Also as already proposed, InitPlan should lock result relation indexes
even for DELETEs.

I'd rather see it fixed another way. The reason being, if we get [1],
then that opens the door to run-time partition pruning for
UPDATE/DELETE, which is currently blocked due to lack of knowledge
about baserestrictinfos for the base partitioned relation because
inheritance_planner() does not plan for the partitioned table, only
its partitions. Doing the index opening work during InitPlan() means
we do that for all partitions, even the ones that will later be
run-time pruned. If we can do it during init of a ModifyTable node,
then we can likely do it after the run-time pruning takes place.
Since Amit and I are both working towards making partitioning faster,
I imagined he would also not want to do something that could slow it
down significantly, if there was some alternative way to fix it, at
least.

I'm making efforts to delay per-partition work even further in the
executor, for example locking of the per-partition result relations
until after run-time pruning would be a significant win for
partitioned tables with many partitions when generic plans are in use.
Moving things back to InitPlan() flies in the face of that work.

[1]: https://commitfest.postgresql.org/20/1778/

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#6)
Re: Inadequate executor locking of indexes

On 2018/11/26 14:25, David Rowley wrote:

On Mon, 26 Nov 2018 at 17:37, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/11/24 1:25, Tom Lane wrote:

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

+1

Also as already proposed, InitPlan should lock result relation indexes
even for DELETEs.

I'd rather see it fixed another way. The reason being, if we get [1],
then that opens the door to run-time partition pruning for
UPDATE/DELETE, which is currently blocked due to lack of knowledge
about baserestrictinfos for the base partitioned relation because
inheritance_planner() does not plan for the partitioned table, only
its partitions. Doing the index opening work during InitPlan() means
we do that for all partitions, even the ones that will later be
run-time pruned. If we can do it during init of a ModifyTable node,
then we can likely do it after the run-time pruning takes place.
Since Amit and I are both working towards making partitioning faster,
I imagined he would also not want to do something that could slow it
down significantly, if there was some alternative way to fix it, at
least.

I'm making efforts to delay per-partition work even further in the
executor, for example locking of the per-partition result relations
until after run-time pruning would be a significant win for
partitioned tables with many partitions when generic plans are in use.
Moving things back to InitPlan() flies in the face of that work.

[1] https://commitfest.postgresql.org/20/1778/

That's an interesting point. Although, considering the concerns that Tom
raised about the same index relation being locked such that lock-strength
upgrade occurs (his example containing two CTEs), we'll have to find a way
to do the ModifyTable run-time pruning such that result relations and
their indexes (possibly under multiple ModifyTable nodes) are locked with
RowExclusiveLock before they're locked with AccessShareLock lock as scan
relations. For example, we might be able to find a way to do the
ModifyTable run-time pruning in InitPlan before initializing plan trees.

That said, I don't quite understand how the lock-strength upgrade
occurring in the way being discussed here (AccessShareLock for scanning to
RowExclusiveLock for modifying) leads to a deadlock hazard?

Thanks,
Amit

#8 David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#7)
Re: Inadequate executor locking of indexes

On Mon, 26 Nov 2018 at 18:57, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/11/26 14:25, David Rowley wrote:

I'm making efforts to delay per-partition work even further in the
executor, for example locking of the per-partition result relations
until after run-time pruning would be a significant win for
partitioned tables with many partitions when generic plans are in use.
Moving things back to InitPlan() flies in the face of that work.

[1] https://commitfest.postgresql.org/20/1778/

That's an interesting point. Although, considering the concerns that Tom
raised about the same index relation being locked such that lock-strength
upgrade occurs (his example containing two CTEs), we'll have to find a way
to do the ModifyTable run-time pruning such that result relations and
their indexes (possibly under multiple ModifyTable nodes) are locked with
RowExclusiveLock before they're locked with AccessShareLock lock as scan
relations. For example, we might be able to find a way to do the
ModifyTable run-time pruning in InitPlan before initializing plan trees.

I thought my idea of processing the rangetable at the end of
planning to determine the strongest lock per relation would have
solved that.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#8)
Re: Inadequate executor locking of indexes

On 2018/11/27 6:19, David Rowley wrote:

On Mon, 26 Nov 2018 at 18:57, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/11/26 14:25, David Rowley wrote:

I'm making efforts to delay per-partition work even further in the
executor, for example locking of the per-partition result relations
until after run-time pruning would be a significant win for
partitioned tables with many partitions when generic plans are in use.
Moving things back to InitPlan() flies in the face of that work.

[1] https://commitfest.postgresql.org/20/1778/

That's an interesting point. Although, considering the concerns that Tom
raised about the same index relation being locked such that lock-strength
upgrade occurs (his example containing two CTEs), we'll have to find a way
to do the ModifyTable run-time pruning such that result relations and
their indexes (possibly under multiple ModifyTable nodes) are locked with
RowExclusiveLock before they're locked with AccessShareLock lock as scan
relations. For example, we might be able to find a way to do the
ModifyTable run-time pruning in InitPlan before initializing plan trees.

I thought my idea of processing the rangetable at the end of
planning to determine the strongest lock per relation would have
solved that.

Yeah, would be nice to make that work.

Thanks,
Amit

#10 David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#9)
1 attachment(s)
Re: Inadequate executor locking of indexes

On Tue, 27 Nov 2018 at 19:00, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/11/27 6:19, David Rowley wrote:

On Mon, 26 Nov 2018 at 18:57, Amit Langote

That's an interesting point. Although, considering the concerns that Tom
raised about the same index relation being locked such that lock-strength
upgrade occurs (his example containing two CTEs), we'll have to find a way
to do the ModifyTable run-time pruning such that result relations and
their indexes (possibly under multiple ModifyTable nodes) are locked with
RowExclusiveLock before they're locked with AccessShareLock lock as scan
relations. For example, we might be able to find a way to do the
ModifyTable run-time pruning in InitPlan before initializing plan trees.

I thought my idea of processing the rangetable at the end of
planning to determine the strongest lock per relation would have
solved that.

Yeah, would be nice to make that work.

Here's a very rough and incomplete patch just to demo what I had in
mind. The finalize_lockmodes() is likely more or less complete, just
minus me testing it works. What's mostly missing is changing all the
places that grab index locks to use the new idxlockmode field. I've
really just changed index scan and index only scan and just stared a
bit at nodeModifyTable.c wondering what I should do with that
operation != CMD_DELETE test before the ExecOpenIndices() call.

The patch also includes code to determine the strongest rellockmode
per relation. Perhaps it's not really worth doing that since parse
analyze could still cause some lock upgrade hazards. The code that's
there just fixes things for the executor, so really would only have an
effect when executing cached plans.

If this looks like a good path to go in, then I can produce something
a bit more finished. I'm just a bit unsure when exactly I can do that
as I'm on leave and have other commitments to take care of.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

wip_idxlockmode_and_lock_upgrade_fix.patch (application/octet-stream)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index d1201a1807..1318904135 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,7 +493,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 	TupleDesc	tupDesc;
 
 	/*
@@ -555,16 +555,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f209173a85..fa5b587769 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -920,7 +920,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/*
 	 * create state structure
@@ -983,16 +983,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index db49968409..6c4547fcbd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2354,6 +2354,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(relid);
 	COPY_SCALAR_FIELD(relkind);
 	COPY_SCALAR_FIELD(rellockmode);
+	COPY_SCALAR_FIELD(idxlockmode);
 	COPY_NODE_FIELD(tablesample);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3a084b4d1f..a8dfc390b3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2630,6 +2630,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
 	COMPARE_SCALAR_FIELD(relid);
 	COMPARE_SCALAR_FIELD(relkind);
 	COMPARE_SCALAR_FIELD(rellockmode);
+	COMPARE_SCALAR_FIELD(idxlockmode);
 	COMPARE_NODE_FIELD(tablesample);
 	COMPARE_NODE_FIELD(subquery);
 	COMPARE_SCALAR_FIELD(security_barrier);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f0c396530d..7ce6f4b976 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3127,6 +3127,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
 			WRITE_INT_FIELD(rellockmode);
+			WRITE_INT_FIELD(idxlockmode);
 			WRITE_NODE_FIELD(tablesample);
 			break;
 		case RTE_SUBQUERY:
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e117867de5..74895f256c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1362,6 +1362,7 @@ _readRangeTblEntry(void)
 			READ_OID_FIELD(relid);
 			READ_CHAR_FIELD(relkind);
 			READ_INT_FIELD(rellockmode);
+			READ_INT_FIELD(idxlockmode);
 			READ_NODE_FIELD(tablesample);
 			break;
 		case RTE_SUBQUERY:
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c729a99f8b..6369c2caa6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -122,6 +122,7 @@ typedef struct
 } WindowClauseSortData;
 
 /* Local functions */
+static void finalize_lockmodes(PlannedStmt *stmt);
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
 static void inheritance_planner(PlannerInfo *root);
@@ -565,9 +566,155 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 			result->jitFlags |= PGJIT_DEFORM;
 	}
 
+	/*
+	 * Determine correct lock modes for each rtable entry and the indexes
+	 * belonging to it.
+	 */
+	finalize_lockmodes(result);
+
 	return result;
 }
 
+typedef struct RelationLockmodeElem
+{
+	Oid			relid;		/* hash key -- must be first */
+	LOCKMODE	lockmode;
+} RelationLockmodeElem;
+
+
+/*
+ * finalize_lockmodes
+ *		Process stmt's rtable and determine the strongest lock level for each
+ *		relation and upgrade weaker locks to the strongest lock level for that
+ *		relation.  Also determine the lock level require for each relation's
+ *		index.
+ */
+static void
+finalize_lockmodes(PlannedStmt *stmt)
+{
+	Bitmapset  *resultRelids = NULL;
+	List	   *rtable = stmt->rtable;
+	ListCell   *lc;
+	int			relid;
+
+	/*
+	 * Determine the strongest lock level for each relation in rtable.  We
+	 * must apply the strongest lock of each relation if the same relation is
+	 * seen more than once where the lock levels vary.  This works around lock
+	 * upgrade hazards we might see if we obtained the weaker lock followed by
+	 * the stronger lock level.  We need only attempt this when there are
+	 * multiple entries in the rtable.
+	 */
+	if (list_length(rtable) > 1)
+	{
+		RelationLockmodeElem   *elem;
+		HTAB	   *htab;
+		HASHCTL		ctl;
+		Oid			relid;
+		bool		found;
+		bool		applystrongest = false;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(RelationLockmodeElem);
+		ctl.hcxt = CurrentMemoryContext;
+
+		htab = hash_create("Lockmode table", list_length(rtable),
+			&ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		foreach(lc, rtable)
+		{
+			RangeTblEntry *rte = lfirst(lc);
+
+			if (rte->relkind != RELKIND_RELATION)
+				continue;
+
+			relid = rte->relid;
+			elem = (RelationLockmodeElem *)
+				hash_search(htab, &relid, HASH_ENTER, &found);
+
+			/*
+			 * When we've seen this relation before and the lockmode varies
+			 * from the last time we saw this rel, then mark that we need to
+			 * make another pass over the list to apply the strongest of the
+			 * seen lock modes.
+			 */
+			if (found)
+			{
+				if (elem->lockmode != rte->rellockmode)
+				{
+					applystrongest = true;
+					elem->lockmode = Max(elem->lockmode, rte->rellockmode);
+				}
+			}
+			else
+				elem->lockmode = rte->rellockmode;
+		}
+
+		if (applystrongest)
+		{
+			foreach(lc, rtable)
+			{
+				RangeTblEntry *rte = lfirst(lc);
+
+				if (rte->relkind != RELKIND_RELATION)
+					continue;
+
+				relid = rte->relid;
+
+				elem = (RelationLockmodeElem *)
+					hash_search(htab, &relid, HASH_FIND, &found);
+				Assert(found);
+
+				rte->rellockmode = elem->lockmode;
+			}
+		}
+
+		hash_destroy(htab);
+	}
+
+	if (stmt->commandType == CMD_INSERT || stmt->commandType == CMD_UPDATE)
+	{
+		foreach(lc, stmt->resultRelations)
+		{
+			relid = lfirst_int(lc);
+
+			resultRelids = bms_add_member(resultRelids, relid);
+		}
+	}
+
+	/* Determine index lock mode */
+	relid = 1;
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst(lc);
+
+		if (rte->relkind != RELKIND_RELATION)
+		{
+			relid++;
+			continue;
+		}
+
+		/*
+		 * SELECT always only will require an AccessShareLock on the indexes,
+		 * DELETE is the same since we're not modifying the index, only
+		 * marking the tuple on the heap as dead.
+		 */
+		if (stmt->commandType == CMD_SELECT || stmt->commandType == CMD_DELETE)
+			rte->idxlockmode = AccessShareLock;
+
+		/*
+		 * For INSERT AND UPDATE we require a RowExclusiveLock only for result
+		 * relations. resultRelids will be NULL when not an INSERT or UPDATE.
+		 */
+		else if (bms_is_member(relid, resultRelids))
+			rte->idxlockmode = RowExclusiveLock;
+		else
+			rte->idxlockmode = AccessShareLock; /* XXX how about utility commands? */
+
+		relid++;
+	}
+}
 
 /*--------------------
  * subquery_planner
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index e5bdc1cec5..6dbda5645d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -987,6 +987,7 @@ typedef struct RangeTblEntry
 	Oid			relid;			/* OID of the relation */
 	char		relkind;		/* relation kind (see pg_class.relkind) */
 	int			rellockmode;	/* lock level that query requires on the rel */
+	int			idxlockmode;	/* lock level required for rel's indexes */
 	struct TableSampleClause *tablesample;	/* sampling info, or NULL */
 
 	/*
#11 Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#3)
Re: Inadequate executor locking of indexes

On Fri, Nov 23, 2018 at 9:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thinking more about this, the problem I noted previously about two of
these solutions not working if the index scan node is not physically
underneath the ModifyTable node actually applies to all three :-(.
It's a slightly different issue for #2, namely that what we risk is
first taking AccessShareLock and then upgrading to RowExclusiveLock.
Since there are places (not many) that take ShareLock on indexes,
this would pose a deadlock risk.

Can you be a bit more specific on what exact deadlock risk you are
seeing here? Amit L asked about it and I am also curious to know.
One way I could see is:

Session-1
begin;
Lock table foo in Access Share Mode;

Session-2
begin;
Lock table foo in Share Mode;

Session-1
Lock table foo in Row Exclusive Mode; --here it will wait for session-2

Session-2
Lock table foo in Access Exclusive Mode; --here it will lead to deadlock

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#12 David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#10)
1 attachment(s)
Re: Inadequate executor locking of indexes

On Wed, 28 Nov 2018 at 01:55, David Rowley <david.rowley@2ndquadrant.com> wrote:

If this looks like a good path to go in, then I can produce something
a bit more finished. I'm just a bit unsure when exactly I can do that
as I'm on leave and have other commitments to take care of.

This patch is still on my list, so I had another look at what I did
back in November...

I've changed a couple of things:

1. Changed nodeBitmapIndexscan.c so that it now properly uses the
RangeTblEntry's idxlockmode field.
2. Renamed a few variables in finalize_lockmodes().

I'm keen to get some feedback if we should go about fixing things this
way. One thing that's still on my mind is that the parser is still at
risk of lock upgrade hazards. This patch only fixes the executor. I
don't quite see how it would be possible to fix the same in the
parser.

I was also looking at each call site that calls ExecOpenIndices(). I
don't think it's great that ExecInitModifyTable() has its own logic to
skip calling that function for DELETE. I wondered if it shouldn't
somehow depend on what the idxlockmode is set to. I also saw that
apply_handle_delete() makes a call to ExecOpenIndices(). I don't think
that one is needed, but I didn't test anything to make sure. Maybe
that's for another thread anyway.

Updated patch is attached.

Adding to the March commitfest as a bug fix.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

idxlockmode_and_lock_upgrade_fix_v2.patch (application/octet-stream)
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index bd837d3cd8..0d9b247bb4 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,7 +211,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -260,16 +260,9 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->biss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index b3f61dd1fc..3098c8342f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -495,7 +495,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 	TupleDesc	tupDesc;
 
 	/*
@@ -557,16 +557,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 324356ec75..a9d3fa6f0b 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -919,7 +919,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/*
 	 * create state structure
@@ -982,16 +982,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b44ead269f..b90e834b2b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2354,6 +2354,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(relid);
 	COPY_SCALAR_FIELD(relkind);
 	COPY_SCALAR_FIELD(rellockmode);
+	COPY_SCALAR_FIELD(idxlockmode);
 	COPY_NODE_FIELD(tablesample);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 1e169e0b9c..275585d866 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2631,6 +2631,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
 	COMPARE_SCALAR_FIELD(relid);
 	COMPARE_SCALAR_FIELD(relkind);
 	COMPARE_SCALAR_FIELD(rellockmode);
+	COMPARE_SCALAR_FIELD(idxlockmode);
 	COMPARE_NODE_FIELD(tablesample);
 	COMPARE_NODE_FIELD(subquery);
 	COMPARE_SCALAR_FIELD(security_barrier);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f97cf37f1f..d8418814ec 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3020,6 +3020,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
 			WRITE_INT_FIELD(rellockmode);
+			WRITE_INT_FIELD(idxlockmode);
 			WRITE_NODE_FIELD(tablesample);
 			break;
 		case RTE_SUBQUERY:
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3b002778ad..e4b7df0cb7 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1363,6 +1363,7 @@ _readRangeTblEntry(void)
 			READ_OID_FIELD(relid);
 			READ_CHAR_FIELD(relkind);
 			READ_INT_FIELD(rellockmode);
+			READ_INT_FIELD(idxlockmode);
 			READ_NODE_FIELD(tablesample);
 			break;
 		case RTE_SUBQUERY:
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2239728cf..41cd70ad7b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -127,6 +127,7 @@ typedef struct
 } WindowClauseSortData;
 
 /* Local functions */
+static void finalize_lockmodes(PlannedStmt *stmt);
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
 static void inheritance_planner(PlannerInfo *root);
@@ -570,9 +571,163 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 			result->jitFlags |= PGJIT_DEFORM;
 	}
 
+	/*
+	 * Determine correct lock modes for each rtable entry and the indexes
+	 * belonging to it.
+	 */
+	finalize_lockmodes(result);
+
 	return result;
 }
 
+typedef struct RelationLockmodeElem
+{
+	Oid			relid;		/* hash key -- must be first */
+	LOCKMODE	lockmode;
+} RelationLockmodeElem;
+
+
+/*
+ * finalize_lockmodes
+ *		Process stmt's rtable and determine the strongest lock level for each
+ *		distinct relation and upgrade weaker locks to the strongest lock level
+ *		for that relation.  Also determine the lock level required for each
+ *		relation's indexes and set that in the rel's idxlockmode field.
+ */
+static void
+finalize_lockmodes(PlannedStmt *stmt)
+{
+	Bitmapset  *resultRelids = NULL;
+	List	   *rtable = stmt->rtable;
+	ListCell   *lc;
+	int			relid;
+
+	/*
+	 * Determine the strongest lock level for each relation in rtable.  We
+	 * must apply the strongest lock of each relation if the same relation is
+	 * seen more than once and the lock levels vary.  This defends against
+	 * lock upgrade hazards we might see if we obtained the weaker lock
+	 * followed by the stronger lock level.  We need only attempt this when
+	 * there are multiple entries in the rtable.
+	 */
+	if (list_length(rtable) > 1)
+	{
+		RelationLockmodeElem   *elem;
+		HTAB	   *htab;
+		HASHCTL		ctl;
+		bool		found;
+		bool		applystrongest = false;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(RelationLockmodeElem);
+		ctl.hcxt = CurrentMemoryContext;
+
+		htab = hash_create("Lockmode table", list_length(rtable),
+			&ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		foreach(lc, rtable)
+		{
+			RangeTblEntry *rte = lfirst(lc);
+			Oid			reloid;
+
+			if (rte->relkind != RELKIND_RELATION)
+				continue;
+
+			reloid = rte->relid;
+			elem = (RelationLockmodeElem *)
+				hash_search(htab, &reloid, HASH_ENTER, &found);
+
+			/*
+			 * When we've seen this relation before and the lockmode varies
+			 * from the last time we saw it, then mark that we need to make
+			 * another pass over the list to apply the strongest of the seen
+			 * lock modes.
+			 */
+			if (found)
+			{
+				if (elem->lockmode != rte->rellockmode)
+				{
+					applystrongest = true;
+					elem->lockmode = Max(elem->lockmode, rte->rellockmode);
+				}
+			}
+			else
+				elem->lockmode = rte->rellockmode;
+		}
+
+		/*
+		 * If there are multiple instances of the same rel with varying lock
+		 * strengths then set the strongest lock level to each instance of
+		 * that relation.
+		 */
+		if (applystrongest)
+		{
+			foreach(lc, rtable)
+			{
+				RangeTblEntry *rte = lfirst(lc);
+				Oid			reloid;
+
+				if (rte->relkind != RELKIND_RELATION)
+					continue;
+
+				reloid = rte->relid;
+
+				elem = (RelationLockmodeElem *)
+					hash_search(htab, &reloid, HASH_FIND, &found);
+				Assert(found);
+
+				rte->rellockmode = elem->lockmode;
+			}
+		}
+
+		hash_destroy(htab);
+	}
+
+	if (stmt->commandType == CMD_INSERT || stmt->commandType == CMD_UPDATE)
+	{
+		foreach(lc, stmt->resultRelations)
+		{
+			relid = lfirst_int(lc);
+
+			resultRelids = bms_add_member(resultRelids, relid);
+		}
+	}
+
+	/* Determine index lock mode */
+	relid = 1;
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst(lc);
+
+		if (rte->relkind != RELKIND_RELATION)
+		{
+			relid++;
+			continue;
+		}
+
+		/*
+		 * SELECT always only will require an AccessShareLock on the indexes,
+		 * DELETE is the same since we're not modifying the index, only
+		 * marking the tuple on the heap as dead.
+		 */
+		if (stmt->commandType == CMD_SELECT || stmt->commandType == CMD_DELETE)
+			rte->idxlockmode = AccessShareLock;
+
+		/*
+		 * For INSERT and UPDATE we require a RowExclusiveLock only for result
+		 * relations. resultRelids will be NULL when not an INSERT or UPDATE.
+		 */
+		else if (bms_is_member(relid, resultRelids))
+			rte->idxlockmode = RowExclusiveLock;
+		else
+			rte->idxlockmode = AccessShareLock;
+
+		relid++;
+	}
+
+	bms_free(resultRelids);
+}
 
 /*--------------------
  * subquery_planner
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fe14d7db2..fceb117819 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -990,6 +990,7 @@ typedef struct RangeTblEntry
 	Oid			relid;			/* OID of the relation */
 	char		relkind;		/* relation kind (see pg_class.relkind) */
 	int			rellockmode;	/* lock level that query requires on the rel */
+	int			idxlockmode;	/* lock level required for rel's indexes */
 	struct TableSampleClause *tablesample;	/* sampling info, or NULL */
 
 	/*
#13 Julien Rouhaud
rjuju123@gmail.com
In reply to: David Rowley (#12)
Re: Inadequate executor locking of indexes

Hi,

On Tue, Feb 5, 2019 at 5:16 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:

I've changed a couple of things:

1. Changed nodeBitmapIndexscan.c so that it now properly uses the
RangeTblEntry's idxlockmode field.
2. Renamed a few variables in finalize_lockmodes().

I'm keen to get some feedback if we should go about fixing things this
way. One thing that's still on my mind is that the parser is still at
risk of lock upgrade hazards. This patch only fixes the executor. I
don't quite see how it would be possible to fix the same in the
parser.

+        /*
+         * If there are multiple instances of the same rel with varying lock
+         * strengths then set the strongest lock level to each instance of
+         * that relation.
+         */
+        if (applystrongest)
[...]

The patch is quite straightforward, so I don't have general comments
on it. However, I think that the idxlockmode initialization is
problematic: you're using the statement's commandType so this doesn't
work with CTEs. For instance, this artificial query

WITH s AS (UPDATE test set id = 1 WHERE id =1) select 1;

will take an AccessShareLock on test's index while it should take a
RowExclusiveLock. I guess that you have to code similar lock upgrade
logic for the indexes, inspecting the planTree and subplans to find
the correct command type.
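
To illustrate the mismatch (a sketch assuming a hypothetical table "test" with
an index on "id"): the ModifyTable sits inside a CTE of a statement whose
commandType is CMD_SELECT, so a commandType-based idxlockmode would come out
as AccessShareLock for test's indexes even though the CTE updates the table.

EXPLAIN (COSTS OFF)
WITH s AS (UPDATE test SET id = 1 WHERE id = 1) SELECT 1;
-- The "Update on test" node appears under the CTE of a SELECT statement,
-- which is why the statement-level command type alone cannot decide the
-- index lock level.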

I was also looking at each call site that calls ExecOpenIndices(). I
don't think it's great that ExecInitModifyTable() has its own logic to
skip calling that function for DELETE. I wondered if it shouldn't
somehow depend on what the idxlockmode is set to.

I don't think that it's great either. However for DELETE we shouldn't
simply call ExecOpenIndices(), but open only the used indexes right?

#14 David Rowley
david.rowley@2ndquadrant.com
In reply to: Julien Rouhaud (#13)
Re: Inadequate executor locking of indexes

On Mon, 11 Feb 2019 at 01:22, Julien Rouhaud <rjuju123@gmail.com> wrote:

The patch is quite straightforward, so I don't have general comments
on it. However, I think that the idxlockmode initialization is
problematic: you're using the statement's commandType so this doesn't
work with CTEs. For instance, this artificial query

WITH s AS (UPDATE test set id = 1 WHERE id =1) select 1;

will take an AccessShareLock on test's index while it should take a
RowExclusiveLock. I guess that you have to code similar lock upgrade
logic for the indexes, inspecting the planTree and subplans to find
the correct command type.

Good catch. I'm a bit stuck on the best way to fix this. So far I can
only think of, either;

1. Adding a new field to RangeTblEntry to indicate the operation type
that's being performed on the relation; or
2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
indexes of RangeTblEntry items that belong to DELETEs and ignore these
when setting resultRelids in finalize_lockmodes().

For #2, the only place I can see to do this is
add_rtes_to_flat_rtable(), which would require either passing the
PlannerInfo into the function, or at least its parse's commandType.

I don't really like either, but don't have any other ideas at the moment.

I was also looking at each call site that calls ExecOpenIndices(). I
don't think it's great that ExecInitModifyTable() has its own logic to
skip calling that function for DELETE. I wondered if it shouldn't
somehow depend on what the idxlockmode is set to.

I don't think that it's great either. However for DELETE we shouldn't
simply call ExecOpenIndices(), but open only the used indexes right?

No, I don't think so. The "used" index(es) will be opened in the scan
node(s). The reason I didn't like it much is that I wanted to keep
the logic for deciding what lock level to use in the planner. The
executor seems to have more knowledge than I think maybe it should.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15 Julien Rouhaud
rjuju123@gmail.com
In reply to: David Rowley (#14)
Re: Inadequate executor locking of indexes

On Mon, Feb 11, 2019 at 5:32 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:

On Mon, 11 Feb 2019 at 01:22, Julien Rouhaud <rjuju123@gmail.com> wrote:

The patch is quite straightforward, so I don't have general comments
on it. However, I think that the idxlockmode initialization is
problematic: you're using the statement's commandType so this doesn't
work with CTEs. For instance, this artificial query

WITH s AS (UPDATE test set id = 1 WHERE id =1) select 1;

will take an AccessShareLock on test's index while it should take a
RowExclusiveLock. I guess that you have to code similar lock upgrade
logic for the indexes, inspecting the planTree and subplans to find
the correct command type.

Good catch. I'm a bit stuck on the best way to fix this. So far I can
only think of, either;

1. Adding a new field to RangeTblEntry to indicate the operation type
that's being performed on the relation; or
2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
indexes of RangeTblEntry items that belong to DELETEs and ignore these
when setting resultRelids in finalize_lockmodes().

For #2, the only place I can see to do this is
add_rtes_to_flat_rtable(), which would require either passing the
PlannerInfo into the function, or at least its parse's commandType.

I don't really like either, but don't have any other ideas at the moment.

But we would still need the same lock level upgrade logic on indexes
for cases like CTE with a mix of INSERT, UPDATE and DELETE on the same
relation I think. #1 seems like a better solution to me.

I was also looking at each call site that calls ExecOpenIndices(). I
don't think it's great that ExecInitModifyTable() has its own logic to
skip calling that function for DELETE. I wondered if it shouldn't
somehow depend on what the idxlockmode is set to.

I don't think that it's great either. However for DELETE we shouldn't
simply call ExecOpenIndices(), but open only the used indexes right?

No, I don't think so. The "used" index(es) will be opened in the scan
node(s). The reason I didn't like it much is that I wanted to keep
the logic for deciding what lock level to use in the planner. The
executor seems to have more knowledge than I think maybe it should.

Ah, I see. Thanks.

#16 David Rowley
david.rowley@2ndquadrant.com
In reply to: Julien Rouhaud (#15)
Re: Inadequate executor locking of indexes

On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Feb 11, 2019 at 5:32 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:

1. Adding a new field to RangeTblEntry to indicate the operation type
that's being performed on the relation; or
2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
indexes of RangeTblEntry items that belong to DELETEs and ignore these
when setting resultRelids in finalize_lockmodes().

For #2, the only place I can see to do this is
add_rtes_to_flat_rtable(), which would require either passing the
PlannerInfo into the function, or at least its parse's commandType.

I don't really like either, but don't have any other ideas at the moment.

But we would still need the same lock level upgrade logic on indexes
for cases like CTE with a mix of INSERT, UPDATE and DELETE on the same
relation I think. #1 seems like a better solution to me.

I think I'd rather find some way to do it that didn't denormalise the
parse nodes like that. It seems very strange to have a CmdType in the
Query struct, and then another set of them in RangeTblEntry. Besides
bloating the size of the RangeTblEntry struct a bit, it also could
lead to inconsistency bugs where the two CmdTypes differ, for some
reason.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17 David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#16)
1 attachment(s)
Re: Inadequate executor locking of indexes

On Tue, 19 Feb 2019 at 12:13, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Feb 11, 2019 at 5:32 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:

1. Adding a new field to RangeTblEntry to indicate the operation type
that's being performed on the relation; or
2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
indexes of RangeTblEntry items that belong to DELETEs and ignore these
when setting resultRelids in finalize_lockmodes().

For #2, the only place I can see to do this is
add_rtes_to_flat_rtable(), which would require either passing the
PlannerInfo into the function, or at least its parse's commandType.

I don't really like either, but don't have any other ideas at the moment.

But we would still need the same lock level upgrade logic on indexes
for cases like CTE with a mix of INSERT, UPDATE and DELETE on the same
relation I think. #1 seems like a better solution to me.

I think I'd rather find some way to do it that didn't denormalise the
parse nodes like that. It seems very strange to have a CmdType in the
Query struct, and then another set of them in RangeTblEntry. Besides
bloating the size of the RangeTblEntry struct a bit, it also could
lead to inconsistency bugs where the two CmdTypes differ, for some
reason.

I had another go at this patch and fixed the problem by just setting
the idxlockmode inside the planner just before the call to
expand_inherited_tables(). This allows the lockmode to be copied into
child RTEs. Doing it later in planning is more of a problem since we
don't really store all the result relations in PlannerInfo; we only
store the one that was written in the query. The others are stored in
the ModifyTable path and then later in the ModifyTable plan. The only
other place I could see to do it (without adding an extra field in
PlannerInfo) was during createplan... which does not at all seem like
the right place, and is also too late for get_relation_info(); see
below.

finalize_lockmodes() now does the same thing for idxlockmode as it
does for rellockmode, i.e., it just sets the strongest lock for each
unique rel OID.

However, during my work on this, I saw a few things that made me
wonder whether all this is over-engineered and worth the trouble. The
first thing I saw was that in get_relation_info() we obtain a
RowExclusiveLock if the relation is the result relation. This means we
obtain a RowExclusiveLock for DELETE targets too. This is
inconsistent with what the executor tries to do and results in us
taking a weaker lock during execution than we do during planning. It
also means we don't bump into the lock in the local lock table, which
results in slower locking. That problem would be exaggerated with
partitioned tables with a large number of partitions or inheritance
parents with lots of children; however, that'll likely be drowned out
by all the other slow stuff around there.

Another thing that's on my mind is that in the patch in
nodeModifyTable.c we still do:

	if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
		operation != CMD_DELETE &&
		resultRelInfo->ri_IndexRelationDescs == NULL)
		ExecOpenIndices(resultRelInfo,
						node->onConflictAction != ONCONFLICT_NONE);

i.e. don't open the indexes for DELETEs. I had ideas that maybe this
could be changed to check the idxlockmode and open the indexes if it's
above AccessShareLock. There didn't seem to be a very nice way to
fetch the RangeTblEntry from the ResultRelInfo though, so I didn't do
that, but leaving it as is does not really work well with the patch as
I really want the planner to be the thing that decides what happens
here.

My final thoughts were that I see a lot of work going on to implement
pluggable storage. I ended up having an offlist conversation with
Thomas Munro about zheap and what its requirements were for locking
indexes during deletes. It seems modifying the index during delete is
not something that happens with the current version, but is planned
for future versions. In any case, we can't really assume zheap is
going to be the only additional storage engine implementation that's
going to get hooked into this.

Current thoughts are, do we:

1. Allow the storage engine to communicate its locking needs via an
API function call and add code to check that in the
determine_index_lockmode function (new in the attached patch)
2. Just get rid of the "operation != CMD_DELETE &&" line in the above
code and just always lock indexes for DELETE targets in
RowExclusiveLock mode.

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes). I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough. We could
fix that by passing RowExclusiveLock instead of NoLock. It just means
we'd discover the lock already exists in the local lock table...
unless of course there is a case where the index scan gets initialised
before modify table is.

I'm adding Andres, Robert, Thomas, Amit and Haribabu due to their
involvement with pluggable storage and zheap.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

idxlockmode_and_lock_upgrade_fix_v3.patch (application/octet-stream)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 36e3d44aad..60a205942c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -518,6 +518,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	rte->relid = intoRelationAddr.objectId;
 	rte->relkind = relkind;
 	rte->rellockmode = RowExclusiveLock;
+	rte->idxlockmode = RowExclusiveLock;
 	rte->requiredPerms = ACL_INSERT;
 
 	for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index bd837d3cd8..0d9b247bb4 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,7 +211,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -260,16 +260,9 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->biss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 2d954b722a..0c74509825 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,7 +493,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 	TupleDesc	tupDesc;
 
 	/*
@@ -556,16 +556,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8f39cc2b6b..0dca63ccd5 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -901,7 +901,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/*
 	 * create state structure
@@ -964,16 +964,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->idxlockmode;
+	indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a8a735c247..96e1ee716c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2356,6 +2356,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(relid);
 	COPY_SCALAR_FIELD(relkind);
 	COPY_SCALAR_FIELD(rellockmode);
+	COPY_SCALAR_FIELD(idxlockmode);
 	COPY_NODE_FIELD(tablesample);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3cab90e9f8..ae806fb38b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2633,6 +2633,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
 	COMPARE_SCALAR_FIELD(relid);
 	COMPARE_SCALAR_FIELD(relkind);
 	COMPARE_SCALAR_FIELD(rellockmode);
+	COMPARE_SCALAR_FIELD(idxlockmode);
 	COMPARE_NODE_FIELD(tablesample);
 	COMPARE_NODE_FIELD(subquery);
 	COMPARE_SCALAR_FIELD(security_barrier);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69179a07c3..c4246695a3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3034,6 +3034,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
 			WRITE_INT_FIELD(rellockmode);
+			WRITE_INT_FIELD(idxlockmode);
 			WRITE_NODE_FIELD(tablesample);
 			break;
 		case RTE_SUBQUERY:
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4b845b1bb7..1119cd1ebb 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1365,6 +1365,7 @@ _readRangeTblEntry(void)
 			READ_OID_FIELD(relid);
 			READ_CHAR_FIELD(relkind);
 			READ_INT_FIELD(rellockmode);
+			READ_INT_FIELD(idxlockmode);
 			READ_NODE_FIELD(tablesample);
 			break;
 		case RTE_SUBQUERY:
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9bb068a52e..2d4aad4628 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -128,6 +128,7 @@ typedef struct
 } WindowClauseSortData;
 
 /* Local functions */
+static void finalize_lockmodes(PlannedStmt *stmt);
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
 static void inheritance_planner(PlannerInfo *root);
@@ -137,6 +138,7 @@ static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
 static List *remap_to_groupclause_idx(List *groupClause, List *gsets,
 						 int *tleref_to_colnum_map);
 static void preprocess_rowmarks(PlannerInfo *root);
+static void determine_index_lockmode(PlannerInfo *root);
 static double preprocess_limit(PlannerInfo *root,
 				 double tuple_fraction,
 				 int64 *offset_est, int64 *count_est);
@@ -571,9 +573,132 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	if (glob->partition_directory != NULL)
 		DestroyPartitionDirectory(glob->partition_directory);
 
+	/*
+	 * Determine correct lock modes for each rtable entry and the indexes
+	 * belonging to it.
+	 */
+	finalize_lockmodes(result);
+
 	return result;
 }
 
+typedef struct RelationLockmodeElem
+{
+	Oid			relid;		/* hash key -- must be first */
+	LOCKMODE	rellockmode;
+	LOCKMODE	idxlockmode;
+} RelationLockmodeElem;
+
+
+/*
+ * finalize_lockmodes
+ *		Process stmt's rtable and determine the strongest lock level for each
+ *		distinct relation and upgrade weaker locks to the strongest lock level
+ *		for that relation.  Also determine the lock level required for each
+ *		relation's indexes and set that in the rel's idxlockmode field.
+ */
+static void
+finalize_lockmodes(PlannedStmt *stmt)
+{
+	List	   *rtable = stmt->rtable;
+	ListCell   *lc;
+	RelationLockmodeElem   *elem;
+	HTAB	   *htab;
+	HASHCTL		ctl;
+	bool		found;
+	bool		applystrongest;
+
+	/*
+	 * There can't be any duplicate references to the same relation when the
+	 * rtable has a single entry.
+	 */
+	if (list_length(rtable) < 2)
+		return;
+
+	applystrongest = false;
+
+	/*
+	 * Determine the strongest lock level for each relation in rtable.  We
+	 * must apply the strongest lock of each relation if the same relation is
+	 * seen more than once and the lock levels vary.  This defends against
+	 * lock upgrade hazards we might see if we obtained the weaker lock
+	 * followed by the stronger lock level.  We need only attempt this when
+	 * there are multiple entries in the rtable.
+	 */
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(RelationLockmodeElem);
+	ctl.hcxt = CurrentMemoryContext;
+
+	htab = hash_create("Lockmode table", list_length(rtable),
+					   &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	foreach(lc, rtable)
+	{
+		RangeTblEntry *rte = lfirst(lc);
+		Oid			reloid;
+
+		if (rte->relkind != RELKIND_RELATION)
+			continue;
+
+		reloid = rte->relid;
+		elem = (RelationLockmodeElem *)
+			hash_search(htab, &reloid, HASH_ENTER, &found);
+
+		/*
+		 * When we've seen this relation before and the lockmode varies from
+		 * the last time we saw it, then mark that we need to make another
+		 * pass over the list to apply the strongest of the seen lock modes.
+		 */
+		if (found)
+		{
+			if (elem->rellockmode != rte->rellockmode)
+			{
+				applystrongest = true;
+				elem->rellockmode = Max(elem->rellockmode, rte->rellockmode);
+			}
+
+			if (elem->idxlockmode != rte->idxlockmode)
+			{
+				applystrongest = true;
+				elem->idxlockmode = Max(elem->idxlockmode, rte->idxlockmode);
+			}
+		}
+		else
+		{
+			elem->rellockmode = rte->rellockmode;
+			elem->idxlockmode = rte->idxlockmode;
+		}
+	}
+
+	/*
+	 * If there are multiple instances of the same rel with varying lock
+	 * strengths then set the strongest lock level to each instance of that
+	 * relation.
+	 */
+	if (applystrongest)
+	{
+		foreach(lc, rtable)
+		{
+			RangeTblEntry *rte = lfirst(lc);
+			Oid			reloid;
+
+			if (rte->relkind != RELKIND_RELATION)
+				continue;
+
+			reloid = rte->relid;
+
+			elem = (RelationLockmodeElem *)
+				hash_search(htab, &reloid, HASH_FIND, &found);
+			Assert(found);
+
+			rte->rellockmode = elem->rellockmode;
+			rte->idxlockmode = elem->idxlockmode;
+		}
+	}
+
+	hash_destroy(htab);
+}
 
 /*--------------------
  * subquery_planner
@@ -728,6 +853,13 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	 */
 	preprocess_rowmarks(root);
 
+	/*
+	 * Determine the lock level required for indexes for each RTE_RELATION.
+	 * We must do this before calling expand_inherited_tables as we want
+	 * child tables to inherit the same lock level as their parent.
+	 */
+	determine_index_lockmode(root);
+
 	/*
 	 * Expand any rangetable entries that are inheritance sets into "append
 	 * relations".  This can add entries to the rangetable, but they must be
@@ -2675,6 +2807,37 @@ select_rowmark_type(RangeTblEntry *rte, LockClauseStrength strength)
 	}
 }
 
+/*
+ * determine_index_lockmode
+ *		Determine and set the idxlockmode for each entry in the rtable.
+ */
+static void
+determine_index_lockmode(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	ListCell   *lc;
+	Index		rti;
+
+	rti = 1;
+	foreach(lc, parse->rtable)
+	{
+		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+
+		/*
+		 * Indexes of the target of an INSERT and UPDATE require a
+		 * RowExclusiveLock.  DELETE can use AccessShareLock since we don't
+		 * modify index entries on delete.  SELECT always uses
+		 * AccessShareLock.
+		 */
+		if (rti == parse->resultRelation && parse->commandType != CMD_DELETE)
+			rte->idxlockmode = RowExclusiveLock;
+		else
+			rte->idxlockmode = AccessShareLock;
+
+		rti++;
+	}
+}
+
 /*
  * preprocess_limit - do pre-estimation for LIMIT and/or OFFSET clauses
  *
@@ -6039,6 +6202,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 	rte->relid = tableOid;
 	rte->relkind = RELKIND_RELATION;	/* Don't be too picky. */
 	rte->rellockmode = AccessShareLock;
+	rte->idxlockmode = AccessShareLock;
 	rte->lateral = false;
 	rte->inh = false;
 	rte->inFromCl = true;
@@ -6162,6 +6326,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	rte->relid = tableOid;
 	rte->relkind = RELKIND_RELATION;	/* Don't be too picky. */
 	rte->rellockmode = AccessShareLock;
+	rte->idxlockmode = AccessShareLock;
 	rte->lateral = false;
 	rte->inh = true;
 	rte->inFromCl = true;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30f4dc151b..65d4b0d742 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -110,6 +110,7 @@ void
 get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				  RelOptInfo *rel)
 {
+	RangeTblEntry *rte = root->simple_rte_array[rel->relid];
 	Index		varno = rel->relid;
 	Relation	relation;
 	bool		hasindex;
@@ -164,23 +165,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	{
 		List	   *indexoidlist;
 		ListCell   *l;
-		LOCKMODE	lmode;
 
 		indexoidlist = RelationGetIndexList(relation);
 
-		/*
-		 * For each index, we get the same type of lock that the executor will
-		 * need, and do not release it.  This saves a couple of trips to the
-		 * shared lock manager while not creating any real loss of
-		 * concurrency, because no schema changes could be happening on the
-		 * index while we hold lock on the parent rel, and neither lock type
-		 * blocks any other kind of index operation.
-		 */
-		if (rel->relid == root->parse->resultRelation)
-			lmode = RowExclusiveLock;
-		else
-			lmode = AccessShareLock;
-
 		foreach(l, indexoidlist)
 		{
 			Oid			indexoid = lfirst_oid(l);
@@ -195,7 +182,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			/*
 			 * Extract info from the relation descriptor for the index.
 			 */
-			indexRelation = index_open(indexoid, lmode);
+			indexRelation = index_open(indexoid, rte->idxlockmode);
 			index = indexRelation->rd_index;
 
 			/*
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index f3b6d193aa..6601387bb7 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1227,6 +1227,7 @@ addRangeTableEntry(ParseState *pstate,
 	rte->relid = RelationGetRelid(rel);
 	rte->relkind = rel->rd_rel->relkind;
 	rte->rellockmode = lockmode;
+	rte->idxlockmode = 0;			/* Set during planning */
 
 	/*
 	 * Build the list of effective column names using user-supplied aliases
@@ -1305,6 +1306,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	rte->relid = RelationGetRelid(rel);
 	rte->relkind = rel->rd_rel->relkind;
 	rte->rellockmode = lockmode;
+	rte->idxlockmode = 0;			/* Set during planning */
 
 	/*
 	 * Build the list of effective column names using user-supplied aliases
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 07f4ec9055..42964eb31a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -186,6 +186,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	rte->relid = RelationGetRelid(rel->localrel);
 	rte->relkind = rel->localrel->rd_rel->relkind;
 	rte->rellockmode = AccessShareLock;
+	rte->idxlockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 4fc50c89b9..ae4b75ad24 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1706,6 +1706,7 @@ ApplyRetrieveRule(Query *parsetree,
 	rte->relid = InvalidOid;
 	rte->relkind = 0;
 	rte->rellockmode = 0;
+	rte->idxlockmode = 0;
 	rte->tablesample = NULL;
 	rte->inh = false;			/* must not be set for a subquery */
 
@@ -3040,6 +3041,7 @@ rewriteTargetView(Query *parsetree, Relation view)
 	 */
 	new_rte = base_rte;
 	new_rte->rellockmode = RowExclusiveLock;
+	new_rte->idxlockmode = RowExclusiveLock;
 
 	parsetree->rtable = lappend(parsetree->rtable, new_rte);
 	new_rt_index = list_length(parsetree->rtable);
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d715709b7c..d58ebddd9b 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1295,6 +1295,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	pkrte->relid = RelationGetRelid(pk_rel);
 	pkrte->relkind = pk_rel->rd_rel->relkind;
 	pkrte->rellockmode = AccessShareLock;
+	pkrte->idxlockmode = AccessShareLock;
 	pkrte->requiredPerms = ACL_SELECT;
 
 	fkrte = makeNode(RangeTblEntry);
@@ -1302,6 +1303,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	fkrte->relid = RelationGetRelid(fk_rel);
 	fkrte->relkind = fk_rel->rd_rel->relkind;
 	fkrte->rellockmode = AccessShareLock;
+	fkrte->idxlockmode = AccessShareLock;
 	fkrte->requiredPerms = ACL_SELECT;
 
 	for (int i = 0; i < riinfo->nkeys; i++)
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 85055bbb95..3197b04298 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1004,6 +1004,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
 		oldrte->relid = trigrec->tgrelid;
 		oldrte->relkind = relkind;
 		oldrte->rellockmode = AccessShareLock;
+		oldrte->idxlockmode = AccessShareLock;
 		oldrte->alias = makeAlias("old", NIL);
 		oldrte->eref = oldrte->alias;
 		oldrte->lateral = false;
@@ -1015,6 +1016,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
 		newrte->relid = trigrec->tgrelid;
 		newrte->relkind = relkind;
 		newrte->rellockmode = AccessShareLock;
+		newrte->idxlockmode = AccessShareLock;
 		newrte->alias = makeAlias("new", NIL);
 		newrte->eref = newrte->alias;
 		newrte->lateral = false;
@@ -3226,6 +3228,7 @@ deparse_context_for(const char *aliasname, Oid relid)
 	rte->relid = relid;
 	rte->relkind = RELKIND_RELATION;	/* no need for exactness here */
 	rte->rellockmode = AccessShareLock;
+	rte->idxlockmode = AccessShareLock;
 	rte->alias = makeAlias(aliasname, NIL);
 	rte->eref = rte->alias;
 	rte->lateral = false;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fe35783359..8197a76be6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -990,6 +990,7 @@ typedef struct RangeTblEntry
 	Oid			relid;			/* OID of the relation */
 	char		relkind;		/* relation kind (see pg_class.relkind) */
 	int			rellockmode;	/* lock level that query requires on the rel */
+	int			idxlockmode;	/* lock level required for rel's indexes */
 	struct TableSampleClause *tablesample;	/* sampling info, or NULL */
 
 	/*
#18Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#17)
Re: Inadequate executor locking of indexes

(this is not a reply to your full proposal, just something I thought
I'd point out)

On 2019/03/13 10:38, David Rowley wrote:

i.e. don't open the indexes for DELETEs. I had ideas that maybe this
could be changed to check the idxlockmode and open the indexes if it's
above AccessShareLock. There didn't seem to be a very nice way to
fetch the RangeTblEntry from the ResultRelInfo though,

Did you miss ri_RangeTableIndex? It's the range table index of the result
relation for which a given ResultRelInfo is created.
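
Just to illustrate (a rough, hypothetical sketch, not from your posted
patch), ExecInitModifyTable() could use it to fetch the RTE and then
let the planner-chosen idxlockmode decide whether the indexes need to
be opened at all:

	/* Hypothetical sketch only, assuming the v3 patch's idxlockmode field. */
	RangeTblEntry *rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
									   estate);

	if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
		rte->idxlockmode > AccessShareLock &&
		resultRelInfo->ri_IndexRelationDescs == NULL)
		ExecOpenIndices(resultRelInfo,
						node->onConflictAction != ONCONFLICT_NONE);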

Thanks,
Amit

#19David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#18)
Re: Inadequate executor locking of indexes

On Wed, 13 Mar 2019 at 14:55, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Did you miss ri_RangeTableIndex? It's the range table index of the result
relation for which a given ResultRelInfo is created.

I did indeed. I'll hold off modifying the patch in favour of seeing
what other people think about what should be done here.

Thanks for pointing out ri_RangeTableIndex.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#17)
Re: Inadequate executor locking of indexes

Hi David,

On 2019/03/13 10:38, David Rowley wrote:

I had another go at this patch and fixed the problem by just setting
the idxlockmode inside the planner just before the call to
expand_inherited_tables(). This allows the lockmode to be copied into
child RTEs.

I have one question about the relationship between idxlockmode and
rellockmode. From skimming the patch, it appears that they're almost
always set to the same value. If so, why not use rellockmode for index
locking too?

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes). I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough. We could
fix that by passing RowExclusiveLock instead of NoLock. It just means
we'd discover the lock already exists in the local lock table...
unless of course there is a case where the index scan gets initialised
before modify table is.

Tom gave an example upthread that looked like this:

explain (costs off) with x1 as materialized (select * from foo where a =
1), x2 as (update foo set a = a where a = 1 returning *) select * from x1,
x2 where x1.a = x2.a;
                     QUERY PLAN
────────────────────────────────────────────────────
 Hash Join
   Hash Cond: (x1.a = x2.a)
   CTE x1
     ->  Bitmap Heap Scan on foo
           Recheck Cond: (a = 1)
           ->  Bitmap Index Scan on foo_a_idx
                 Index Cond: (a = 1)
   CTE x2
     ->  Update on foo foo_1
           ->  Bitmap Heap Scan on foo foo_1
                 Recheck Cond: (a = 1)
                 ->  Bitmap Index Scan on foo_a_idx
                       Index Cond: (a = 1)
   ->  CTE Scan on x1
   ->  Hash
         ->  CTE Scan on x2
(16 rows)

When InitPlan() invokes ExecInitNode on the subplans of x1 and x2, it does
so in that order. So, ExecInitBitmapIndexScan for x1 is called before
ExecInitModifyTable for x2.

But if finalize_lockmodes() in your patch sets lockmodes correctly,
ExecInitBitmapIndexScan() called for x1 ought to lock the index with
RowExclusiveLock, no?

Thanks,
Amit

#21David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#20)
Re: Inadequate executor locking of indexes

Thanks for having a look at this.

On Wed, 13 Mar 2019 at 22:45, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

I have one question about the relationship between idxlockmode and
rellockmode. From skimming the patch, it appears that they're almost
always set to the same value. If so, why not use rellockmode for index
locking too?

Maybe that's possible, but it would mean giving up on locking indexes
during DELETE with AccessShareLock. That would become
RowExclusiveLock. Also FOR UPDATE would lock indexes with RowShareLock
instead of AccessShareLock.

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes). I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough. We could
fix that by passing RowExclusiveLock instead of NoLock. It just means
we'd discover the lock already exists in the local lock table...
unless of course there is a case where the index scan gets initialised
before modify table is.

Tom gave an example upthread that looked like this:

[...]

But if finalize_lockmodes() in your patch sets lockmodes correctly,
ExecInitBitmapIndexScan() called for x1 ought to lock the index with
RowExclusiveLock, no?

Yeah, in the executor the bitmap index scan uses a RowExclusiveLock.
There's still the issue in the planner that get_relation_info() will
still take an AccessShareLock when planning the 1st CTE and then
upgrade it to RowExclusiveLock once it gets to the 2nd. It's not
really very obvious how that can be fixed, but at least we don't start
using indexes without any locks.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#22Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#21)
Re: Inadequate executor locking of indexes

On 2019/03/14 7:12, David Rowley wrote:

Thanks for having a look at this.

On Wed, 13 Mar 2019 at 22:45, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

I have one question about the relationship between idxlockmode and
rellockmode. From skimming the patch, it appears that they're almost
always set to the same value. If so, why not use rellockmode for index
locking too?

Maybe that's possible, but it would mean giving up on locking indexes
during DELETE with AccessShareLock. That would become
RowExclusiveLock. Also FOR UPDATE would lock indexes with RowShareLock
instead of AccessShareLock.

If the correct lock is taken in both cases by the current code, then maybe
there's no need to change anything? What does idxlockmode improve about
the existing situation? One thing I can imagine it improves is that we
don't need the potentially expensive ExecRelationIsTargetRelation() check
anymore, because idxlockmode gives that information for free.
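
For reference, that check is, roughly, a linear scan over the query's
result relations (as of current sources, if I'm reading it right), so
its cost grows with the number of result relations:

bool
ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
{
	ResultRelInfo *resultRelInfos;
	int			i;

	/* Compare the scan's RT index against every result relation's. */
	resultRelInfos = estate->es_result_relations;
	for (i = 0; i < estate->es_num_result_relations; i++)
	{
		if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
			return true;
	}
	return false;
}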

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes). I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough. We could
fix that by passing RowExclusiveLock instead of NoLock. It just means
we'd discover the lock already exists in the local lock table...
unless of course there is a case where the index scan gets initialised
before modify table is.

Tom gave an example upthread that looked like this:

[...]

But if finalize_lockmodes() in your patch sets lockmodes correctly,
ExecInitBitmapIndexScan() called for x1 ought to lock the index with
RowExclusiveLock, no?

Yeah, in the executor the bitmap index scan uses a RowExclusiveLock.
There's still the issue in the planner that get_relation_info() will
still take an AccessShareLock when planning the 1st CTE and then
upgrade it to RowExclusiveLock once it gets to the 2nd. It's not
really very obvious how that can be fixed, but at least we don't start
using indexes without any locks.

If we're to fix that upgrade hazard, maybe we'll need a separate phase to
determine the strongest lock to take on each table referenced in different
parts of the query (across CTEs) that runs before
pg_analyze_and_rewrite()? We surely can't use table OIDs as keys though.

Thanks,
Amit

#23David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#22)
Re: Inadequate executor locking of indexes

On Thu, 14 Mar 2019 at 21:52, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

If the correct lock is taken in both cases by the current code, then maybe
there's no need to change anything? What does idxlockmode improve about
the existing situation? One thing I can imagine it improves is that we
don't need the potentially expensive ExecRelationIsTargetRelation() check
anymore, because idxlockmode gives that information for free.

I'm aiming to fix the problem described by Tom when he started this
thread. It's pretty easy to get an assert failure in master.

create table t1(a int);
create index on t1(a);
prepare q1 as delete from t1 where a = 1;
execute q1;
execute q1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

This fails because ModifyTable does not open the indexes for DELETE,
and the index scan node thinks it can get away with NoLock on the
index since it thinks ModifyTable already locked it. We hit the
CheckRelationLockedByMe() Assert in relation_open().
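
For anyone following along, the assertion being tripped is roughly
this part of relation_open() (quoting from memory, so details may be
slightly off):

	/* Get the lock before trying to open the relcache entry */
	if (lockmode != NoLock)
		LockRelationOid(relationId, lockmode);

	/* The relcache does all the real work... */
	r = RelationIdGetRelation(relationId);

	if (!RelationIsValid(r))
		elog(ERROR, "could not open relation with OID %u", relationId);

	/*
	 * If we didn't get the lock ourselves, assert that the caller holds
	 * one, except in bootstrap mode where no locks are used.
	 */
	Assert(lockmode != NoLock ||
		   IsBootstrapProcessingMode() ||
		   CheckRelationLockedByMe(r, AccessShareLock, true));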

The idea with the patch is to a) fix this; b) try to reduce the
chances of bugs in this area in the future by having something more
central decide the lock level we need, rather than having various bits
of code that currently disagree with each other about what needs to be
done; and c) attempt to fix the lock upgrade hazard as best we can.

Maybe you're right about being able to use rellockmode for indexes,
but I assume that we lowered the lock level for indexes for some
reason, and this would reverse that.

If we're to fix that upgrade hazard, maybe we'll need a separate phase to
determine the strongest lock to take on each table referenced in different
parts of the query (across CTEs) that runs before
pg_analyze_and_rewrite()? We surely can't use table OIDs as keys though.

I don't think that's workable as we've yet to perform
expand_inherited_tables() and some other subquery might reference some
partition directly.

I think this is borderline impossible to fix completely. To do it just
for the planner we'll need to plan each subquery as far as determining
all relations that will be used, probably at least as far as
expand_inherited_tables(), but not as far as calling
get_relation_info() on any relation. We'd then do the same for all the
other subqueries and then run finalize_lockmodes(); only then could we
call get_relation_info(), otherwise we might call it with too weak a
lock and later need to obtain a stronger one. But even then it's
possible we get lock upgrades in the parser too, for example: select *
from t1 a, t1 b for update of b;

Maybe if we can't fix that hazard completely, then it's not worth
adding any code for it at all, but we still need something for a), and
fixing b) seems like a good idea too. If we don't care enough about c)
then we can use the patch without finalize_lockmodes().

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#23)
Re: Inadequate executor locking of indexes

David Rowley <david.rowley@2ndquadrant.com> writes:

On Thu, 14 Mar 2019 at 21:52, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

If the correct lock is taken in both cases by the current code, then maybe
there's no need to change anything?

I'm aiming to fix the problem described by Tom when he started this
thread. It's pretty easy to get an assert failure in master.

Yeah, removing the Assert failure is a minimum requirement. Whether
we need to do more than that can be debated.

I think this is borderline impossible to fix completely.

After further review I concur with that position. The cases where
we have lock upgrade hazards are where some subquery uses a table
in a way that requires a stronger lock than is needed for some other
reference that was processed earlier. It's kind of pointless to
guarantee that we avoid that in the planner or executor if the parser
already hit the problem; and it seems darn near impossible, and
certainly impractical, to avoid the problem while parsing.

(Actually, even if we fixed the parser, I'm not sure we'd be out of
the woods. Consider a case where a subquery requires AccessShareLock
on some table, but then when we come to expand inheritance in the
planner, we discover that that same table is a child or partition of
some target table, so now we need a stronger lock on it.)

I think therefore that we should forget about the idea of avoiding
lock upgrade hazards, at least for situations where the hazard is
caused by conflicting table references that are all written by the
user. That's not common, and since none of these lock types are
exclusive, the odds of an actual deadlock are low anyway.

Maybe you're right about being able to use rellockmode for indexes,
but I assume that we lowered the lock level for indexes for some
reason, and this would reverse that.

I kind of think that we *should* use rellockmode for indexes too.
To the extent that we're doing differently from that now, I think
it's accidental not intentional. It would perhaps have been difficult
to clean that up completely before we added rellockmode, but now that
we've got that we should use it. I feel that adding a second field
for index lock mode would just be perpetuating some accidental
inconsistencies.

In short, I think we should take the parts of this patch that modify
the index_open calls, but make them use rte->rellockmode; and forget
all the other parts.

BTW, I'd also suggest expanding the header comment for
ExecRelationIsTargetRelation to explain that it's no longer
used in the core code, but we keep it around because FDWs
might want it.

regards, tom lane

#25David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#24)
1 attachment(s)
Re: Inadequate executor locking of indexes

On Wed, 3 Apr 2019 at 06:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On Thu, 14 Mar 2019 at 21:52, Amit Langote
Maybe you're right about being able to use rellockmode for indexes,
but I assume that we lowered the lock level for indexes for some
reason, and this would reverse that.

I kind of think that we *should* use rellockmode for indexes too.
To the extent that we're doing differently from that now, I think
it's accidental not intentional. It would perhaps have been difficult
to clean that up completely before we added rellockmode, but now that
we've got that we should use it. I feel that adding a second field
for index lock mode would just be perpetuating some accidental
inconsistencies.

Okay. That certainly makes the patch a good bit simpler.

In short, I think we should take the parts of this patch that modify
the index_open calls, but make them use rte->rellockmode; and forget
all the other parts.

Done.

BTW, I'd also suggest expanding the header comment for
ExecRelationIsTargetRelation to explain that it's no longer
used in the core code, but we keep it around because FDWs
might want it.

Also done.

I also looked over other index_open() calls in the planner and found a
bunch of places in selfuncs.c that we open an index to grab some
information then close it again releasing the lock. At this stage
get_relation_info() should have already grabbed what it needs and
stored it into an IndexOptInfo, so we might have no need to access the
index again. However, if any code was added that happened to assume
the index was already locked then we'd get the same Assert failure
that we're fixing here. I've ended up changing these calls so that
they also use rellockmode, which may make the lock just a trip to the
local lock table for relations that have rellockmode >
AccessShareLock. I also changed the index_close to use NoLock so we
hold the lock.

I scanned around other usages of index_open() and saw that
gin_clean_pending_list() uses an AccessShareLock. That seems strange
since it modifies the index.
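
For reference, it does roughly this (from memory, so details may be
off):

Datum
gin_clean_pending_list(PG_FUNCTION_ARGS)
{
	Oid			indexoid = PG_GETARG_OID(0);
	Relation	indexRel = index_open(indexoid, AccessShareLock);
	...

i.e. only an AccessShareLock on the index it's about to clean up.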

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

use_rellockmode_for_indexes_too.patch (application/octet-stream)
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 3b23de9fac..0deaf8f1b7 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -664,6 +664,10 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
  *
  *		Detect whether a relation (identified by rangetable index)
  *		is one of the target relations of the query.
+ *
+ * Note: This is currently no longer used in core.  We keep it
+ * around to allow FDWs to use it in order to determine if the foreign
+ * table is the target of an UPDATE/DELETE.
  * ----------------------------------------------------------------
  */
 bool
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index bd837d3cd8..604f4f1132 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,7 +211,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -260,16 +260,9 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->biss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 2d954b722a..7711728495 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,7 +493,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 	TupleDesc	tupDesc;
 
 	/*
@@ -556,16 +556,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8f39cc2b6b..399ac0109c 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -901,7 +901,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/*
 	 * create state structure
@@ -964,16 +964,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c430189572..15fe132ead 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6285,6 +6285,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	/* Build RelOptInfo */
 	rel = build_simple_rel(root, 1, NULL);
 
+	/* Already locked by the caller */
 	heap = table_open(tableOid, NoLock);
 	index = index_open(indexOid, NoLock);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 31a3784536..e134dd0d06 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -161,25 +161,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	if (hasindex)
 	{
+		RangeTblEntry *rte;
 		List	   *indexoidlist;
 		ListCell   *l;
-		LOCKMODE	lmode;
 
+		rte = root->simple_rte_array[varno];
 		indexoidlist = RelationGetIndexList(relation);
 
-		/*
-		 * For each index, we get the same type of lock that the executor will
-		 * need, and do not release it.  This saves a couple of trips to the
-		 * shared lock manager while not creating any real loss of
-		 * concurrency, because no schema changes could be happening on the
-		 * index while we hold lock on the parent rel, and neither lock type
-		 * blocks any other kind of index operation.
-		 */
-		if (rel->relid == root->parse->resultRelation)
-			lmode = RowExclusiveLock;
-		else
-			lmode = AccessShareLock;
-
 		foreach(l, indexoidlist)
 		{
 			Oid			indexoid = lfirst_oid(l);
@@ -194,7 +182,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			/*
 			 * Extract info from the relation descriptor for the index.
 			 */
-			indexRelation = index_open(indexoid, lmode);
+			indexRelation = index_open(indexoid, rte->rellockmode);
 			index = indexRelation->rd_index;
 
 			/*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ecffffbd0c..39e1eca47c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			 * necessarily on the index.
 			 */
 			heapRel = table_open(rte->relid, NoLock);
-			indexRel = index_open(index->indexoid, AccessShareLock);
+
+			/*
+			 * We use the same lock level as the relation as it may have
+			 * already been locked with that level.  Using the same lock level
+			 * can save a trip to the shared lock manager.
+			 */
+			Assert(rte->rellockmode != NoLock);
+			indexRel = index_open(index->indexoid, rte->rellockmode);
 
 			/* extract index key information from the index's pg_index info */
 			indexInfo = BuildIndexInfo(indexRel);
@@ -5305,7 +5312,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			/* Clean everything up */
 			ExecDropSingleTupleTableSlot(slot);
 
-			index_close(indexRel, AccessShareLock);
+			index_close(indexRel, NoLock);
 			table_close(heapRel, NoLock);
 
 			MemoryContextSwitchTo(oldcontext);
@@ -6444,6 +6451,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 				double *indexPages)
 {
 	IndexOptInfo *index = path->indexinfo;
+	RangeTblEntry *rte = root->simple_rte_array[path->path.parent->relid];
 	List	   *indexQuals = get_quals_from_indexclauses(path->indexclauses);
 	List	   *selectivityQuals;
 	double		numPages = index->pages,
@@ -6472,9 +6480,15 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 */
 	if (!index->hypothetical)
 	{
-		indexRel = index_open(index->indexoid, AccessShareLock);
+		/*
+		 * We use the same lock level as the relation as it may have already
+		 * been locked with that level.  Using the same lock level can save a
+		 * trip to the shared lock manager.
+		 */
+		Assert(rte->rellockmode != NoLock);
+		indexRel = index_open(index->indexoid, rte->rellockmode);
 		ginGetStats(indexRel, &ginStats);
-		index_close(indexRel, AccessShareLock);
+		index_close(indexRel, NoLock);
 	}
 	else
 	{
@@ -6781,11 +6795,14 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 							  &spc_seq_page_cost);
 
 	/*
-	 * Obtain some data from the index itself.
+	 * Obtain some data from the index itself.  We use the same lock level as
+	 * the relation as it may have already been locked with that level.  Using
+	 * the same lock level can save a trip to the shared lock manager.
 	 */
-	indexRel = index_open(index->indexoid, AccessShareLock);
+	Assert(rte->rellockmode != NoLock);
+	indexRel = index_open(index->indexoid, rte->rellockmode);
 	brinGetStats(indexRel, &statsData);
-	index_close(indexRel, AccessShareLock);
+	index_close(indexRel, NoLock);
 
 	/*
 	 * Compute index correlation
#26Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#25)
Re: Inadequate executor locking of indexes

Hi David,

Thanks for updating the patch.

On 2019/04/04 9:14, David Rowley wrote:

I also looked over other index_open() calls in the planner and found a
bunch of places in selfuncs.c that we open an index to grab some
information then close it again releasing the lock. At this stage
get_relation_info() should have already grabbed what it needs and
stored it into an IndexOptInfo, so we might have no need to access the
index again. However, if any code was added that happened to assume
the index was already locked then we'd get the same Assert failure
that we're fixing here. I've ended up changing these calls so that
they also use rellockmode, which may make the lock just a trip to the
local lock table for relations that have rellockmode >
AccessShareLock. I also changed the index_close to use NoLock so we
hold the lock.

Sorry, I didn't understand why it wouldn't be OK to pass NoLock to
index_open, for example, here:

@@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root,
VariableStatData *vardata,
 			 * necessarily on the index.
 			 */
 			heapRel = table_open(rte->relid, NoLock);
-			indexRel = index_open(index->indexoid, AccessShareLock);
+
+			/*
+			 * We use the same lock level as the relation as it may have
+			 * already been locked with that level.  Using the same lock level
+			 * can save a trip to the shared lock manager.
+			 */
+			Assert(rte->rellockmode != NoLock);
+			indexRel = index_open(index->indexoid, rte->rellockmode);

Especially seeing that the table itself is opened without lock. If there
are any Assert failures, wouldn't that need to be fixed in the upstream
code (such as get_relation_info)?

Also, I noticed that there is infer_arbiter_indexes() too, which opens the
target table's indexes with RowExclusiveLock. I thought for a second
that's an index-locking site in the planner that you may have missed, but
turns out it might very well be the first time those indexes are locked in
a given insert query's processing, because query_planner doesn't need to
plan access to the result relation, so get_relation_info is not called.

I scanned around other usages of index_open() and saw that
gin_clean_pending_list() uses an AccessShareLock. That seems strange
since it modifies the index.

Yeah, other maintenance tasks modifying an index, such as
brin_summarize_range(), take ShareUpdateExclusiveLock.

Thanks,
Amit

#27David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#26)
Re: Inadequate executor locking of indexes

On Thu, 4 Apr 2019 at 15:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Sorry, I didn't understand why it wouldn't be OK to pass NoLock to
index_open, for example, here:

@@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root,
VariableStatData *vardata,
* necessarily on the index.
*/
heapRel = table_open(rte->relid, NoLock);
-                       indexRel = index_open(index->indexoid, AccessShareLock);
+
+                       /*
+                        * We use the same lock level as the relation as it may have
+                        * already been locked with that level.  Using the same lock level
+                        * can save a trip to the shared lock manager.
+                        */
+                       Assert(rte->rellockmode != NoLock);
+                       indexRel = index_open(index->indexoid, rte->rellockmode);

Especially seeing that the table itself is opened without lock. If there
are any Assert failures, wouldn't that need to be fixed in the upstream
code (such as get_relation_info)?

I put that Assert there to ensure that everything that calls it has
properly set RangeTblEntry's rellockmode. If there's some valid reason
for that to be NoLock then it's the wrong thing to do, but I can't
think of a valid reason.

Also, I noticed that there is infer_arbiter_indexes() too, which opens the
target table's indexes with RowExclusiveLock. I thought for a second
that's an index-locking site in the planner that you may have missed, but
turns out it might very well be the first time those indexes are locked in
a given insert query's processing, because query_planner doesn't need to
plan access to the result relation, so get_relation_info is not called.

I skimmed over that and thought that the rellockmode would always be
RowExclusiveLock anyway, so didn't change it. However, it would make
sense to use rellockmode for consistency. We're already looking up the
RangeTblEntry to get the relid, so getting the rellockmode is about
free.
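
Something like this (a rough, untested sketch of what I mean):

	/* the result relation's RTE is already fetched here for its relid */
	RangeTblEntry *rte = rt_fetch(root->parse->resultRelation,
								  root->parse->rtable);
	...
	/* use the planner-decided lock level rather than a hard-coded one */
	idxRel = index_open(indexoid, rte->rellockmode);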

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#27)
Re: Inadequate executor locking of indexes

On 2019/04/04 11:13, David Rowley wrote:

On Thu, 4 Apr 2019 at 15:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Sorry, I didn't understand why it wouldn't be OK to pass NoLock to
index_open, for example, here:

@@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root,
VariableStatData *vardata,
* necessarily on the index.
*/
heapRel = table_open(rte->relid, NoLock);
-                       indexRel = index_open(index->indexoid, AccessShareLock);
+
+                       /*
+                        * We use the same lock level as the relation as it may have
+                        * already been locked with that level.  Using the same lock level
+                        * can save a trip to the shared lock manager.
+                        */
+                       Assert(rte->rellockmode != NoLock);
+                       indexRel = index_open(index->indexoid, rte->rellockmode);

Especially seeing that the table itself is opened without lock. If there
are any Assert failures, wouldn't that need to be fixed in the upstream
code (such as get_relation_info)?

I put that Assert there to ensure that everything that calls it has
properly set RangeTblEntry's rellockmode. If there's some valid reason
for that to be NoLock then it's the wrong thing to do, but I can't
think of a valid reason.

Which means the index must have been locked with that mode somewhere
upstream, so there's no need to lock it again. Why not save the local
hash table lookup too if we can?

BTW, get_actual_variable_range is static to selfuncs.c and other public
functions that are entry points to get_actual_variable_range's
functionality appear to require having built a RelOptInfo first, which
means (even for third party code) having gone through get_relation_info
and opened the indexes. That may be too much wishful thinking though.

Also, I noticed that there is infer_arbiter_indexes() too, which opens the
target table's indexes with RowExclusiveLock. I thought for a second
that's an index-locking site in the planner that you may have missed, but
it turns out it might very well be the first time those indexes are locked
in a given insert query's processing, because query_planner doesn't need
to plan access to the result relation, so get_relation_info is not called.

I skimmed over that and thought that the rellockmode would always be
RowExclusiveLock anyway, so I didn't change it. However, it would make
sense to use rellockmode for consistency. We're already looking up the
RangeTblEntry to get the relid, so getting the rellockmode is basically
free.

Yeah, maybe it'd be a good idea, if only for consistency, to fetch the
rellockmode of the resultRelation RTE and use it for index_open.

Thanks,
Amit

#29David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#28)
1 attachment(s)
Re: Inadequate executor locking of indexes

On Thu, 4 Apr 2019 at 15:35, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

BTW, get_actual_variable_range is static to selfuncs.c and other public
functions that are entry point to get_actual_variable_range's
functionality appear to require having built a RelOptInfo first, which
means (even for third party code) having gone through get_relation_info
and opened the indexes. That may be too much wishful thinking though.

I think we should do this. We have the CheckRelationLockedByMe Asserts
to alert us if this ever gets broken. I think even if something added
a get_relation_info_hook to alter the indexes somehow, say to remove
one, it should still be okay, as get_actual_variable_range() only
looks at indexes that are in that list, and the other two functions are
dealing with Paths, which couldn't exist for an index that was removed
from RelOptInfo->indexlist.
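
For reference, the backstop I'm relying on is the assertion in
relation_open(), which (paraphrasing from memory; the exact form is in
the source) is along the lines of:

    Assert(lockmode != NoLock ||
           IsBootstrapProcessingMode() ||
           CheckRelationLockedByMe(r, AccessShareLock, true));

so opening an index with NoLock when nothing has actually locked it will
trip that in assert-enabled builds.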

Also, I noticed that there is infer_arbiter_indexes() too, which opens the
target table's indexes with RowExclusiveLock. I thought for a second
that's an index-locking site in the planner that you may have missed, but
it turns out it might very well be the first time those indexes are locked
in a given insert query's processing, because query_planner doesn't need
to plan access to the result relation, so get_relation_info is not called.

I skimmed over that and thought that the rellockmode would always be
RowExclusiveLock anyway, so I didn't change it. However, it would make
sense to use rellockmode for consistency. We're already looking up the
RangeTblEntry to get the relid, so getting the rellockmode is basically
free.

Yeah, maybe it'd be a good idea, if only for consistency, to fetch the
rellockmode of the resultRelation RTE and use it for index_open.

I've changed infer_arbiter_indexes() too, but decided to use
rellockmode rather than NoLock since we're not using
RelOptInfo->indexlist. If anything uses get_relation_info_hook to
remove indexes and then closes the removed ones, releasing the lock,
then we could end up with problems here.

v2 attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

use_rellockmode_for_indexes_too_v2.patch (application/octet-stream)
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 3b23de9..0deaf8f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -664,6 +664,10 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
  *
  *		Detect whether a relation (identified by rangetable index)
  *		is one of the target relations of the query.
+ *
+ * Note: This is currently no longer used in core.  We keep it
+ * around to allow FDWs to use it in order to determine if the foreign
+ * table is the target of an UPDATE/DELETE.
  * ----------------------------------------------------------------
  */
 bool
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index bd837d3..604f4f1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,7 +211,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -260,16 +260,9 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->biss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 2d954b7..7711728 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,7 +493,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 	TupleDesc	tupDesc;
 
 	/*
@@ -556,16 +556,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8f39cc2..399ac01 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -901,7 +901,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/*
 	 * create state structure
@@ -964,16 +964,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c430189..15fe132 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6285,6 +6285,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	/* Build RelOptInfo */
 	rel = build_simple_rel(root, 1, NULL);
 
+	/* Already locked by the caller */
 	heap = table_open(tableOid, NoLock);
 	index = index_open(indexOid, NoLock);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 31a3784..c01dbb8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -161,25 +161,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	if (hasindex)
 	{
+		RangeTblEntry *rte;
 		List	   *indexoidlist;
 		ListCell   *l;
-		LOCKMODE	lmode;
 
+		rte = root->simple_rte_array[varno];
 		indexoidlist = RelationGetIndexList(relation);
 
-		/*
-		 * For each index, we get the same type of lock that the executor will
-		 * need, and do not release it.  This saves a couple of trips to the
-		 * shared lock manager while not creating any real loss of
-		 * concurrency, because no schema changes could be happening on the
-		 * index while we hold lock on the parent rel, and neither lock type
-		 * blocks any other kind of index operation.
-		 */
-		if (rel->relid == root->parse->resultRelation)
-			lmode = RowExclusiveLock;
-		else
-			lmode = AccessShareLock;
-
 		foreach(l, indexoidlist)
 		{
 			Oid			indexoid = lfirst_oid(l);
@@ -194,7 +182,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			/*
 			 * Extract info from the relation descriptor for the index.
 			 */
-			indexRelation = index_open(indexoid, lmode);
+			indexRelation = index_open(indexoid, rte->rellockmode);
 			index = indexRelation->rd_index;
 
 			/*
@@ -592,8 +580,8 @@ infer_arbiter_indexes(PlannerInfo *root)
 	OnConflictExpr *onconflict = root->parse->onConflict;
 
 	/* Iteration state */
+	RangeTblEntry *rte;
 	Relation	relation;
-	Oid			relationObjectId;
 	Oid			indexOidFromConstraint = InvalidOid;
 	List	   *indexList;
 	ListCell   *l;
@@ -620,10 +608,9 @@ infer_arbiter_indexes(PlannerInfo *root)
 	 * the rewriter or when expand_inherited_rtentry() added it to the query's
 	 * rangetable.
 	 */
-	relationObjectId = rt_fetch(root->parse->resultRelation,
-								root->parse->rtable)->relid;
+	rte = rt_fetch(root->parse->resultRelation, root->parse->rtable);
 
-	relation = table_open(relationObjectId, NoLock);
+	relation = table_open(rte->relid, NoLock);
 
 	/*
 	 * Build normalized/BMS representation of plain indexed attributes, as
@@ -695,7 +682,8 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 * enforcement needs to occur there anyway when an inference clause is
 		 * omitted.
 		 */
-		idxRel = index_open(indexoid, RowExclusiveLock);
+		Assert(rte->rellockmode >= RowExclusiveLock);
+		idxRel = index_open(indexoid, rte->rellockmode);
 		idxForm = idxRel->rd_index;
 
 		if (!idxForm->indisvalid)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ecffffb..39e1eca 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			 * necessarily on the index.
 			 */
 			heapRel = table_open(rte->relid, NoLock);
-			indexRel = index_open(index->indexoid, AccessShareLock);
+
+			/*
+			 * We use the same lock level as the relation as it may have
+			 * already been locked with that level.  Using the same lock level
+			 * can save a trip to the shared lock manager.
+			 */
+			Assert(rte->rellockmode != NoLock);
+			indexRel = index_open(index->indexoid, rte->rellockmode);
 
 			/* extract index key information from the index's pg_index info */
 			indexInfo = BuildIndexInfo(indexRel);
@@ -5305,7 +5312,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			/* Clean everything up */
 			ExecDropSingleTupleTableSlot(slot);
 
-			index_close(indexRel, AccessShareLock);
+			index_close(indexRel, NoLock);
 			table_close(heapRel, NoLock);
 
 			MemoryContextSwitchTo(oldcontext);
@@ -6444,6 +6451,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 				double *indexPages)
 {
 	IndexOptInfo *index = path->indexinfo;
+	RangeTblEntry *rte = root->simple_rte_array[path->path.parent->relid];
 	List	   *indexQuals = get_quals_from_indexclauses(path->indexclauses);
 	List	   *selectivityQuals;
 	double		numPages = index->pages,
@@ -6472,9 +6480,15 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 */
 	if (!index->hypothetical)
 	{
-		indexRel = index_open(index->indexoid, AccessShareLock);
+		/*
+		 * We use the same lock level as the relation as it may have already
+		 * been locked with that level.  Using the same lock level can save a
+		 * trip to the shared lock manager.
+		 */
+		Assert(rte->rellockmode != NoLock);
+		indexRel = index_open(index->indexoid, rte->rellockmode);
 		ginGetStats(indexRel, &ginStats);
-		index_close(indexRel, AccessShareLock);
+		index_close(indexRel, NoLock);
 	}
 	else
 	{
@@ -6781,11 +6795,14 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 							  &spc_seq_page_cost);
 
 	/*
-	 * Obtain some data from the index itself.
+	 * Obtain some data from the index itself.  We use the same lock level as
+	 * the relation as it may have already been locked with that level.  Using
+	 * the same lock level can save a trip to the shared lock manager.
 	 */
-	indexRel = index_open(index->indexoid, AccessShareLock);
+	Assert(rte->rellockmode != NoLock);
+	indexRel = index_open(index->indexoid, rte->rellockmode);
 	brinGetStats(indexRel, &statsData);
-	index_close(indexRel, AccessShareLock);
+	index_close(indexRel, NoLock);
 
 	/*
 	 * Compute index correlation
#30David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#29)
1 attachment(s)
Re: Inadequate executor locking of indexes

On Fri, 5 Apr 2019 at 02:22, David Rowley <david.rowley@2ndquadrant.com> wrote:

v2 attached.

Wrong patch. Here's what I meant to send.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

use_rellockmode_for_indexes_too_v3.patch (application/octet-stream)
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 3b23de9..0deaf8f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -664,6 +664,10 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
  *
  *		Detect whether a relation (identified by rangetable index)
  *		is one of the target relations of the query.
+ *
+ * Note: This is currently no longer used in core.  We keep it
+ * around to allow FDWs to use it in order to determine if the foreign
+ * table is the target of an UPDATE/DELETE.
  * ----------------------------------------------------------------
  */
 bool
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index bd837d3..604f4f1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,7 +211,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -260,16 +260,9 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->biss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 2d954b7..7711728 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,7 +493,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 	TupleDesc	tupDesc;
 
 	/*
@@ -556,16 +556,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8f39cc2..399ac01 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -901,7 +901,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	LOCKMODE	lockmode;
 
 	/*
 	 * create state structure
@@ -964,16 +964,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return indexstate;
 
-	/*
-	 * Open the index relation.
-	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
-	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
-	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+	/* Open the index relation. */
+	lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
+	indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c430189..15fe132 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6285,6 +6285,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	/* Build RelOptInfo */
 	rel = build_simple_rel(root, 1, NULL);
 
+	/* Already locked by the caller */
 	heap = table_open(tableOid, NoLock);
 	index = index_open(indexOid, NoLock);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 31a3784..c01dbb8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -161,25 +161,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	if (hasindex)
 	{
+		RangeTblEntry *rte;
 		List	   *indexoidlist;
 		ListCell   *l;
-		LOCKMODE	lmode;
 
+		rte = root->simple_rte_array[varno];
 		indexoidlist = RelationGetIndexList(relation);
 
-		/*
-		 * For each index, we get the same type of lock that the executor will
-		 * need, and do not release it.  This saves a couple of trips to the
-		 * shared lock manager while not creating any real loss of
-		 * concurrency, because no schema changes could be happening on the
-		 * index while we hold lock on the parent rel, and neither lock type
-		 * blocks any other kind of index operation.
-		 */
-		if (rel->relid == root->parse->resultRelation)
-			lmode = RowExclusiveLock;
-		else
-			lmode = AccessShareLock;
-
 		foreach(l, indexoidlist)
 		{
 			Oid			indexoid = lfirst_oid(l);
@@ -194,7 +182,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			/*
 			 * Extract info from the relation descriptor for the index.
 			 */
-			indexRelation = index_open(indexoid, lmode);
+			indexRelation = index_open(indexoid, rte->rellockmode);
 			index = indexRelation->rd_index;
 
 			/*
@@ -592,8 +580,8 @@ infer_arbiter_indexes(PlannerInfo *root)
 	OnConflictExpr *onconflict = root->parse->onConflict;
 
 	/* Iteration state */
+	RangeTblEntry *rte;
 	Relation	relation;
-	Oid			relationObjectId;
 	Oid			indexOidFromConstraint = InvalidOid;
 	List	   *indexList;
 	ListCell   *l;
@@ -620,10 +608,9 @@ infer_arbiter_indexes(PlannerInfo *root)
 	 * the rewriter or when expand_inherited_rtentry() added it to the query's
 	 * rangetable.
 	 */
-	relationObjectId = rt_fetch(root->parse->resultRelation,
-								root->parse->rtable)->relid;
+	rte = rt_fetch(root->parse->resultRelation, root->parse->rtable);
 
-	relation = table_open(relationObjectId, NoLock);
+	relation = table_open(rte->relid, NoLock);
 
 	/*
 	 * Build normalized/BMS representation of plain indexed attributes, as
@@ -695,7 +682,8 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 * enforcement needs to occur there anyway when an inference clause is
 		 * omitted.
 		 */
-		idxRel = index_open(indexoid, RowExclusiveLock);
+		Assert(rte->rellockmode >= RowExclusiveLock);
+		idxRel = index_open(indexoid, rte->rellockmode);
 		idxForm = idxRel->rd_index;
 
 		if (!idxForm->indisvalid)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ecffffb..23bfb71 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5187,11 +5187,11 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 
 			/*
 			 * Open the table and index so we can read from them.  We should
-			 * already have at least AccessShareLock on the table, but not
-			 * necessarily on the index.
+			 * already have at least AccessShareLock on the table.  The index
+			 * should have been locked in get_relation_info.
 			 */
 			heapRel = table_open(rte->relid, NoLock);
-			indexRel = index_open(index->indexoid, AccessShareLock);
+			indexRel = index_open(index->indexoid, NoLock);
 
 			/* extract index key information from the index's pg_index info */
 			indexInfo = BuildIndexInfo(indexRel);
@@ -5305,7 +5305,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			/* Clean everything up */
 			ExecDropSingleTupleTableSlot(slot);
 
-			index_close(indexRel, AccessShareLock);
+			index_close(indexRel, NoLock);
 			table_close(heapRel, NoLock);
 
 			MemoryContextSwitchTo(oldcontext);
@@ -6472,9 +6472,10 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 */
 	if (!index->hypothetical)
 	{
-		indexRel = index_open(index->indexoid, AccessShareLock);
+		/* Lock should have already been obtained in get_relation_info */
+		indexRel = index_open(index->indexoid, NoLock);
 		ginGetStats(indexRel, &ginStats);
-		index_close(indexRel, AccessShareLock);
+		index_close(indexRel, NoLock);
 	}
 	else
 	{
@@ -6781,11 +6782,12 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 							  &spc_seq_page_cost);
 
 	/*
-	 * Obtain some data from the index itself.
+	 * Obtain some data from the index itself.  A lock should have already
+	 * been obtained on the index during get_relation_info.
 	 */
-	indexRel = index_open(index->indexoid, AccessShareLock);
+	indexRel = index_open(index->indexoid, NoLock);
 	brinGetStats(indexRel, &statsData);
-	index_close(indexRel, AccessShareLock);
+	index_close(indexRel, NoLock);
 
 	/*
 	 * Compute index correlation
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#30)
Re: Inadequate executor locking of indexes

David Rowley <david.rowley@2ndquadrant.com> writes:

Wrong patch. Here's what I meant to send.

Pushed with some minor tweaking, mostly comments.

Some notes:

* We now have a general convention that queries always take the same lock
type on indexes as on their parent tables, but that convention is not
respected by index DDL. I'm not sure if there is any additional cleanup
that should be done there. The DDL code intends to block concurrent
execution of other DDL on the same index, in most cases, so it may be
fine. At the very least, the different lock levels for those cases are
clearly intentional, while I don't think that was true for DML.

* I dropped the extra assertion you'd added to infer_arbiter_indexes,
as it didn't seem necessary or appropriate. That function's just
interested in inspecting the index, so it doesn't need to assume anything
about how strong the lock is. I think the comment that was there was
just trying to justify the shortcut of hard-wiring the lock level as
RowExclusiveLock.

* I went ahead and changed gin_clean_pending_list() to take
RowExclusiveLock not AccessShareLock on its target index. I'm not quite
sure that AccessShareLock is an actual bug there; it may be that GIN's
internal conventions are such that that's safe. But it sure seemed darn
peculiar, and at risk of causing future problems.
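
(Roughly speaking, variable names approximate, the open/close pair in
that function now reads

    indexRel = index_open(indexoid, RowExclusiveLock);
    ...
    index_close(indexRel, RowExclusiveLock);

where both calls previously passed AccessShareLock.)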

regards, tom lane

#32David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#31)
Re: Inadequate executor locking of indexes

On Fri, 5 Apr 2019 at 08:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pushed with some minor tweaking, mostly comments.

Thanks for tweaking and pushing this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#33Andres Freund
andres@anarazel.de
In reply to: David Rowley (#2)
Re: Inadequate executor locking of indexes

Hi,

On 2018-11-23 17:41:26 +1300, David Rowley wrote:

Ideally, the locking code would realise we already hold a stronger
lock and skip the lock, but I don't see how that's realistically
possible without probing the hash table for all stronger lock types
first, which would likely damage the performance of locking.

That seems like kind of a self-made problem to me. I don't think there's
anything really forcing us to have the locallock hashtable contain the
lockmode. It wouldn't be a trivial change, but we could give the locallock
entry enough information to let us avoid hitting the shared table if we
already hold a stronger lock. The biggest complexity would probably be
that we'd need code to downgrade the shared lock we currently hold when
the heavier lock is released.
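
To make that a bit more concrete, here's a rough standalone sketch
(invented struct/function names, nothing resembling an actual patch; the
conflict bits just restate the documented lock conflict table, i.e. what
LockConflicts[] in lock.c encodes):

    #include <stdbool.h>
    #include <stdint.h>

    #define N_MODES 9               /* lock modes 1..8, slot 0 unused */

    /* bit k set in conflictMask[m] means mode m conflicts with mode k */
    static const uint16_t conflictMask[N_MODES] = {
        0,
        /* AccessShareLock */
        1 << 8,
        /* RowShareLock */
        (1 << 7) | (1 << 8),
        /* RowExclusiveLock */
        (1 << 5) | (1 << 6) | (1 << 7) | (1 << 8),
        /* ShareUpdateExclusiveLock */
        (1 << 4) | (1 << 5) | (1 << 6) | (1 << 7) | (1 << 8),
        /* ShareLock */
        (1 << 3) | (1 << 4) | (1 << 6) | (1 << 7) | (1 << 8),
        /* ShareRowExclusiveLock */
        (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6) | (1 << 7) | (1 << 8),
        /* ExclusiveLock */
        (1 << 2) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6) | (1 << 7) |
        (1 << 8),
        /* AccessExclusiveLock */
        (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6) |
        (1 << 7) | (1 << 8)
    };

    /* locallock entry keyed by lock tag only; remembers held modes */
    typedef struct LocalLockSketch
    {
        int64_t     nLocks[N_MODES];    /* per-mode local hold counts */
        uint16_t    heldMask;           /* bit m set when nLocks[m] > 0 */
    } LocalLockSketch;

    /*
     * Can a request for reqmode be satisfied without touching the
     * shared lock manager?  Yes, if some mode we already hold conflicts
     * with at least everything reqmode conflicts with.
     */
    static bool
    covered_by_held_lock(const LocalLockSketch *ll, int reqmode)
    {
        for (int m = 1; m < N_MODES; m++)
        {
            if ((ll->heldMask & (1 << m)) != 0 &&
                (conflictMask[reqmode] & ~conflictMask[m]) == 0)
                return true;
        }
        return false;
    }

The unpleasant part is exactly what I said above: once the covering lock
goes away, the shared-table state has to be fixed up.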

This made me look at LockAcquireExtended() just now. When we acquire a
lock that's weaker than one already held, and there's another backend
waiting for a conflicting lock, the acquisition only succeeds if NOWAIT
isn't used. That's because only ProcSleep() gets around to checking
whether we already hold a stronger lock, but LockAcquireExtended() bails
out for NOWAIT long before that. None of that is documented in
LockAcquireExtended(). Isn't that a bit weird?
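
To spell that sequence out (heavily paraphrased, not the literal code):

    LockAcquireExtended(locktag, lockmode, sessionLock, dontWait, ...)
        /* request conflicts with granted locks or waiters ahead of us */
        if (dontWait)
            return LOCKACQUIRE_NOT_AVAIL;   /* NOWAIT gives up right here */
        WaitOnLock(...);                    /* eventually reaches ProcSleep() */
            /* only ProcSleep() notices that we already hold a lock
             * conflicting with the waiters ahead of us, and grants the
             * weaker lock immediately instead of queueing behind them */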

Greetings,

Andres Freund