Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

Started by Rushabh Lathiaabout 7 years ago6 messages

rushabh.lathia@gmail.com

about 7 years ago

Hi,

Commit b04aeb0a053e7cf7faad89f7d47844d8ba0dc839 add assertions that we hold
some relevant lock during relation open as opening a relation with no lock
at all is unsafe;

With above commit below test case is failing and hitting the newly added
Assert().

Test case:
===========

CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);

CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
DELETE FROM foo where x = a;
return 0;
END;
$$ LANGUAGE plpgsql;

postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
f1
----
0
(1 row)

postgres@100858=#select f1(4);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Call stack:
=============

#0 0x00007f230b2d61d7 in raise () from /lib64/libc.so.6
#1 0x00007f230b2d78c8 in abort () from /lib64/libc.so.6
#2 0x0000000000a26ce5 in ExceptionalCondition (
conditionName=0xabdca8 "!(lockmode != 0 || (Mode ==
BootstrapProcessing) || CheckRelationLockedByMe(r, 1, 1))",
errorType=0xabc529 "FailedAssertion",
fileName=0xabc668 "heapam.c", lineNumber=1146) at assert.c:54
#3 0x00000000004dacd8 in relation_open (relationId=16387, lockmode=0) at
heapam.c:1144
#4 0x00000000004f94bf in index_open (relationId=16387, lockmode=0) at
indexam.c:155
#5 0x0000000000701403 in ExecInitIndexScan (node=0x1cf8fd8,
estate=0x1cec418, eflags=0) at nodeIndexscan.c:994
#6 0x00000000006dde31 in ExecInitNode (node=0x1cf8fd8, estate=0x1cec418,
eflags=0) at execProcnode.c:217
#7 0x000000000070d7b8 in ExecInitModifyTable (node=0x1cf8da8,
estate=0x1cec418, eflags=0) at nodeModifyTable.c:2190
#8 0x00000000006ddd39 in ExecInitNode (node=0x1cf8da8, estate=0x1cec418,
eflags=0) at execProcnode.c:174
#9 0x00000000006d48a5 in InitPlan (queryDesc=0x1cdc378, eflags=0) at
execMain.c:1024
#10 0x00000000006d36e9 in standard_ExecutorStart (queryDesc=0x1cdc378,
eflags=0) at execMain.c:265
#11 0x00000000006d3485 in ExecutorStart (queryDesc=0x1cdc378, eflags=0) at
execMain.c:147
#12 0x0000000000725880 in _SPI_pquery (queryDesc=0x1cdc378,
fire_triggers=true, tcount=0) at spi.c:2469
#13 0x00000000007252d4 in _SPI_execute_plan (plan=0x1cc4748,
paramLI=0x1ce23c8, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false,
fire_triggers=true,
tcount=0) at spi.c:2235
#14 0x00000000007222a1 in SPI_execute_plan_with_paramlist (plan=0x1cc4748,
params=0x1ce23c8, read_only=false, tcount=0) at spi.c:516
#15 0x00007f23004a5f9a in exec_stmt_execsql (estate=0x7fff0bcaa4d0,
stmt=0x1cc78f0) at pl_exec.c:4114

Here DELETE statement pick the IndexScan plan, and later during execution
when it tries to open index relation, ExecInitIndexScan() assumes that index
relation should already hold require a lock. But in this case, that's not
the
case - which got exposed through the newly added ASSERT() for NoLock.

When query get's execute first time build_simple_rel() ->
get_relation_info(),
takes a lock on the indexes of a relation (this do happen from the
BuildCachePlan()). When we execute the same query second time, a plan is
caches
and CheckCachePlan() calls the AquireExecutorLocks() take the require locks
needed for execution of the caches plan. Here it takes a lock on the rtable
list - but doesn't take any lock on the index relation.

I observed that ExecInitModifyTable() to open the index relation if indices
on
the result relation. But here also - it doesn't do anything for CMD_DELETE:

/*
* If there are indices on the result relation, open them and save
* descriptors in the result relation info, so that we can add new
* index entries for the tuples we add/update. We need not do this
* for a DELETE, however, since deletion doesn't affect indexes.
Also,
* inside an EvalPlanQual operation, the indexes might be open
* already, since we share the resultrel state with the original
* query.
*/
if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
operation != CMD_DELETE &&
resultRelInfo->ri_IndexRelationDescs == NULL)
ExecOpenIndices(resultRelInfo,
node->onConflictAction != ONCONFLICT_NONE);

Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
on the scan coming from CMD_DELETE.

Thoughts/Comments?

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

David Rowley

david.rowley@2ndquadrant.com

about 7 years ago

In reply to: Rushabh Lathia (#1)

1 attachment(s)

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);

CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
DELETE FROM foo where x = a;
return 0;
END;
$$ LANGUAGE plpgsql;

postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
f1
----
0
(1 row)

postgres@100858=#select f1(4);
server closed the connection unexpectedly

Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here. Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.

I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

fix_index_locking_for_deletes.patchapplication/octet-stream; name=fix_index_locking_for_deletes.patchDownload

diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f209173a85..d7c69867ea 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -920,7 +920,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	bool		indexlocked;
 
 	/*
 	 * create state structure
@@ -986,13 +986,14 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	/*
 	 * Open the index relation.
 	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
+	 * For non-DELETE statement when the parent table is one of the target
+	 * relations of the query, ExecInitModifyTable will already have obtained
+	 * a lock on the index, otherwise we must obtain a read lock here.
 	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
+	indexlocked = estate->es_plannedstmt->commandType != CMD_DELETE &&
+				  ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
 	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+											  indexlocked ? NoLock : AccessShareLock);
 
 	/*
 	 * Initialize index-specific scan state

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: David Rowley (#2)

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

David Rowley <david.rowley@2ndquadrant.com> writes:

On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

I complained about this previously in

/messages/by-id/19465.1541636036@sss.pgh.pa.us

but got nothing but crickets :-(

regards, tom lane

Rushabh Lathia

rushabh.lathia@gmail.com

about 7 years ago

In reply to: David Rowley (#2)

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

On Fri, Nov 23, 2018 at 3:33 AM David Rowley <david.rowley@2ndquadrant.com>
wrote:

On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);

CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
DELETE FROM foo where x = a;
return 0;
END;
$$ LANGUAGE plpgsql;

postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
f1
----
0
(1 row)

postgres@100858=#select f1(4);
server closed the connection unexpectedly

Now, to fix this issue either we need to hold proper lock before reaching
to ExecInitIndexScan() or teach ExecInitIndexScan() to take

AccessShareLock

on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here. Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.

+1.

I tried the same approach and with further testing I haven't found
any issues.

I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Rushabh Lathia

Rushabh Lathia

rushabh.lathia@gmail.com

about 7 years ago

In reply to: Rushabh Lathia (#4)

1 attachment(s)

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

While looking code further around this, I realized that we need
similar kind of fix for bitmap as well as index only scan as well.

Here is the patch, which does similar fix for bitmap and indexonly
scans.

Thanks,

On Fri, Nov 23, 2018 at 6:47 PM Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

On Fri, Nov 23, 2018 at 3:33 AM David Rowley <david.rowley@2ndquadrant.com>
wrote:

On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);

CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
DELETE FROM foo where x = a;
return 0;
END;
$$ LANGUAGE plpgsql;

postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
f1
----
0
(1 row)

postgres@100858=#select f1(4);
server closed the connection unexpectedly

Now, to fix this issue either we need to hold proper lock before

reaching

to ExecInitIndexScan() or teach ExecInitIndexScan() to take

AccessShareLock

on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here. Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.

+1.

I tried the same approach and with further testing I haven't found
any issues.

I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Rushabh Lathia

--
Rushabh Lathia

Attachments:

fix_index_locking_for_deletes_v2.patchtext/x-patch; charset=US-ASCII; name=fix_index_locking_for_deletes_v2.patchDownload

diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index d04f490..c615316 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -210,7 +210,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	bool		indexlocked;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -266,9 +266,17 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	 * InitPlan already opened and write-locked the index, so we can avoid
 	 * taking another lock here.  Otherwise we need a normal reader's lock.
 	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
+	/*
+	 * Open the index relation.
+	 *
+	 * For non-DELETE statement when the parent table is one of the target
+	 * relations of the query, ExecInitModifyTable will already have obtained
+	 * a lock on the index, otherwise we must obtain a read lock here.
+	 */
+	indexlocked = estate->es_plannedstmt->commandType != CMD_DELETE &&
+				  ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
 	indexstate->biss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+											   indexlocked ? NoLock : AccessShareLock);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index d1201a1..d9caf00 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,8 +493,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
 	TupleDesc	tupDesc;
+	bool		indexlocked;
 
 	/*
 	 * create state structure
@@ -558,13 +558,14 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	/*
 	 * Open the index relation.
 	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
+	 * For non-DELETE statement when the parent table is one of the target
+	 * relations of the query, ExecInitModifyTable will already have obtained
+	 * a lock on the index, otherwise we must obtain a read lock here.
 	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
+	indexlocked = estate->es_plannedstmt->commandType != CMD_DELETE &&
+				  ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
 	indexstate->ioss_RelationDesc = index_open(node->indexid,
-											   relistarget ? NoLock : AccessShareLock);
+											   indexlocked ? NoLock : AccessShareLock);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f209173..d7c6986 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -920,7 +920,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
+	bool		indexlocked;
 
 	/*
 	 * create state structure
@@ -986,13 +986,14 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	/*
 	 * Open the index relation.
 	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
+	 * For non-DELETE statement when the parent table is one of the target
+	 * relations of the query, ExecInitModifyTable will already have obtained
+	 * a lock on the index, otherwise we must obtain a read lock here.
 	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
+	indexlocked = estate->es_plannedstmt->commandType != CMD_DELETE &&
+				  ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
 	indexstate->iss_RelationDesc = index_open(node->indexid,
-											  relistarget ? NoLock : AccessShareLock);
+											  indexlocked ? NoLock : AccessShareLock);
 
 	/*
 	 * Initialize index-specific scan state

Amit Kapila

amit.kapila16@gmail.com

about 7 years ago

In reply to: Rushabh Lathia (#5)

Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

On Fri, Dec 14, 2018 at 11:32 AM Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

While looking code further around this, I realized that we need
similar kind of fix for bitmap as well as index only scan as well.

I think if we want to move forward with this patch, we need to first
investigate the deadlock hazard mentioned by Tom in other related
thread [1]/messages/by-id/3046.1542990312@sss.pgh.pa.us. I and others have also responded on that thread, see if
you can respond to it.

[1]: /messages/by-id/3046.1542990312@sss.pgh.pa.us

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com