Parallel Inserts in CREATE TABLE AS
Hi,
The idea of this patch is to allow the leader and each worker to insert tuples in parallel if the SELECT part of the CTAS is parallelizable. Along with the parallel inserts, if the CTAS code path is also allowed to do table_multi_insert() [1], then the gain we achieve is as follows:
For a table with 2 integer columns and 100 million tuples (more test results are at [2]), the exec time on HEAD is *120sec*, whereas with the parallelism patch proposed here and the multi insert patch [1], with 3 workers and leader participation the exec time is *22sec (5.45X)*. With the current CTAS code, which does single tuple inserts (see intorel_receive()), the performance gain is limited to ~1.7X with parallelism. This is because the workers contend more for locks on buffer pages while extending the table. So, the maximum benefit we can get for CTAS is with both parallelism and multi tuple inserts.
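For reference, here is a minimal sketch of the kind of setup behind these readings. The schema, object names, and settings below are my assumptions; the mail only states "2 integer columns, 100 million tuples" and the worker counts, and the exact commands used are not shown.

-- Hypothetical benchmark setup: a 2-integer-column table with 100 million rows.
DROP TABLE IF EXISTS t_source;
CREATE TABLE t_source(a int, b int);
INSERT INTO t_source SELECT i, i FROM generate_series(1, 100000000) i;
VACUUM ANALYZE t_source;

-- 3 workers with leader participation, as in the 22sec reading.
SET max_parallel_workers_per_gather = 3;
\timing on
CREATE TABLE t_target AS SELECT * FROM t_source;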
The design:
Let the planner know that the SELECT is from CTAS in createas.c so that it can set the number of tuples transferred from the workers to the Gather node to 0. With this change, there is a chance that the planner chooses a parallel plan. After planning, check in createas.c whether the upper plan node is Gather and, if so, mark a parallelism flag in the CTAS dest receiver. Pass the into clause, object id, and command id from the leader to the workers, so that each worker can create its own CTAS dest receiver. The leader inserts its share of tuples if instructed to do so, and so do the workers. Each worker atomically writes its number of inserted tuples into a shared memory variable; the leader combines this with its own count and reports the total to the client.
The following things are still pending. Thoughts are most welcome:
1. How best can we lift the "cannot insert tuples in a parallel worker" restriction from heap_prepare_insert() for only the CTAS case, or for that matter parallel copy? How about having a variable in one of the worker global contexts and using that? Of course, we can remove this restriction entirely in case we fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
2. How to represent the parallel insert for CTAS in explain plans? EXPLAIN of a CTAS shows the plan for only the SELECT part. How about having some textual info along with the Gather node?
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
Workers Planned: 2
-> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954
width=8)
Filter: (many < 10000)
-----------------------------------------------------------------------------
3. Need to restrict parallel inserts if CTAS tries to create temp/global tables, as the workers will not have access to those tables. Need to analyze whether to allow parallelism if CTAS uses a prepared statement or WITH NO DATA (see the statements sketched after this list).
4. Need to stop unnecessary parallel shared state, such as the tuple queue, from being created and shared with the workers.
5. Addition of new test cases. Testing with more scenarios and different data sets, sizes, tablespaces, SELECT INTO. Analysis of the 2 mismatches in the write_parallel.sql regression test.
Thoughts?
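To be concrete, these are the CTAS variants item 3 is about. This is only an illustration; the object names are mine, and t_source is the hypothetical table from the sketch above.

-- Temp table target: workers cannot access the leader's temp schema,
-- so parallel insert would have to be disabled here.
CREATE TEMP TABLE tmp_t AS SELECT * FROM t_source;

-- CTAS over a prepared statement.
PREPARE src_q AS SELECT * FROM t_source;
CREATE TABLE t_from_prepared AS EXECUTE src_q;

-- WITH NO DATA: the SELECT part is not executed, so there is nothing
-- to insert in parallel.
CREATE TABLE t_empty AS SELECT * FROM t_source WITH NO DATA;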
Credits:
1. Thanks to Dilip Kumar for the main design idea and the discussions. Thanks to Vignesh for the discussions.
2. Patch development and testing are by me.
3. Thanks to the authors of the table_multi_insert() in CTAS patch [1].
[1]: For table_multi_insert() in CTAS, I used an in-progress patch available at /messages/by-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1=N1AC-Fw@mail.gmail.com
[2]: Table with 2 integer columns, 100 million tuples, with leader participation, with default postgresql.conf file. All readings are of triplet form - (workers, exec time in sec, improvement).
case 1: no multi inserts - (0,120,1X),(1,91,1.32X),(2,75,1.6X),(3,67,1.79X),(4,72,1.66X),(5,77,1.56X),(6,83,1.44X)
case 2: with multi inserts - (0,59,1X),(1,32,1.84X),(2,28,2.1X),(3,25,2.36X),(4,23,2.56X),(5,22,2.68X),(6,22,2.68X)
case 3: same table but unlogged, with multi inserts - (0,50,1X),(1,28,1.78X),(2,25,2X),(3,22,2.27X),(4,21,2.38X),(5,21,2.38X),(6,20,2.5X)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/x-patch)
From 9e45426a6d4d6f030ba24ed58eb0e2ff5912a972 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sun, 20 Sep 2020 09:23:06 +0530
Subject: [PATCH v1] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts it's
share of tuples if instructed to do, and so are workers. Each
worker writes atomically it's number of inserted tuples into a
shared memory variable, the leader combines this with it's own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 4 +-
src/backend/commands/createas.c | 284 +++++++++++++++-----------
src/backend/commands/explain.c | 23 +++
src/backend/executor/execMain.c | 21 ++
src/backend/executor/execParallel.c | 66 +++++-
src/backend/executor/nodeGather.c | 66 ++++++
src/backend/optimizer/path/costsize.c | 12 ++
src/include/commands/createas.h | 15 ++
src/include/executor/execParallel.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/nodes/parsenodes.h | 1 +
11 files changed, 374 insertions(+), 125 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861a02..50766f489a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2049,10 +2049,10 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
* inserts in general except for the cases where inserts generate a new
* CommandId (eg. inserts into a table having a foreign key column).
*/
- if (IsParallelWorker())
+ /*if (IsParallelWorker())
ereport(ERROR,
(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
+ errmsg("cannot insert tuples in a parallel worker")));*/
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d53ec952d0..4812a55518 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,10 +316,29 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for
+ * CREATE TABLE AS. This is used to make the number tuples
+ * transferred from workers to gather node(in case parallelism
+ * kicks in for the SELECT part of the CTAS), to zero as each
+ * worker will parallelly insert it's share of tuples.
+ */
+ if (!is_matview)
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
@@ -418,6 +425,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,135 +440,167 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- char relkind;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- RangeTblEntry *rte;
- ListCell *lc;
- int attnum;
- Assert(into != NULL); /* else somebody forgot to set it */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+ }
+ else
+ {
+ IntoClause *into = myState->into;
+ bool is_matview;
+ char relkind;
+ List *attrList;
+ Relation intoRelationDesc;
+ RangeTblEntry *rte;
+ ListCell *lc;
+ int attnum;
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
- relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
+ Assert(into != NULL); /* else somebody forgot to set it */
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
- {
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
+ is_matview = (into->viewQuery != NULL);
+ relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
- if (lc)
+ /*
+ * Build column definitions using "pre-cooked" type and collation info. If
+ * a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK, too
+ * many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
- }
- else
- colname = NameStr(attribute->attname);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
- /*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
- */
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
- attrList = lappend(attrList, col);
- }
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must check
+ * this here because DefineRelation would adopt the type's default
+ * collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ attrList = lappend(attrList, col);
+ }
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Actually create the target table
+ */
+ intoRelationAddr = create_ctas_internal(attrList, into);
- /*
- * Check INSERT permission on the constructed table.
- *
- * XXX: It would arguably make sense to skip this check if into->skipData
- * is true.
- */
- rte = makeNode(RangeTblEntry);
- rte->rtekind = RTE_RELATION;
- rte->relid = intoRelationAddr.objectId;
- rte->relkind = relkind;
- rte->rellockmode = RowExclusiveLock;
- rte->requiredPerms = ACL_INSERT;
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
- rte->insertedCols = bms_add_member(rte->insertedCols,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ /*
+ * Check INSERT permission on the constructed table.
+ *
+ * XXX: It would arguably make sense to skip this check if into->skipData
+ * is true.
+ */
+ rte = makeNode(RangeTblEntry);
+ rte->rtekind = RTE_RELATION;
+ rte->relid = intoRelationAddr.objectId;
+ rte->relkind = relkind;
+ rte->rellockmode = RowExclusiveLock;
+ rte->requiredPerms = ACL_INSERT;
+
+ for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
+ rte->insertedCols = bms_add_member(rte->insertedCols,
+ attnum - FirstLowInvalidHeapAttributeNumber);
+
+ ExecCheckRTPerms(list_make1(rte), true);
- ExecCheckRTPerms(list_make1(rte), true);
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has requested
+ * something invalid, and otherwise will return RLS_ENABLED if RLS should
+ * be enabled here. We don't actually support that currently, so throw
+ * our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * Tentatively mark the target as populated, if it's a matview and we're
+ * going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->bistate = GetBulkInsertState();
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
- myState->bistate = GetBulkInsertState();
+ /*
+ * Valid smgr_targblock implies something already wrote to the relation.
+ * This may be harmless, but this function hasn't planned for it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ if (myState->is_parallel == true)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+ /*
+ * We don't need to skip contacting FSM while inserting tuples
+ * for parallel mode, while extending the relations, workers
+ * instead of blocking on a page while another worker is inserting,
+ * can check the FSM for another page that can accommodate the
+ * tuples. This results in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is
+ * not allowed to extend by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+ }
}
/*
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5547..5a9ab2ce39 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,18 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for
+ * CREATE TABLE AS. This is used to make the number tuples
+ * transferred from workers to gather node(in case parallelism
+ * kicks in for the SELECT part of the CTAS), to zero as each
+ * worker will parallelly insert it's share of tuples.
+ */
+ if (into != NULL &&
+ into->type == T_IntoClause &&
+ into->viewQuery == NULL)
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -536,7 +548,18 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* AS, we'd better use the appropriate tuple receiver.
*/
if (into)
+ {
dest = CreateIntoRelDestReceiver(into);
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (into->type == T_IntoClause &&
+ into->viewQuery == NULL &&
+ IsA(plannedstmt->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+ }
else
dest = None_Receiver;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2e27e26ba4..ceb45d194d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_publication.h"
+#include "commands/createas.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
@@ -352,6 +353,26 @@ standard_ExecutorRun(QueryDesc *queryDesc,
if (sendTuples)
dest->rStartup(dest, operation, queryDesc->tupDesc);
+ /*
+ * For parallelizing inserts in CTAS i.e. making each
+ * parallel worker inerst it's tuples, we must send
+ * information such as intoclause(for each worker
+ * building it's own dest receiver), object id(for each
+ * worker to open the table), cid(command id for each
+ * worker to insert properly) from leader to workers.
+ */
+ if (queryDesc->plannedstmt->parallelModeNeeded == true &&
+ dest != NULL &&
+ dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel == true &&
+ ((DR_intorel *) dest)->is_parallel_worker != true)
+ {
+ queryDesc->planstate->intoclause = ((DR_intorel *) dest)->into;
+ queryDesc->planstate->objectid = ((DR_intorel *) dest)->object_id;
+ queryDesc->planstate->cid = ((DR_intorel *) dest)->output_cid;
+ queryDesc->planstate->dest = dest;
+ }
+
/*
* run plan
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..c0143d4e0e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ CommandId cid; /* workers to insert appropriately. */
+ Oid objectid; /* workers to open relation/table. */
+ pg_atomic_uint64 processed; /* number tuples inserted by all the workers. */
} FixedParallelExecutorState;
/*
@@ -601,6 +606,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -713,6 +719,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (planstate->intoclause != NULL &&
+ planstate->intoclause->type == T_IntoClause)
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -730,6 +745,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr != NULL &&
+ planstate->objectid != InvalidOid &&
+ planstate->cid != InvalidCommandId)
+ {
+ fpes->objectid = planstate->objectid;
+ fpes->cid = planstate->cid;
+ }
+ else
+ {
+ fpes->objectid = InvalidOid;
+ fpes->cid = InvalidCommandId;
+ }
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -759,6 +790,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1388,12 +1426,29 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr != NULL)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use
+ * the proper dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ ((DR_intorel *)receiver)->output_cid = fpes->cid;
+ }
+ else
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1472,6 +1527,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader
+ * will use it to inform to the end client.
+ */
+ if (intoclausestr != NULL)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..bda90a23f9 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -166,6 +166,18 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (node->ps.intoclause != NULL &&
+ node->ps.intoclause->type == T_IntoClause)
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ node->ps.lefttree->cid = node->ps.cid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
@@ -220,6 +232,60 @@ ExecGather(PlanState *pstate)
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (node->ps.intoclause != NULL &&
+ node->ps.intoclause->type == T_IntoClause)
+ {
+ /*
+ * By now, for parallel workers (if launched any),
+ * would have started their work i.e. insertion to
+ * target table.
+ * In case the leader is chosen to participate for
+ * parallel inserts in CTAS, then finish it's share
+ * before going to wait for the parallel workers to
+ * finish.
+ */
+ if (node->need_to_scan_locally == true &&
+ node->ps.dest != NULL &&
+ node->ps.dest->mydest == DestIntoRel)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (!TupIsNull(outerTupleSlot))
+ {
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+ node->ps.state->es_processed++;
+ }
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+ }
+ }
+
+ /* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+ /*
+ * Add up the total tuples inserted by all workers, to
+ * the tuples inserted by the leader(if any). This will
+ * be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+
+ return NULL;
+ }
/*
* Get next tuple, either from one of our workers, or by running the plan
* ourselves.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index cd3716d494..9ed671415b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -377,6 +377,18 @@ cost_gather(GatherPath *path, PlannerInfo *root,
else
path->path.rows = rel->rows;
+ /*
+ * Make the number of tuples that are transferred from
+ * workers to gather/leader node zero as each worker
+ * parallelly insert the tuples that are resulted from
+ * it's chunk of plan execution. This change may make
+ * the parallel plan cheap among all other plans, and
+ * influence the planner to consdier this parallel plan.
+ */
+ if (root->parse->isForCTAS &&
+ root->query_level == 1)
+ path->path.rows = 0;
+
startup_cost = path->subpath->startup_cost;
run_cost = path->subpath->total_cost - path->subpath->startup_cost;
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..1564687efe 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,27 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* true if parallelism is to be considered */
+ bool is_parallel_worker; /* true for parallel worker */
+ Oid object_id; /* used for table open by parallel worker */
+} DR_intorel;
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..77f69946bf 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,6 +35,7 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ volatile pg_atomic_uint64 *processed; /* number of tuples inserted by all workers */
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a5ab1aed14..478ffeb74c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -1025,6 +1026,11 @@ typedef struct PlanState
bool outeropsset;
bool inneropsset;
bool resultopsset;
+ /* Below is parallel inserts in CTAS related info. */
+ IntoClause *intoclause;
+ Oid objectid;
+ CommandId cid;
+ DestReceiver *dest;
} PlanState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 60c2f45466..fe43dc941e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* true if the select query is for create table as */
} Query;
--
2.25.1
I feel this enhancement could give good improvement, +1 for this.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2020-09-23 17:20:20 +0530, Bharath Rupireddy wrote:
The idea of this patch is to allow the leader and each worker insert the
tuples in parallel if the SELECT part of the CTAS is parallelizable.
Cool!
The design:
I think it'd be good if you could explain a bit more why you think this is safe to do in the way you have done it.
E.g. from a quick scroll through the patch, there's not even a comment explaining that the only reason there doesn't need to be code dealing with xid assignment is because we already did the catalog changes to create the table. But how does that work for SELECT INTO? Are you prohibiting
that? ...
Pass the into clause, object id, command id from the leader to
workers, so that each worker can create its own CTAS dest
receiver. Leader inserts it's share of tuples if instructed to do, and
so are workers. Each worker writes atomically it's number of inserted
tuples into a shared memory variable, the leader combines this with
it's own number of inserted tuples and shares to the client.
Below things are still pending. Thoughts are most welcome:
1. How better we can lift the "cannot insert tuples in a parallel worker"
from heap_prepare_insert() for only CTAS cases or for that matter parallel
copy? How about having a variable in any of the worker global contexts and
use that? Of course, we can remove this restriction entirely in case we
fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
I have mentioned before that I think it'd be good if we changed the
insert APIs to have a more 'scan' like structure. I am thinking of
something like
TableInsertScan* table_begin_insert(Relation);
table_tuple_insert(TableInsertScan *is, other, args);
table_multi_insert(TableInsertScan *is, other, args);
table_end_insert(TableInsertScan *);
that'd then replace the BulkInsertStateData logic we have right now. But
more importantly it'd allow an AM to optimize operations across multiple
inserts, which is important for column stores.
And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().
3. Need to restrict parallel inserts, if CTAS tries to create temp/global
tables as the workers will not have access to those tables. Need to analyze
whether to allow parallelism if CTAS has prepared statements or with no
data.
In which case does CTAS not create a table? You definitely need to ensure that the table is created before your workers are started, and it needs to be in a different CommandId.
Greetings,
Andres Freund
Thanks Andres for the comments.
On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
The design:
I think it'd be good if you could explain a bit more why you think this is safe to do in the way you have done it.
E.g. from a quick scroll through the patch, there's not even a comment explaining that the only reason there doesn't need to be code dealing with xid assignment is because we already did the catalog changes to create the table.
Yes, we do a bunch of catalog changes related to the newly created table. We will have both the txn id and command id assigned when the catalog changes are being made. But, right after the table is created in the leader, the command id is incremented (CommandCounterIncrement() is called from create_ctas_internal()) whereas the txn id remains the same. The new command id is marked as used via GetCurrentCommandId(true) in intorel_startup, then the parallel mode is entered. The txn id and command id are serialized into the parallel DSM; they are then available to all parallel workers. This is discussed in [1].
A few changes I have to make in the parallel worker code: set currentCommandIdUsed = true;, maybe via a common API SetCurrentCommandIdUsedForWorker() proposed in [1], and remove the extra command id sharing from the leader to workers.
I will add a few comments in the upcoming patch related to the above info.
But how does that work for SELECT INTO? Are you prohibiting
that? ...
In case of SELECT INTO, a new table gets created as well; I'm not prohibiting the parallel inserts there, and I think we don't need to.
Thoughts?
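For illustration, this is the SELECT INTO form in question (an equivalent spelling of CTAS; the names are mine, reusing the hypothetical t_source table from earlier).

-- SELECT INTO creates the target table just like CTAS does, so parallel
-- insert is not prohibited for it either.
SELECT a, b INTO t_selected FROM t_source WHERE a < 1000000;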
Below things are still pending. Thoughts are most welcome:
1. How better we can lift the "cannot insert tuples in a parallel worker"
from heap_prepare_insert() for only CTAS cases or for that matter parallel
copy? How about having a variable in any of the worker global contexts and
use that? Of course, we can remove this restriction entirely in case we
fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().
Removing the "cannot insert tuples in a parallel worker" restriction from heap_prepare_insert() is a common problem for parallel inserts in general, i.e. parallel inserts in CTAS, parallel INSERT INTO SELECT [1], and parallel copy [2]. It will be good if a common solution is agreed upon.
3. Need to restrict parallel inserts, if CTAS tries to create temp/global
tables as the workers will not have access to those tables. Need to analyze
whether to allow parallelism if CTAS has prepared statements or with no
data.
In which case does CTAS not create a table?
AFAICS, the table gets created in all the cases, but the insertion of the data gets skipped if the user specifies the "with no data" option, in which case the select part is not even planned, and so parallelism will also not be picked.
You definitely need to
ensure that the table is created before your workers are started, and
it needs to be in a different CommandId.
Yeah, this is already being done. The table gets created in the leader (in intorel_startup(), which gets called from dest->rStartup() in standard_ExecutorRun()) before entering parallel mode.
[1]: /messages/by-id/CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com
[2]: /messages/by-id/CAA4eK1+kpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV=g@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 28, 2020 at 3:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks Andres for the comments.
On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
The design:
I think it'd be good if you could explain a bit more why you think this is safe to do in the way you have done it.
E.g. from a quick scroll through the patch, there's not even a comment explaining that the only reason there doesn't need to be code dealing with xid assignment is because we already did the catalog changes to create the table.
Yes, we do a bunch of catalog changes related to the newly created table.
We will have both the txn id and command id assigned when catalogue
changes are being made. But, right after the table is created in the
leader, the command id is incremented (CommandCounterIncrement() is
called from create_ctas_internal()) whereas the txn id remains the
same. The new command id is marked as GetCurrentCommandId(true); in
intorel_startup, then the parallel mode is entered. The txn id and
command id are serialized into parallel DSM, they are then available
to all parallel workers. This is discussed in [1].
A few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true;, may be via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1] and remove the
extra command id sharing from the leader to workers.
I will add a few comments in the upcoming patch related to the above info.
Yes, that would be good.
But how does that work for SELECT INTO? Are you prohibiting
that? ...
In case of SELECT INTO, a new table gets created and I'm not
prohibiting the parallel inserts and I think we don't need to.
So, in this case also, do we ensure that the table is created before we launch the workers? If so, I think you can explain in comments about it and what you need to do to ensure the same.
While skimming through the patch, a small thing I noticed:
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+
I am not sure at this stage if this is the best way to make CTAS parallel, but if so, then probably you can expand the comments a bit to say why you consider only the Gather node (and that too only when it is the top-most node) and why not another parallel node like GatherMerge?
Thoughts?
Below things are still pending. Thoughts are most welcome:
1. How better we can lift the "cannot insert tuples in a parallel worker"
from heap_prepare_insert() for only CTAS cases or for that matter parallel
copy? How about having a variable in any of the worker global contexts and
use that? Of course, we can remove this restriction entirely in case we
fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().
Removing "cannot insert tuples in a parallel worker" restriction from
heap_prepare_insert() is a common problem for parallel inserts in
general, i.e. parallel inserts in CTAS, parallel INSERT INTO
SELECTs[1] and parallel copy[2]. It will be good if a common solution
is agreed.
Right, for now, I think you can simply remove that check from the code
instead of just commenting it. We will see if there is a better
check/Assert we can add there.
--
With Regards,
Amit Kapila.
On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yes we do a bunch of catalog changes related to the created new table.
We will have both the txn id and command id assigned when catalogue
changes are being made. But, right after the table is created in the
leader, the command id is incremented (CommandCounterIncrement() is
called from create_ctas_internal()) whereas the txn id remains the
same. The new command id is marked as GetCurrentCommandId(true); in
intorel_startup, then the parallel mode is entered. The txn id and
command id are serialized into parallel DSM, they are then available
to all parallel workers. This is discussed in [1].
A few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true;, may be via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1] and remove the
extra command id sharing from the leader to workers.
I will add a few comments in the upcoming patch related to the above info.
Yes, that would be good.
Added comments.
But how does that work for SELECT INTO? Are you prohibiting
that? ...
In case of SELECT INTO, a new table gets created and I'm not
prohibiting the parallel inserts and I think we don't need to.
So, in this case, also do we ensure that table is created before we
launch the workers. If so, I think you can explain in comments about
it and what you need to do that to ensure the same.
For SELECT INTO, the table gets created by the leader in create_ctas_internal(), then ExecInitParallelPlan() gets called, which launches the workers, and then the leader (if asked to do so) and the workers insert the rows. So, we don't need to do any extra work to ensure the table gets created before the workers start inserting tuples.
While skimming through the patch, a small thing I noticed:
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+
I am not sure at this stage if this is the best way to make CTAS as
parallel but if so, then probably you can expand the comments a bit to
say why you consider only Gather node (and that too when it is the
top-most node) and why not another parallel node like GatherMerge?
If somebody expects to preserve the order of the tuples coming from a GatherMerge node of the select part in CTAS or SELECT INTO while inserting, then with parallelism allowed that may not be the case, i.e. the order of insertion of tuples may vary. I'm not quite sure if someone wants to use ORDER BY in the select part of CTAS or SELECT INTO in a real world use case. Thoughts?
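For example (illustrative only; the names are mine and t_source is the hypothetical table from earlier), the case in question is a CTAS whose select part could be planned with a GatherMerge:

-- If the planner picks a GatherMerge for the ordered scan, the SELECT
-- produces sorted output, but with parallel inserts each worker writes
-- its own rows directly, so the physical insertion order into t_sorted
-- is no longer the sort order.
CREATE TABLE t_sorted AS SELECT a, b FROM t_source ORDER BY a;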
Right, for now, I think you can simply remove that check from the code
instead of just commenting it. We will see if there is a better
check/Assert we can add there.
Done.
I also worked on some of the open points I listed earlier in my mail.
3. Need to restrict parallel inserts, if CTAS tries to create temp/global tables as the workers will not have access to those tables.
Done.
Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
For prepared statements, parallelism will not be picked and so neither will parallel insertion.
For the CTAS WITH NO DATA case, the select part is not even planned, and so parallelism will also not be picked.
4. Need to stop unnecessary parallel shared state such as tuple queue being created and shared to workers.
Done.
I'm listing the things that are still pending.
1. How to represent the parallel insert for CTAS in explain plans? EXPLAIN of a CTAS shows the plan for only the SELECT part. How about having some textual info along with the Gather node? I'm not quite sure on this point; any suggestions are welcome.
2. Addition of new test cases. Testing with more scenarios and different data sets, sizes, tablespaces, SELECT INTO. Analysis of the 2 mismatches in the write_parallel.sql regression test.
Attaching v2 patch, thoughts and comments are welcome.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From bca61cd199970ae050dc74b4a7d2d7b275ad0ce0 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 14 Oct 2020 14:25:24 +0530
Subject: [PATCH v2] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts it's
share of tuples if instructed to do, and so are workers. Each
worker writes atomically it's number of inserted tuples into a
shared memory variable, the leader combines this with it's own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 14 ++
src/backend/commands/createas.c | 321 ++++++++++++++++----------
src/backend/commands/explain.c | 19 ++
src/backend/executor/execMain.c | 19 ++
src/backend/executor/execParallel.c | 61 ++++-
src/backend/executor/nodeGather.c | 97 +++++++-
src/backend/optimizer/path/costsize.c | 12 +
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 18 ++
src/include/executor/execParallel.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/nodes/parsenodes.h | 1 +
13 files changed, 437 insertions(+), 143 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861a02..1602525d4a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..24eb8fca38 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -775,6 +775,20 @@ GetCurrentCommandId(bool used)
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+*/
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d53ec952d0..a3ab772389 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,10 +316,28 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for
+ * CREATE TABLE AS. This is used to make the number tuples
+ * transferred from workers to gather node(in case parallelism
+ * kicks in for the SELECT part of the CTAS), to zero as each
+ * worker will parallelly insert it's share of tuples.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plan))
+ ((DR_intorel *) dest)->is_parallel = true;
+
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
@@ -418,6 +424,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,135 +439,180 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- char relkind;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- RangeTblEntry *rte;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
- relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in inintorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ char relkind;
+ List *attrList;
+ Relation intoRelationDesc;
+ RangeTblEntry *rte;
+ ListCell *lc;
+ int attnum;
- if (lc)
- {
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
- }
- else
- colname = NameStr(attribute->attname);
+ Assert(into != NULL); /* else somebody forgot to set it */
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
+ is_matview = (into->viewQuery != NULL);
+ relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
- */
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ * Build column definitions using "pre-cooked" type and collation info. If
+ * a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK, too
+ * many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ {
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
- attrList = lappend(attrList, col);
- }
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must check
+ * this here because DefineRelation would adopt the type's default
+ * collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ attrList = lappend(attrList, col);
+ }
- /*
- * Check INSERT permission on the constructed table.
- *
- * XXX: It would arguably make sense to skip this check if into->skipData
- * is true.
- */
- rte = makeNode(RangeTblEntry);
- rte->rtekind = RTE_RELATION;
- rte->relid = intoRelationAddr.objectId;
- rte->relkind = relkind;
- rte->rellockmode = RowExclusiveLock;
- rte->requiredPerms = ACL_INSERT;
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
- for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
- rte->insertedCols = bms_add_member(rte->insertedCols,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ /*
+ * Actually create the target table
+ */
+ intoRelationAddr = create_ctas_internal(attrList, into);
- ExecCheckRTPerms(list_make1(rte), true);
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * Check INSERT permission on the constructed table.
+ *
+ * XXX: It would arguably make sense to skip this check if into->skipData
+ * is true.
+ */
+ rte = makeNode(RangeTblEntry);
+ rte->rtekind = RTE_RELATION;
+ rte->relid = intoRelationAddr.objectId;
+ rte->relkind = relkind;
+ rte->rellockmode = RowExclusiveLock;
+ rte->requiredPerms = ACL_INSERT;
+
+ for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
+ rte->insertedCols = bms_add_member(rte->insertedCols,
+ attnum - FirstLowInvalidHeapAttributeNumber);
+
+ ExecCheckRTPerms(list_make1(rte), true);
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has requested
+ * something invalid, and otherwise will return RLS_ENABLED if RLS should
+ * be enabled here. We don't actually support that currently, so throw
+ * our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
- myState->bistate = GetBulkInsertState();
+ /*
+ * Tentatively mark the target as populated, if it's a matview and we're
+ * going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Valid smgr_targblock implies something already wrote to the relation.
+ * This may be harmless, but this function hasn't planned for it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+
+ if (myState->is_parallel == true)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't skip the FSM while inserting tuples in parallel
+ * mode: while extending the relation, instead of blocking on a
+ * page that another worker is inserting into, a worker can
+ * check the FSM for another page that can accommodate the
+ * tuples. This results in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are
+ * not allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+ }
}
/*
@@ -614,3 +668,28 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, PlannedStmt *plannedstmt)
+{
+ bool allowed = false;
+
+ if (into != NULL &&
+ into->type == T_IntoClause)
+ {
+ if (into->viewQuery == NULL &&
+ into->rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
+
+ if (plannedstmt != NULL && allowed)
+ {
+ if (!IsA(plannedstmt->planTree, Gather))
+ allowed = false;
+ }
+ }
+
+ return allowed;
+}
\ No newline at end of file
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c8e292adfa..4cf6a3da44 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,16 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for
+ * CREATE TABLE AS. This is used to set the number of tuples
+ * transferred from workers to the Gather node (in case parallelism
+ * kicks in for the SELECT part of the CTAS) to zero, as each
+ * worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -536,7 +546,16 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* AS, we'd better use the appropriate tuple receiver.
*/
if (into)
+ {
dest = CreateIntoRelDestReceiver(into);
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that result from
+ * its execution into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plannedstmt))
+ ((DR_intorel *) dest)->is_parallel = true;
+ }
else
dest = None_Receiver;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 783eecbc13..7928e53295 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_publication.h"
+#include "commands/createas.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
@@ -352,6 +353,24 @@ standard_ExecutorRun(QueryDesc *queryDesc,
if (sendTuples)
dest->rStartup(dest, operation, queryDesc->tupDesc);
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each
+ * parallel worker insert its tuples, we must send
+ * information such as the into clause (for each worker to
+ * build its own dest receiver) and the object id (for each
+ * worker to open the table).
+ */
+ if (queryDesc->plannedstmt->parallelModeNeeded == true &&
+ dest != NULL &&
+ dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel == true &&
+ ((DR_intorel *) dest)->is_parallel_worker != true)
+ {
+ queryDesc->planstate->intoclause = ((DR_intorel *) dest)->into;
+ queryDesc->planstate->objectid = ((DR_intorel *) dest)->object_id;
+ queryDesc->planstate->dest = dest;
+ }
+
/*
* run plan
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..fd6f9429d0 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,8 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* for workers to open the target table */
+ pg_atomic_uint64 processed; /* number of tuples inserted by all the workers */
} FixedParallelExecutorState;
/*
@@ -600,6 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +717,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (planstate->intoclause != NULL &&
+ planstate->intoclause->type == T_IntoClause)
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +743,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr != NULL &&
+ planstate->objectid != InvalidOid)
+ fpes->objectid = planstate->objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +781,17 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
+
/* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr == NULL)
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1419,28 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr != NULL)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use
+ * the proper dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1519,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The
+ * leader will use it to report the total to the client.
+ */
+ if (intoclausestr != NULL)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..3c41094e0a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -48,7 +48,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
* ExecInitGather
@@ -131,6 +131,66 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /*
+ * By now the parallel workers (if any were launched) have
+ * started their work, i.e. inserting into the target table.
+ * In case the leader is chosen to participate in the
+ * parallel inserts in CTAS, it finishes its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally == true &&
+ node->ps.dest != NULL &&
+ node->ps.dest->mydest == DestIntoRel)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (!TupIsNull(outerTupleSlot))
+ {
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+ node->ps.state->es_processed++;
+ }
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+ }
+ }
+
+ /* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+ /*
+ * Add the total number of tuples inserted by all workers
+ * to the tuples inserted by the leader (if any). This will
+ * be reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -166,6 +226,17 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (node->ps.intoclause != NULL &&
+ node->ps.intoclause->type == T_IntoClause)
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
@@ -190,13 +261,17 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!(node->ps.intoclause != NULL &&
+ node->ps.intoclause->type == T_IntoClause))
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -220,6 +295,12 @@ ExecGather(PlanState *pstate)
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (node->ps.intoclause != NULL &&
+ node->ps.intoclause->type == T_IntoClause)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/*
* Get next tuple, either from one of our workers, or by running the plan
* ourselves.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index cd3716d494..9ed671415b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -377,6 +377,18 @@ cost_gather(GatherPath *path, PlannerInfo *root,
else
path->path.rows = rel->rows;
+ /*
+ * Set the number of tuples that are transferred from
+ * workers to the gather/leader node to zero, as each worker
+ * inserts in parallel the tuples that result from its chunk
+ * of the plan execution. This change may make the parallel
+ * plan cheaper than all other plans, and influence the
+ * planner to consider this parallel plan.
+ */
+ if (root->parse->isForCTAS &&
+ root->query_level == 1)
+ path->path.rows = 0;
+
startup_cost = path->subpath->startup_cost;
run_cost = path->subpath->total_cost - path->subpath->startup_cost;
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a932..1db154da63 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -388,6 +388,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..d287c6b898 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,28 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* true if parallelism is to be considered */
+ bool is_parallel_worker; /* true for parallel worker */
+ Oid object_id; /* used for table open by parallel worker */
+} DR_intorel;
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +45,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *intoClause, PlannedStmt *plannedstmt);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..77f69946bf 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,6 +35,7 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ volatile pg_atomic_uint64 *processed; /* number of tuples inserted by all workers */
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a926ff1711..92eb1b34f0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -1009,6 +1010,10 @@ typedef struct PlanState
bool outeropsset;
bool inneropsset;
bool resultopsset;
+ /* The fields below are for parallel inserts in CTAS. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 60c2f45466..fe43dc941e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* true if the select query is for create table as */
} Query;
--
2.25.1
On Wed, Oct 14, 2020 at 2:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
While skimming through the patch, a small thing I noticed:
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+
I am not sure at this stage if this is the best way to make CTAS as
parallel but if so, then probably you can expand the comments a bit to
say why you consider only Gather node (and that too when it is the
top-most node) and why not another parallel node like GatherMerge?
If somebody expects to preserve the order of the tuples that are
coming from GatherMerge node of the select part in CTAS or SELECT INTO
while inserting, now if parallelism is allowed, that may not be the
case i.e. the order of insertion of tuples may vary. I'm not quite
sure, if someone wants to use order by in the select parts of CTAS or
SELECT INTO in a real world use case. Thoughts?
I think there is no reason why one can't use ORDER BY in the
statements we are talking about here. But, I think we can't enable
parallelism for GatherMerge is because for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
For prepared statements, the parallelism will not be picked and so is
parallel insertion.
Hmm, I am not sure what makes you say this statement. The parallelism
is enabled for prepared statements since commit 57a6a72b6b.
I'm listing the things that are still pending.
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.
I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose by showing some example
of what you have in mind?
--
With Regards,
Amit Kapila.
On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
If somebody expects to preserve the order of the tuples that are
coming from GatherMerge node of the select part in CTAS or SELECT INTO
while inserting, now if parallelism is allowed, that may not be the
case i.e. the order of insertion of tuples may vary. I'm not quite
sure, if someone wants to use order by in the select parts of CTAS or
SELECT INTO in a real world use case. Thoughts?
I think there is no reason why one can't use ORDER BY in the
statements we are talking about here. But, I think we can't enable
parallelism for GatherMerge is because for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
Sure, I will add comments in the upcoming patch.
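For reference, a statement like the below (just an illustration; the table
and column names are made up) is the kind of case in question. Depending on
costs, the planner may put a Gather Merge (or a Sort above a Gather) on top
of the parallel scan, and since the top node is then not a plain Gather,
parallel insert will not be picked:
EXPLAIN (COSTS OFF) CREATE TABLE t_test_sorted AS SELECT * FROM t_test ORDER BY many;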
For prepared statements, the parallelism will not be picked and so is
parallel insertion.
Hmm, I am not sure what makes you say this statement. The parallelism
is enabled for prepared statements since commit 57a6a72b6b.
Thanks for letting me know this. I misunderstood the parallelism for
prepared statements. Now, I verified with a proper use case (see below),
where I had a prepared statement, CTAS having EXECUTE, in this case too
parallelism is picked and parallel insertion happened with the patch
proposed in this thread. Do we have any problems if we allow parallel
insertion for these cases?
PREPARE myselect AS SELECT * FROM t1;
EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;
I think the commit 57a6a72b6b has not added any test cases, isn't it good
to add one in prepare.sql or select_parallel.sql?
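Something like the below is what I have in mind (only a sketch; the object
names are made up, and it assumes parallel costs are tuned low the way
select_parallel.sql already does, so that a parallel plan gets picked):
PREPARE ctas_prep_stmt AS SELECT * FROM t_test WHERE many < 10000;
EXPLAIN (COSTS OFF) CREATE TABLE t_prep_test AS EXECUTE ctas_prep_stmt;
DEALLOCATE ctas_prep_stmt;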
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.
I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose by showing some example
of what you have in mind?
I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
Workers Planned: 2
*Parallel Insert on t_test1*
-> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954
width=8)
Filter: (many < 10000)
-----------------------------------------------------------------------------
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 15, 2020 at 9:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
For prepared statements, the parallelism will not be picked and so is
parallel insertion.
Hmm, I am not sure what makes you say this statement. The parallelism
is enabled for prepared statements since commit 57a6a72b6b.
Thanks for letting me know this. I misunderstood the parallelism for prepared statements. Now, I verified with a proper use case (see below), where I had a prepared statement, CTAS having EXECUTE, in this case too parallelism is picked and parallel insertion happened with the patch proposed in this thread. Do we have any problems if we allow parallel insertion for these cases?
PREPARE myselect AS SELECT * FROM t1;
EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;
I think the commit 57a6a72b6b has not added any test cases, isn't it good to add one in prepare.sql or select_parallel.sql?
I am not sure if it is worth it, as this is not functionality which is too
complex, nor are there many chances of it getting broken.
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.
I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose by showing some example
of what you have in mind?
I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
Workers Planned: 2
Parallel Insert on t_test1
-> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
Filter: (many < 10000)
-----------------------------------------------------------------------------
maybe something like below:
Gather (cost=1000.00..108738.90 rows=0 width=8)
-> Create t_test1
-> Parallel Seq Scan on t_test
I don't know what is the best thing to do here. I think for the
temporary purpose you can keep something like above then once the
patch is matured then we can take a separate opinion for this.
--
With Regards,
Amit Kapila.
On 14.10.20 11:16, Bharath Rupireddy wrote:
On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yes we do a bunch of catalog changes related to the created new table.
We will have both the txn id and command id assigned when catalogue
changes are being made. But, right after the table is created in the
leader, the command id is incremented (CommandCounterIncrement() is
called from create_ctas_internal()) whereas the txn id remains the
same. The new command id is marked as GetCurrentCommandId(true); in
intorel_startup, then the parallel mode is entered. The txn id and
command id are serialized into parallel DSM, they are then available
to all parallel workers. This is discussed in [1].
Few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true;, may be via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1] and remove the
extra command id sharing from the leader to workers.
I will add a few comments in the upcoming patch related to the above info.
Yes, that would be good.
Added comments.
But how does that work for SELECT INTO? Are you prohibiting
that? ...In case of SELECT INTO, a new table gets created and I'm not
prohibiting the parallel inserts and I think we don't need to.
So, in this case, also do we ensure that table is created before we
launch the workers. If so, I think you can explain in comments about
it and what you need to do that to ensure the same.
For SELECT INTO, the table gets created by the leader in
create_ctas_internal(), then ExecInitParallelPlan() gets called which
launches the workers and then the leader(if asked to do so) and the
workers insert the rows. So, we don't need to do any extra work to
ensure the table gets created before the workers start inserting
tuples.
While skimming through the patch, a small thing I noticed:
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+
I am not sure at this stage if this is the best way to make CTAS as
parallel but if so, then probably you can expand the comments a bit to
say why you consider only Gather node (and that too when it is the
top-most node) and why not another parallel node like GatherMerge?
If somebody expects to preserve the order of the tuples that are
coming from GatherMerge node of the select part in CTAS or SELECT INTO
while inserting, now if parallelism is allowed, that may not be the
case i.e. the order of insertion of tuples may vary. I'm not quite
sure, if someone wants to use order by in the select parts of CTAS or
SELECT INTO in a real world use case. Thoughts?
Right, for now, I think you can simply remove that check from the code
instead of just commenting it. We will see if there is a better
check/Assert we can add there.
Done.
I also worked on some of the open points I listed earlier in my mail.
3. Need to restrict parallel inserts, if CTAS tries to create temp/global tables as the workers will not have access to those tables.
Done.
Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
For prepared statements, the parallelism will not be picked and so is
parallel insertion.
For CTAS with no data option case the select part is not even planned,
and so the parallelism will also not be picked.
4. Need to stop unnecessary parallel shared state such as tuple queue being created and shared to workers.
Done.
I'm listing the things that are still pending.
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.
2. Addition of new test cases. Testing with more scenarios and
different data sets, sizes, tablespaces, select into. Analysis on the
2 mismatches in write_parallel.sql regression test.
Attaching v2 patch, thoughts and comments are welcome.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
Really looking forward to this ending up in postgres as I think it's a
very nice improvement.
Whilst reviewing your patch I was wondering: is there a reason you did
not introduce a batch insert in the destreceiver for the CTAS? For me
this makes a huge difference in ingest speed as otherwise the inserts do
not really scale so well as lock contention starts to be a big problem.
If you like I can make a patch to introduce this on top?
Kind regards,
Luc
Swarm64
On Fri, Oct 16, 2020 at 11:33 AM Luc Vlaming <luc@swarm64.com> wrote:
Really looking forward to this ending up in postgres as I think it's a
very nice improvement.Whilst reviewing your patch I was wondering: is there a reason you did
not introduce a batch insert in the destreceiver for the CTAS? For me
this makes a huge difference in ingest speed as otherwise the inserts do
not really scale so well as lock contention start to be a big problem.
If you like I can make a patch to introduce this on top?
Thanks for your interest. You are right, we can get maximum
improvement if we have multi inserts in destreceiver for the CTAS on
the similar lines to COPY FROM command. I specified this point in my
first mail [1]. You may want to take a look at an already existing
patch [2] for multi inserts, I think there are some review comments to
be addressed in that patch. I would love to see the multi insert patch
getting revived.
[1]: /messages/by-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf+0-Jg+KYT7ZO-Ug@mail.gmail.com
[2]: /messages/by-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1=N1AC-Fw@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 16.10.20 08:23, Bharath Rupireddy wrote:
On Fri, Oct 16, 2020 at 11:33 AM Luc Vlaming <luc@swarm64.com> wrote:
Really looking forward to this ending up in postgres as I think it's a
very nice improvement.Whilst reviewing your patch I was wondering: is there a reason you did
not introduce a batch insert in the destreceiver for the CTAS? For me
this makes a huge difference in ingest speed as otherwise the inserts do
not really scale so well as lock contention start to be a big problem.
If you like I can make a patch to introduce this on top?Thanks for your interest. You are right, we can get maximum
improvement if we have multi inserts in destreceiver for the CTAS on
the similar lines to COPY FROM command. I specified this point in my
first mail [1]. You may want to take a look at an already existing
patch [2] for multi inserts, I think there are some review comments to
be addressed in that patch. I would love to see the multi insert patch
getting revived.[1] - /messages/by-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf+0-Jg+KYT7ZO-Ug@mail.gmail.com
[2] - /messages/by-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1=N1AC-Fw@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Sorry, I had not seen that pointer in your first email.
I'll first finish some other patches I'm working on and then I'll try to
revive that patch. Thanks for the pointers.
Kind regards,
Luc
Swarm64
On Thu, Oct 15, 2020 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1. How to represent the parallel insert for CTAS in explain plans?
The
explain CTAS shows the plan for only the SELECT part. How about
having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose by showing some example
of what you have in mind?I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
Workers Planned: 2
Parallel Insert on t_test1
-> Parallel Seq Scan on t_test (cost=0.00..106748.00
rows=4954 width=8)
Filter: (many < 10000)
-----------------------------------------------------------------------------
maybe something like below:
Gather (cost=1000.00..108738.90 rows=0 width=8)
-> Create t_test1
-> Parallel Seq Scan on t_test
I don't know what is the best thing to do here. I think for the
temporary purpose you can keep something like above then once the
patch is matured then we can take a separate opinion for this.
Agreed. Here's a snapshot of explain with the change suggested.
postgres=# EXPLAIN (ANALYZE, COSTS OFF) CREATE TABLE t1_test AS SELECT *
FROM t1;
QUERY PLAN
---------------------------------------------------------------------------------
Gather (actual time=970.524..972.913 rows=0 loops=1)
* -> Create t1_test*
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333
loops=3)
Planning Time: 0.049 ms
Execution Time: 973.733 ms
I think there is no reason why one can't use ORDER BY in the
statements we are talking about here. But, I think we can't enable
parallelism for GatherMerge is because for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
Added comments.
2. Addition of new test cases.
Added new test cases.
Analysis on the 2 mismatches in write_parallel.sql regression test.
Done. It needed a small code change in costsize.c. Now, both make check and
make check-world pass.
Apart from the above, a couple of other things I have finished with the v3
patch:
1. Both make check and make check-world with force_parallel_mode = regress
pass.
2. I enabled parallel inserts in case of materialized views. Hope that's
fine.
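For instance (a quick illustration, not from the added regression tests; the
view name is made up), the materialized view case should show the same kind
of plan as the CTAS snapshot above, with the Gather node driving the inserts:
EXPLAIN (ANALYZE, COSTS OFF) CREATE MATERIALIZED VIEW mv_test AS SELECT * FROM t1;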
Attaching v3 patch herewith.
I'm done with all the open points in my list. Please review the v3 patch
and provide comments.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchapplication/x-patch; name=v3-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchDownload
From b80460c0390317cceecd66ce9780feafd55bd5b2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 19 Oct 2020 22:06:41 +0530
Subject: [PATCH v3] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts it's
share of tuples if instructed to do, and so are workers. Each
worker writes atomically it's number of inserted tuples into a
shared memory variable, the leader combines this with it's own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 341 ++++++++++++-------
src/backend/commands/explain.c | 36 ++
src/backend/executor/execMain.c | 19 ++
src/backend/executor/execParallel.c | 60 +++-
src/backend/executor/nodeGather.c | 101 +++++-
src/backend/optimizer/path/costsize.c | 12 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 20 ++
src/include/executor/execParallel.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 143 ++++++++
src/test/regress/sql/write_parallel.sql | 65 ++++
15 files changed, 694 insertions(+), 152 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861a02..1602525d4a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..809774c4bb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+*/
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d53ec952d0..9df8a7face 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,10 +316,27 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to set the tuple transfer cost from workers to the Gather
+ * node (in case parallelism kicks in for the SELECT part of the CTAS)
+ * to zero, as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make each
+ * parallel worker insert the tuples that result from its
+ * execution into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plan))
+ ((DR_intorel *) dest)->is_parallel = true;
+
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
@@ -418,6 +423,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,135 +438,180 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- char relkind;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- RangeTblEntry *rte;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
- relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ char relkind;
+ List *attrList;
+ Relation intoRelationDesc;
+ RangeTblEntry *rte;
+ ListCell *lc;
+ int attnum;
- if (lc)
- {
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
- }
- else
- colname = NameStr(attribute->attname);
+ Assert(into != NULL); /* else somebody forgot to set it */
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
+ is_matview = (into->viewQuery != NULL);
+ relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
- */
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ * Build column definitions using "pre-cooked" type and collation info. If
+ * a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK, too
+ * many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ {
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
- attrList = lappend(attrList, col);
- }
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must check
+ * this here because DefineRelation would adopt the type's default
+ * collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ attrList = lappend(attrList, col);
+ }
- /*
- * Check INSERT permission on the constructed table.
- *
- * XXX: It would arguably make sense to skip this check if into->skipData
- * is true.
- */
- rte = makeNode(RangeTblEntry);
- rte->rtekind = RTE_RELATION;
- rte->relid = intoRelationAddr.objectId;
- rte->relkind = relkind;
- rte->rellockmode = RowExclusiveLock;
- rte->requiredPerms = ACL_INSERT;
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
- for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
- rte->insertedCols = bms_add_member(rte->insertedCols,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ /*
+ * Actually create the target table
+ */
+ intoRelationAddr = create_ctas_internal(attrList, into);
- ExecCheckRTPerms(list_make1(rte), true);
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * Check INSERT permission on the constructed table.
+ *
+ * XXX: It would arguably make sense to skip this check if into->skipData
+ * is true.
+ */
+ rte = makeNode(RangeTblEntry);
+ rte->rtekind = RTE_RELATION;
+ rte->relid = intoRelationAddr.objectId;
+ rte->relkind = relkind;
+ rte->rellockmode = RowExclusiveLock;
+ rte->requiredPerms = ACL_INSERT;
+
+ for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
+ rte->insertedCols = bms_add_member(rte->insertedCols,
+ attnum - FirstLowInvalidHeapAttributeNumber);
+
+ ExecCheckRTPerms(list_make1(rte), true);
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has requested
+ * something invalid, and otherwise will return RLS_ENABLED if RLS should
+ * be enabled here. We don't actually support that currently, so throw
+ * our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
- myState->bistate = GetBulkInsertState();
+ /*
+ * Tentatively mark the target as populated, if it's a matview and we're
+ * going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Valid smgr_targblock implies something already wrote to the relation.
+ * This may be harmless, but this function hasn't planned for it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+
+ if (myState->is_parallel == true)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't skip the FSM while inserting tuples in parallel
+ * mode: while extending the relation, instead of blocking on a
+ * page that another worker is inserting into, a worker can
+ * check the FSM for another page that can accommodate the
+ * tuples. This results in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * Mark rd_createSubid invalid; otherwise the workers would not be
+ * allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+ }
}
/*
@@ -614,3 +667,49 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, PlannedStmt *plannedstmt)
+{
+ bool allowed = false;
+
+ if (into != NULL &&
+ IsA(into, IntoClause))
+ {
+ if (into->rel != NULL &&
+ into->rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
+
+ if (plannedstmt != NULL && allowed)
+ {
+ /*
+ * We allow parallel inserts by the workers only if the upper plan
+ * node is Gather. We cannot let workers do parallel inserts when a
+ * GatherMerge node is involved, as the leader backend performs the
+ * final phase (merging the results from the workers).
+ */
+ if (plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree != NULL &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree != NULL &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
+ {
+ /*
+ * Since no rows are transferred from the workers to the Gather
+ * node, set plan_rows to 0 so that this is visible in explain
+ * plans. Note that this is already accounted for in the cost
+ * calculations in cost_gather().
+ */
+ plannedstmt->planTree->plan_rows = 0;
+ }
+ else
+ allowed = false;
+ }
+ }
+
+ return allowed;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 41317f1837..b559d2d6e1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Let the planner know that the SELECT query is for CTAS. This is used
+ * to set the tuple transfer cost from workers to the gather node (in
+ * case parallelism kicks in for the SELECT part of the CTAS) to zero,
+ * as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -536,7 +545,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* AS, we'd better use the appropriate tuple receiver.
*/
if (into)
+ {
dest = CreateIntoRelDestReceiver(into);
+
+ /*
+ * The SELECT part of the CTAS is parallelizable, so we can make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plannedstmt))
+ ((DR_intorel *) dest)->is_parallel = true;
+ }
else
dest = None_Receiver;
@@ -1753,6 +1772,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
Gather *gather = (Gather *) plan;
+ if (IsA(planstate, GatherState) &&
+ planstate->intoclause != NULL &&
+ IsA(planstate->intoclause,IntoClause) &&
+ planstate->dest != NULL &&
+ planstate->dest->mydest == DestIntoRel &&
+ ((DR_intorel *) planstate->dest)->is_parallel == true &&
+ planstate->intoclause->rel != NULL &&
+ planstate->intoclause->rel->relname != NULL)
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n", planstate->intoclause->rel->relname);
+ es->indent++;
+ ExplainIndentText(es);
+ }
+
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index aea0479448..5482ba4e3a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_publication.h"
+#include "commands/createas.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
@@ -352,6 +353,24 @@ standard_ExecutorRun(QueryDesc *queryDesc,
if (sendTuples)
dest->rStartup(dest, operation, queryDesc->tupDesc);
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each
+ * parallel worker insert its tuples, we must pass down
+ * information such as the into clause (for each worker to
+ * build its own dest receiver) and the object id (for each
+ * worker to open the table).
+ */
+ if (queryDesc->plannedstmt->parallelModeNeeded == true &&
+ dest != NULL &&
+ dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel == true &&
+ ((DR_intorel *) dest)->is_parallel_worker != true)
+ {
+ queryDesc->planstate->intoclause = ((DR_intorel *) dest)->into;
+ queryDesc->planstate->objectid = ((DR_intorel *) dest)->object_id;
+ queryDesc->planstate->dest = dest;
+ }
+
/*
* run plan
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..b2aa06102f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,8 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* object id for workers to open the target table */
+ pg_atomic_uint64 processed; /* number of tuples inserted by all the workers */
} FixedParallelExecutorState;
/*
@@ -600,6 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +717,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +742,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr != NULL &&
+ planstate->objectid != InvalidOid)
+ fpes->objectid = planstate->objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +780,17 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
+
/* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr == NULL)
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1418,28 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr != NULL)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use
+ * the proper dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1518,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader
+ * will use it to report the total to the client.
+ */
+ if (intoclausestr != NULL)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..99211e4941 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,7 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
* ExecInitGather
@@ -131,6 +132,69 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0 &&
+ !node->need_to_scan_locally)
+ node->need_to_scan_locally = true;
+ /*
+ * By now, the parallel workers (if any were launched) have started
+ * their work, i.e. inserting into the target table. If the leader is
+ * chosen to participate in the parallel inserts for CTAS, finish its
+ * share before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally == true &&
+ node->ps.dest != NULL &&
+ node->ps.dest->mydest == DestIntoRel)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (!TupIsNull(outerTupleSlot))
+ {
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+ node->ps.state->es_processed++;
+ }
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ /* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the
+ * tuples inserted by the leader (if any). This is what gets reported
+ * to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -166,6 +230,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
@@ -190,13 +264,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!(ISCTAS(node->ps.intoclause)))
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,7 +285,8 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !(ISCTAS(node->ps.intoclause)))
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
}
@@ -220,6 +298,11 @@ ExecGather(PlanState *pstate)
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/*
* Get next tuple, either from one of our workers, or by running the plan
* ourselves.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 733f7ea543..72307e0927 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,17 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Zero out the number of tuples transferred from the workers to the
+ * gather node, as each worker inserts in parallel the tuples that
+ * result from its chunk of the plan execution. This may make the
+ * parallel plan cheaper than the other plans and influence the
+ * planner to choose it.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..af72ffcfe2 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,30 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* true if parallelism is to be considered */
+ bool is_parallel_worker; /* true for parallel worker */
+ Oid object_id; /* used for table open by parallel worker */
+} DR_intorel;
+
+#define ISCTAS(intoclause) (intoclause != NULL && IsA(intoclause, IntoClause))
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +47,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *intoClause, PlannedStmt *plannedstmt);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..77f69946bf 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,6 +35,7 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ volatile pg_atomic_uint64 *processed; /* number of tuples inserted by all workers */
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d68d6..5083c4cfb5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -1020,6 +1021,10 @@ typedef struct PlanState
bool outeropsset;
bool inneropsset;
bool resultopsset;
+ /* Below is parallel inserts in CTAS related info. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 60c2f45466..fe43dc941e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* true if the select query is for create table as */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..11ef18b8a4 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,147 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized view
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_mat_view
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..dd4233b399 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,69 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized view
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
rollback;
--
2.25.1
On Mon, Oct 19, 2020 at 10:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Attaching v3 patch herewith.
I'm done with all the open points in my list. Please review the v3 patch and provide comments.
Attaching v4 patch, rebased on the latest master 68b1a4877e. Also,
added this feature to commitfest -
https://commitfest.postgresql.org/31/2841/
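In case it helps reviewers try the patch out quickly, below is a small, illustrative way to exercise the parallel insert path. The GUC settings and the source_data/parallel_ctas_demo tables here are only assumptions used to coax the planner into a parallel plan on a modest data set; they are not part of the patch or its regression tests.
-- illustrative only: make a parallel plan cheap enough to be chosen
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 4;
create table source_data as select generate_series(1, 1000000) as a;
-- with the patch, workers insert their share of rows directly, so the
-- Gather node is expected to report actual rows=0 plus a "Create" line
explain (analyze, costs off, timing off, summary off)
create table parallel_ctas_demo as select a from source_data where a % 2 = 0;
drop table parallel_ctas_demo;
drop table source_data;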
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v4-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/x-patch)
From 127309ffab82e372b338914ccc900463d8c63157 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 23 Nov 2020 16:27:45 +0530
Subject: [PATCH v4] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts it's
share of tuples if instructed to do, and so are workers. Each
worker writes atomically it's number of inserted tuples into a
shared memory variable, the leader combines this with it's own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 314 ++++++++++++-------
src/backend/commands/explain.c | 36 +++
src/backend/executor/execMain.c | 19 ++
src/backend/executor/execParallel.c | 60 +++-
src/backend/executor/nodeGather.c | 101 +++++-
src/backend/optimizer/path/costsize.c | 12 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 20 ++
src/include/executor/execParallel.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 143 +++++++++
src/test/regress/sql/write_parallel.sql | 65 ++++
15 files changed, 682 insertions(+), 137 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..3045c0f046 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..c85a9906f1 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for the common parallel insert cases, i.e.
+ * INSERT INTO ... SELECT, CTAS and COPY FROM; to be revisited later.
+ * In a parallel worker, allow marking currentCommandIdUsed only if it
+ * was already set to true at the start of the parallel operation (by
+ * way of SetCurrentCommandIdUsedForWorker()). We have to do this
+ * because GetCurrentCommandId(true) may be called from anywhere within
+ * the parallel worker, especially for parallel inserts.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..d5fdb8ea07 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,10 +316,27 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Let the planner know that the SELECT query is for CTAS. This is
+ * used to set the tuple transfer cost from workers to the gather
+ * node (in case parallelism kicks in for the SELECT part of the
+ * CTAS) to zero, as each worker will insert its share of tuples in
+ * parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
+ /*
+ * The SELECT part of the CTAS is parallelizable, so we can make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plan))
+ ((DR_intorel *) dest)->is_parallel = true;
+
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
@@ -418,6 +423,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +438,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), and then parallel mode is
+ * entered. The command id and transaction id are serialized into the
+ * parallel DSM and are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
- if (lc)
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel == true)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * Do not skip contacting the FSM while inserting tuples in parallel
+ * mode. While extending the relation, a worker can then consult the
+ * FSM for another page that can accommodate the tuples instead of
+ * blocking on a page that another worker is inserting into. This is
+ * a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * Mark rd_createSubid invalid; otherwise the workers would not be
+ * allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +662,49 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, PlannedStmt *plannedstmt)
+{
+ bool allowed = false;
+
+ if (into != NULL &&
+ IsA(into, IntoClause))
+ {
+ if (into->rel != NULL &&
+ into->rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
+
+ if (plannedstmt != NULL && allowed)
+ {
+ /*
+ * We allow parallel inserts by the workers only if the upper plan
+ * node is Gather. We cannot let workers do parallel inserts when a
+ * GatherMerge node is involved, as the leader backend performs the
+ * final phase (merging the results from the workers).
+ */
+ if (plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree != NULL &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree != NULL &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
+ {
+ /*
+ * Since no rows are transferred from the workers to the Gather
+ * node, set plan_rows to 0 so that this is visible in explain
+ * plans. Note that this is already accounted for in the cost
+ * calculations in cost_gather().
+ */
+ plannedstmt->planTree->plan_rows = 0;
+ }
+ else
+ allowed = false;
+ }
+ }
+
+ return allowed;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..bb01c8fc53 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Let the planner know that the SELECT query is for CTAS. This is used
+ * to set the tuple transfer cost from workers to the gather node (in
+ * case parallelism kicks in for the SELECT part of the CTAS) to zero,
+ * as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -536,7 +545,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* AS, we'd better use the appropriate tuple receiver.
*/
if (into)
+ {
dest = CreateIntoRelDestReceiver(into);
+
+ /*
+ * The SELECT part of the CTAS is parallelizable, so we can make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plannedstmt))
+ ((DR_intorel *) dest)->is_parallel = true;
+ }
else
dest = None_Receiver;
@@ -1753,6 +1772,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
Gather *gather = (Gather *) plan;
+ if (IsA(planstate, GatherState) &&
+ planstate->intoclause != NULL &&
+ IsA(planstate->intoclause,IntoClause) &&
+ planstate->dest != NULL &&
+ planstate->dest->mydest == DestIntoRel &&
+ ((DR_intorel *) planstate->dest)->is_parallel == true &&
+ planstate->intoclause->rel != NULL &&
+ planstate->intoclause->rel->relname != NULL)
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n", planstate->intoclause->rel->relname);
+ es->indent++;
+ ExplainIndentText(es);
+ }
+
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f589f9..ed99725779 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_publication.h"
+#include "commands/createas.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
@@ -352,6 +353,24 @@ standard_ExecutorRun(QueryDesc *queryDesc,
if (sendTuples)
dest->rStartup(dest, operation, queryDesc->tupDesc);
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each
+ * parallel worker insert its tuples, we must pass down
+ * information such as the into clause (for each worker to
+ * build its own dest receiver) and the object id (for each
+ * worker to open the table).
+ */
+ if (queryDesc->plannedstmt->parallelModeNeeded == true &&
+ dest != NULL &&
+ dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel == true &&
+ ((DR_intorel *) dest)->is_parallel_worker != true)
+ {
+ queryDesc->planstate->intoclause = ((DR_intorel *) dest)->into;
+ queryDesc->planstate->objectid = ((DR_intorel *) dest)->object_id;
+ queryDesc->planstate->dest = dest;
+ }
+
/*
* run plan
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..b2aa06102f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,8 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* object id for workers to open the target table */
+ pg_atomic_uint64 processed; /* number of tuples inserted by all the workers */
} FixedParallelExecutorState;
/*
@@ -600,6 +604,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +717,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +742,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr != NULL &&
+ planstate->objectid != InvalidOid)
+ fpes->objectid = planstate->objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +780,17 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
+
/* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr == NULL)
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1418,28 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr != NULL)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use
+ * the proper dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1518,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader
+ * will use it to report the total to the client.
+ */
+ if (intoclausestr != NULL)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..99211e4941 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,7 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
* ExecInitGather
@@ -131,6 +132,69 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0 &&
+ !node->need_to_scan_locally)
+ node->need_to_scan_locally = true;
+ /*
+ * By now, the parallel workers (if any were launched) have started
+ * their work, i.e. inserting into the target table. If the leader is
+ * chosen to participate in the parallel inserts for CTAS, finish its
+ * share before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally == true &&
+ node->ps.dest != NULL &&
+ node->ps.dest->mydest == DestIntoRel)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (!TupIsNull(outerTupleSlot))
+ {
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+ node->ps.state->es_processed++;
+ }
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ /* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the
+ * tuples inserted by the leader (if any). This is what gets reported
+ * to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -166,6 +230,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
@@ -190,13 +264,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!(ISCTAS(node->ps.intoclause)))
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,7 +285,8 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !(ISCTAS(node->ps.intoclause)))
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
}
@@ -220,6 +298,11 @@ ExecGather(PlanState *pstate)
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/*
* Get next tuple, either from one of our workers, or by running the plan
* ourselves.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f1dfdc1a4a..d0111c8a5c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,17 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Make the number of tuples that are transferred from workers to the
+ * gather node zero, as each worker inserts, in parallel, the tuples
+ * resulting from its chunk of the plan execution. This change may make
+ * the parallel plan cheaper than the other plans, and influence the
+ * planner to choose it.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..af72ffcfe2 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,30 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* true if parallelism is to be considered */
+ bool is_parallel_worker; /* true for parallel worker */
+ Oid object_id; /* used for table open by parallel worker */
+} DR_intorel;
+
+#define ISCTAS(intoclause) (intoclause != NULL && IsA(intoclause, IntoClause))
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +47,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *intoClause, PlannedStmt *plannedstmt);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..77f69946bf 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,6 +35,7 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ volatile pg_atomic_uint64 *processed; /* number of tuples inserted by all workers */
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f6824bf2e1..c7ce365438 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -1020,6 +1021,10 @@ typedef struct PlanState
bool outeropsset;
bool inneropsset;
bool resultopsset;
+ /* Parallel inserts in CTAS related info. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d1f9ef29ca..909313466d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* true if the select query is for create table as */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..11ef18b8a4 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,147 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized view
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_mat_view
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..dd4233b399 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,69 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized view
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
rollback;
--
2.25.1
Hi,
I'm very interested in this feature,
and I'm looking at the patch, here are some comments.
1.
+ if (!TupIsNull(outerTupleSlot))
+ {
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+ node->ps.state->es_processed++;
+ }
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+ }
How about the following style:
if (TupIsNull(outerTupleSlot))
    break;
(void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
node->ps.state->es_processed++;
Which looks cleaner.
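To make the suggestion concrete, the whole loop body would then look roughly
like this (only a sketch of the suggested structure, built from the code
quoted above; the actual patch may differ):

for (;;)
{
    /* Install our DSA area while executing the plan. */
    estate->es_query_dsa = node->pei ? node->pei->area : NULL;

    outerTupleSlot = ExecProcNode(node->ps.lefttree);

    estate->es_query_dsa = NULL;

    /* Stop once the plan stops producing tuples. */
    if (TupIsNull(outerTupleSlot))
        break;

    /* Hand the tuple to the CTAS dest receiver and count it. */
    (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
    node->ps.state->es_processed++;
}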
2.
+
+ if (into != NULL &&
+ IsA(into, IntoClause))
+ {
The check can be replaced by ISCTAS(into).
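For example, the quoted check

if (into != NULL &&
    IsA(into, IntoClause))

would simply become

if (ISCTAS(into))

using the ISCTAS macro this patch adds in createas.h.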
3.
+ /*
+ * For parallelizing inserts in CTAS i.e. making each
+ * parallel worker inerst it's tuples, we must send
+ * information such as intoclause(for each worker
'inerst' looks like a typo (insert).
4.
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
...
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
The code here calls strlen(intoclausestr) twice.
Looking at the existing code in ExecInitParallelPlan,
it stores the strlen result in a variable.
So how about the following style:
intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
which matches the existing style in ExecInitParallelPlan.
5.
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
+
/* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr == NULL)
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
The two checks on intoclausestr seem like they can be combined:
if (intoclausestr != NULL)
{
...
}
else
{
...
}
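For instance, reusing the code quoted above and combining it with the
strlen suggestion in comment 4, that could look roughly like this (just a
sketch of the combined structure, not the final patch):

if (intoclausestr != NULL)
{
    char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
                                             intoclause_len + 1);

    memcpy(shmptr, intoclausestr, intoclause_len + 1);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
}
else
{
    /* Set up the tuple queues that the workers will write into. */
    pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
}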
Best regards,
houzj
On Tue, Nov 24, 2020 at 4:43 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I'm very interested in this feature,
and I'm looking at the patch, here are some comments.
Thanks for the review.
How about the following style:
if (TupIsNull(outerTupleSlot))
    break;
(void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
node->ps.state->es_processed++;
Which looks cleaner.
Done.
The check can be replaced by ISCTAS(into).
Done.
'inerst' looks like a typo (insert).
Corrected.
The code here calls strlen(intoclausestr) twice.
Looking at the existing code in ExecInitParallelPlan,
it stores the strlen result in a variable. So how about the following style:
intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
Done.
The two checks on intoclausestr seem like they can be combined:
if (intoclausestr != NULL)
{
...
}
else
{
...
}
Done.
Attaching v5 patch. Please consider it for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v5-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchapplication/x-patch; name=v5-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchDownload
From 21fbbb8d4297e6daa7a7ec696a36327f592089bd Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 25 Nov 2020 04:44:29 +0530
Subject: [PATCH v5] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. The leader inserts its
share of tuples if instructed to do so, and so do the workers. Each
worker atomically writes its number of inserted tuples into a
shared memory variable; the leader combines this with its own
count and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 313 ++++++++++++-------
src/backend/commands/explain.c | 36 +++
src/backend/executor/execMain.c | 18 ++
src/backend/executor/execParallel.c | 68 +++-
src/backend/executor/nodeGather.c | 99 +++++-
src/backend/optimizer/path/costsize.c | 12 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 22 ++
src/include/executor/execParallel.h | 2 +
src/include/nodes/execnodes.h | 5 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 143 +++++++++
src/test/regress/sql/write_parallel.sql | 65 ++++
15 files changed, 688 insertions(+), 138 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..3045c0f046 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for the common parallel insert cases, i.e.
+ * INSERT INTO SELECT, CTAS and COPY FROM; to be changed later. In a
+ * parallel worker, allow marking currentCommandIdUsed only if it was
+ * already set to true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere within a
+ * parallel worker, especially for parallel inserts.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..afab904c6b 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,10 +316,27 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to set the tuple transfer cost from workers to the gather
+ * node (in case parallelism kicks in for the SELECT part of the CTAS)
+ * to zero, as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, we can make each
+ * parallel worker insert the tuples resulting from its portion of the
+ * plan execution into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plan))
+ ((DR_intorel *) dest)->is_parallel = true;
+
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
@@ -418,6 +423,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +438,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), and then parallel mode is
+ * entered. The command id and transaction id are serialized into the
+ * parallel DSM; they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
- if (lc)
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel == true)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We don't skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, instead of blocking
+ * on a page while another worker is inserting into it, a worker can
+ * check the FSM for another page that can accommodate the tuples.
+ * This results in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are
+ * not allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +662,48 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, PlannedStmt *plannedstmt)
+{
+ bool allowed = false;
+
+ if (ISCTAS(into))
+ {
+ if (into->rel != NULL &&
+ into->rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
+
+ if (plannedstmt != NULL && allowed)
+ {
+ /*
+ * We allow parallel inserts by the workers only if the upper node
+ * is Gather. We cannot let workers do parallel inserts when a
+ * GatherMerge node is involved, as the leader backend performs the
+ * final phase (merging the results produced by the workers).
+ */
+ if (plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree != NULL &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree != NULL &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
+ {
+ /*
+ * Since no rows are transferred from the workers to the Gather
+ * node, we set plan_rows to 0 so that this is visible in explain
+ * plans. Note that this is already accounted for in the cost
+ * calculations in cost_gather().
+ */
+ plannedstmt->planTree->plan_rows = 0;
+ }
+ else
+ allowed = false;
+ }
+ }
+
+ return allowed;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..bb01c8fc53 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to set the tuple transfer cost from workers to the gather node (in
+ * case parallelism kicks in for the SELECT part of the CTAS) to zero, as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -536,7 +545,17 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* AS, we'd better use the appropriate tuple receiver.
*/
if (into)
+ {
dest = CreateIntoRelDestReceiver(into);
+
+ /*
+ * If the SELECT part of the CTAS is parallelizable, we can make each
+ * parallel worker insert the tuples resulting from its portion of the
+ * plan execution into the target table.
+ */
+ if (IsParallelInsertInCTASAllowed(into, plannedstmt))
+ ((DR_intorel *) dest)->is_parallel = true;
+ }
else
dest = None_Receiver;
@@ -1753,6 +1772,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
Gather *gather = (Gather *) plan;
+ if (IsA(planstate, GatherState) &&
+ planstate->intoclause != NULL &&
+ IsA(planstate->intoclause,IntoClause) &&
+ planstate->dest != NULL &&
+ planstate->dest->mydest == DestIntoRel &&
+ ((DR_intorel *) planstate->dest)->is_parallel == true &&
+ planstate->intoclause->rel != NULL &&
+ planstate->intoclause->rel->relname != NULL)
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n", planstate->intoclause->rel->relname);
+ es->indent++;
+ ExplainIndentText(es);
+ }
+
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f589f9..e4efa3ac76 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_publication.h"
+#include "commands/createas.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
@@ -352,6 +353,23 @@ standard_ExecutorRun(QueryDesc *queryDesc,
if (sendTuples)
dest->rStartup(dest, operation, queryDesc->tupDesc);
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each parallel worker
+ * insert the tuples, we must send information such as the into clause (for
+ * each worker to build a separate dest receiver) and the object id (for
+ * each worker to open the table).
+ */
+ if (queryDesc->plannedstmt->parallelModeNeeded == true &&
+ dest != NULL &&
+ dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel == true &&
+ ((DR_intorel *) dest)->is_parallel_worker != true)
+ {
+ queryDesc->planstate->intoclause = ((DR_intorel *) dest)->into;
+ queryDesc->planstate->objectid = ((DR_intorel *) dest)->object_id;
+ queryDesc->planstate->dest = dest;
+ }
+
/*
* run plan
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..442c633232 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* object id for workers to open the table */
+ /* number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -600,6 +605,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +719,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +745,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr != NULL &&
+ planstate->objectid != InvalidOid)
+ fpes->objectid = planstate->objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr != NULL)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr != NULL)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader will
+ * use it to report the total to the client.
+ */
+ if (intoclausestr != NULL)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..93d0d95704 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,7 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
* ExecInitGather
@@ -131,6 +132,67 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0 &&
+ !node->need_to_scan_locally)
+ node->need_to_scan_locally = true;
+ /*
+ * By now, the parallel workers (if any were launched) would have started
+ * their work, i.e. inserting into the target table. If the leader is chosen
+ * to participate in parallel inserts in CTAS, it finishes its share before
+ * waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally == true &&
+ node->ps.dest != NULL &&
+ node->ps.dest->mydest == DestIntoRel)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ /* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This count is reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Collect the information that must be passed to the workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
@@ -190,13 +262,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!(ISCTAS(node->ps.intoclause)))
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,7 +283,8 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !(ISCTAS(node->ps.intoclause)))
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
}
@@ -220,6 +296,11 @@ ExecGather(PlanState *pstate)
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/*
* Get next tuple, either from one of our workers, or by running the plan
* ourselves.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..4f03db31c7 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,17 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Make the number of tuples that are transferred from workers to the
+ * gather node zero, as each worker inserts, in parallel, the tuples
+ * resulting from its chunk of the plan execution. This change may make
+ * the parallel plan cheaper than the other plans, and influence the
+ * planner to choose it.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..9271e84e4d 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,31 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker */
+ Oid object_id;
+} DR_intorel;
+
+#define ISCTAS(intoclause) (intoclause != NULL && IsA(intoclause, IntoClause))
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +48,7 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *intoClause,
+ PlannedStmt *plannedstmt);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..e475fbdd35 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,6 +35,8 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..5277d66150 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -1009,6 +1010,10 @@ typedef struct PlanState
bool outeropsset;
bool inneropsset;
bool resultopsset;
+ /* Parallel inserts in CTAS related info. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d1f9ef29ca..65c393743c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* Is the SELECT query for CREATE TABLE AS */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..11ef18b8a4 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,147 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized view
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_mat_view
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..dd4233b399 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,69 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized view
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
rollback;
--
2.25.1
Hi,
I have an issue about the following code:
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/* If no projection is required, we're done. */
if (node->ps.ps_ProjInfo == NULL)
return slot;
/*
* Form the result tuple using ExecProject(), and return it.
*/
econtext->ecxt_outertuple = slot;
return ExecProject(node->ps.ps_ProjInfo);
It seems the projection will be skipped.
Is this because projection is not required in this case ?
(I'm not very familiar with where the projection will be.)
If projection is not required here, shall we add some comments here?
Best regards,
houzj
On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi,
I have an issue about the following code:
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+     ExecParallelInsertInCTAS(node);
+     return NULL;
+ }
/* If no projection is required, we're done. */
if (node->ps.ps_ProjInfo == NULL)
    return slot;
/*
 * Form the result tuple using ExecProject(), and return it.
 */
econtext->ecxt_outertuple = slot;
return ExecProject(node->ps.ps_ProjInfo);
It seems the projection will be skipped.
Is this because projection is not required in this case ?
(I'm not very familiar with where the projection will be.)
For parallel inserts in CTAS, I don't think we need to project the
tuples being returned from the underlying plan nodes, and also we have
nothing to project from the Gather node further up. The required
projection will happen while the tuples are being returned from the
underlying nodes and the projected tuples are being directly fed to
CTAS's dest receiver intorel_receive(), from there into the created
table. We don't need ExecProject again in ExecParallelInsertInCTAS().
For instance, projection will always be done when the tuple is being
returned from an underlying sequential scan node (see ExecScan() -->
ExecProject()), and this is true for both the leader and the workers. In
both the leader and the workers, we are just calling CTAS's dest receiver
intorel_receive().
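To illustrate, here is a minimal sketch of the insert loop (this mirrors
what the patch's ExecParallelInsertInCTAS() does when the leader
participates; the workers do the equivalent through their own dest
receiver). The slot coming out of ExecProcNode() is already projected:

    for (;;)
    {
        /* The child node returns an already-projected slot, e.g. via
         * ExecScan() --> ExecProject() in a parallel sequential scan. */
        outerTupleSlot = ExecProcNode(node->ps.lefttree);

        if (TupIsNull(outerTupleSlot))
            break;

        /* Hand the projected slot straight to CTAS's dest receiver;
         * no further ExecProject() is needed here. */
        (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
        node->ps.state->es_processed++;
    }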
Thoughts?
If projection is not required here, shall we add some comments here?
If the above point looks okay, I can add a comment.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
wrote:
Hi,
I have an issue about the following code:
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+     ExecParallelInsertInCTAS(node);
+     return NULL;
+ }
/* If no projection is required, we're done. */
if (node->ps.ps_ProjInfo == NULL)
    return slot;
/*
 * Form the result tuple using ExecProject(), and return it.
 */
econtext->ecxt_outertuple = slot;
return ExecProject(node->ps.ps_ProjInfo);
It seems the projection will be skipped.
Is this because projection is not required in this case ?
(I'm not very familiar with where the projection will be.)
For parallel inserts in CTAS, I don't think we need to project the tuples
being returned from the underlying plan nodes, and also we have nothing
to project from the Gather node further up. The required projection will
happen while the tuples are being returned from the underlying nodes and
the projected tuples are being directly fed to CTAS's dest receiver
intorel_receive(), from there into the created table. We don't need
ExecProject again in ExecParallelInsertInCTAS().
For instance, projection will always be done when the tuple is being returned
from an underlying sequential scan node(see ExecScan() -->
ExecProject() and this is true for both leader and workers. In both leader
and workers, we are just calling CTAS's dest receiver intorel_receive().
Thoughts?
I took a deep look at the projection logic.
In most cases, you are right that Gather node does not need projection.
In some rare cases, such as a SubPlan (or an InitPlan, I guess),
the projection will happen in the Gather node.
The example:
Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));
postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000;
QUERY PLAN
----------------------------------------
Gather
Output: test.i, (SubPlan 1)
Workers Planned: 2
-> Parallel Seq Scan on public.test
Output: test.i
Filter: (test.i < 2000)
SubPlan 1
-> Limit
Output: (test.i)
-> Seq Scan on public.test2
Output: test.i
In this case, projection is necessary,
because the subplan will be executed during projection.
If it is skipped, the created table will lose some data.
Best regards,
houzj
On Thu, Nov 26, 2020 at 12:15 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I took a deep look at the projection logic.
In most cases, you are right that Gather node does not need projection.
In some rare cases, such as Subplan (or initplan I guess).
The projection will happen in Gather node.
The example:
Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));
postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000;
QUERY PLAN
----------------------------------------
Gather
Output: test.i, (SubPlan 1)
Workers Planned: 2
-> Parallel Seq Scan on public.test
Output: test.i
Filter: (test.i < 2000)
SubPlan 1
-> Limit
Output: (test.i)
-> Seq Scan on public.test2
Output: test.i
In this case, projection is necessary,
because the subplan will be executed in projection.
If skipped, the table created will loss some data.
Thanks a lot for the use case. Yes, with the current patch the table will
lose the data related to the subplan. On analyzing further, I think we
cannot allow parallel inserts in the cases where the Gather node has some
projection to do, because the workers cannot perform that
projection. So, having ps_ProjInfo in the Gather node is an indication
for us to disable parallel inserts; only the leader can do the
insertions, after the Gather node does the required projection.
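A minimal sketch of the extra condition, assuming it sits in
IsParallelInsertInCTASAllowed() alongside the existing checks (the exact
placement and surrounding conditions may differ):

    PlanState  *ps = queryDesc->planstate;

    /*
     * If the Gather node itself has a projection to perform (e.g. due to
     * a SubPlan in the target list), the workers cannot do it, so fall
     * back to the leader doing the inserts.
     */
    if (!(ps && IsA(ps, GatherState) && ps->ps_ProjInfo == NULL))
        allowed = false;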
Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
I took a deep look at the projection logic.
In most cases, you are right that Gather node does not need projection.
In some rare cases, such as Subplan (or initplan I guess).
The projection will happen in Gather node.
The example:
Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1),
generate_series(1,1000,1));
postgres=# explain(verbose, costs off) select test.i,(select i from
(select * from test2) as tt limit 1) from test where test.i < 2000;
QUERY PLAN
----------------------------------------
Gather
Output: test.i, (SubPlan 1)
Workers Planned: 2
-> Parallel Seq Scan on public.test
Output: test.i
Filter: (test.i < 2000)
SubPlan 1
-> Limit
Output: (test.i)
-> Seq Scan on public.test2
Output: test.i
In this case, projection is necessary, because the subplan will be
executed in projection.
If skipped, the table created will loss some data.
Thanks a lot for the use case. Yes with the current patch table will lose
data related to the subplan. On analyzing further, I think we can not allow
parallel inserts in the cases when the Gather node has some projections
to do. Because the workers can not perform that projection. So, having
ps_ProjInfo in the Gather node is an indication for us to disable parallel
inserts and only the leader can do the insertions after the Gather node
does the required projections.
Thoughts?
Agreed.
2.
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
I found that the code passes intoclause and objectid to the Gather node's lefttree.
Is it necessary? It seems only the Gather node will use the information.
Best regards,
houzj
On Fri, Nov 27, 2020 at 11:57 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Thanks a lot for the use case. Yes with the current patch table will lose
data related to the subplan. On analyzing further, I think we can not allow
parallel inserts in the cases when the Gather node has some projections
to do. Because the workers can not perform that projection. So, having
ps_ProjInfo in the Gather node is an indication for us to disable parallel
inserts and only the leader can do the insertions after the Gather node
does the required projections.
Thoughts?
Agreed.
Thanks! I will add/modify IsParallelInsertInCTASAllowed() to return
false in this case.
2.
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+  * Take the necessary information to be passed to workers for
+  * parallel inserts in CTAS.
+  */
+ if (ISCTAS(node->ps.intoclause))
+ {
+     node->ps.lefttree->intoclause = node->ps.intoclause;
+     node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
    node->pei = ExecInitParallelPlan(node->ps.lefttree,
I found the code pass intoclause and objectid to Gather node's lefttree.
Is it necessary? It seems only Gather node will use the information.
I am passing the required information from upper layers down to here through
the PlanState structure. Since the Gather node's lefttree is also a
PlanState structure variable, here I just assigned them to pass that
information to ExecInitParallelPlan().
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 25-11-2020 03:40, Bharath Rupireddy wrote:
On Tue, Nov 24, 2020 at 4:43 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I'm very interested in this feature,
and I'm looking at the patch, here are some comments.
Thanks for the review.
How about the following style:
if (TupIsNull(outerTupleSlot))
    break;
(void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
node->ps.state->es_processed++;
Which looks cleaner.
Done.
The check can be replaced by ISCTAS(into).
Done.
'inerst' looks like a typo (insert).
Corrected.
The code here call strlen(intoclausestr) for two times,
After checking the existing code in ExecInitParallelPlan,
It used to store the strlen in a variable.
So how about the following style:
intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
Done.
The two check about intoclausestr seems can be combined like:
if (intoclausestr != NULL)
{
...
}
else
{
...
}
Done.
Attaching v5 patch. Please consider it for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Disclaimer: I have by no means thoroughly reviewed all the involved parts
and am probably missing quite a bit of context, so if I understood parts
wrong or they have been discussed before, then I'm sorry. Most notably,
the whole situation about the command-id is still elusive to me and I
really cannot judge yet anything related to that.
IMHO the patch makes the Gather node do most of the CTAS
work, which seems unwanted. For the non-CTAS insert/update case, it seems
that a ModifyTable node exists to actually do the work. What I'm
wondering is whether it would not be better to introduce a CreateTable node
as well?
This would have several merits:
- the rowcount of that node would be 0 for the parallel case, and
non-zero for the serial case. Then the Gather node and the Query struct
don't have to know about CTAS for the most part, removing e.g. the case
distinctions in cost_gather.
- the inserted rows can now be accounted in this new node instead of the
parallel executor state, and this node can also do its own DSM
initializations
- the generation of a partial variant of the CreateTable node can now
be done in the optimizer instead of ExecCreateTableAs(), which IMHO is
a more logical place to make these kinds of decisions, which then also
makes it potentially play nicer with costs and the like.
- the explain code can now be in its own place instead of part of the
gather node
- IIUC it would allow the removal of the code to only launch parallel
workers if it's not CTAS, which IMHO would be quite a big benefit.
Thoughts?
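For illustration only, a hypothetical shape such a node could take (none of
these names exist in the tree today; this is just to make the idea
concrete):

    /* Hypothetical plan node: the insertion side of CTAS, analogous to
     * ModifyTable for INSERT/UPDATE/DELETE. */
    typedef struct CreateTable
    {
        Plan        plan;        /* lefttree is the (partial) SELECT plan */
        IntoClause *intoClause;  /* target relation specification */
    } CreateTable;

    /* Hypothetical executor state: accounts the rows it inserted itself
     * and could own the DSM state used to sum up the workers' counts. */
    typedef struct CreateTableState
    {
        PlanState   ps;
        uint64      inserted;    /* rows inserted by this process */
    } CreateTableState;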
Some small things I noticed while going through the patch:
- Typo for the comment about "inintorel_startup" which should be
intorel_startup
- if (node->nworkers_launched == 0 && !node->need_to_scan_locally)
can be changed into
if (node->nworkers_launched == 0
because either way it'll be true.
Regards,
Luc
Swarm64
On Fri, Nov 27, 2020 at 1:07 PM Luc Vlaming <luc@swarm64.com> wrote:
Disclaimer: I have by no means throughly reviewed all the involved parts
and am probably missing quite a bit of context so if I understood parts
wrong or they have been discussed before then I'm sorry. Most notably
the whole situation about the command-id is still elusive for me and I
can really not judge yet anything related to that.
IMHO The patch makes that we now have the gather do most of the CTAS
work, which seems unwanted. For the non-ctas insert/update case it seems
that a modifytable node exists to actually do the work. What I'm
wondering is if it is maybe not better to introduce a CreateTable node
as well?
This would have several merits:
- the rowcount of that node would be 0 for the parallel case, and
non-zero for the serial case. Then the gather ndoe and the Query struct
don't have to know about CTAS for the most part, removing e.g. the case
distinctions in cost_gather.
- the inserted rows can now be accounted in this new node instead of the
parallel executor state, and this node can also do its own DSM
intializations
- the generation of a partial variants of the CreateTable node can now
be done in the optimizer instead of the ExecCreateTableAs which IMHO is
a more logical place to make these kind of decisions. which then also
makes it potentially play nicer with costs and the like.
- the explain code can now be in its own place instead of part of the
gather node
- IIUC it would allow the removal of the code to only launch parallel
workers if its not CTAS, which IMHO would be quite a big benefit.
Thoughts?
If I'm not wrong, I think currently we have no executor nodes for DDLs.
I'm not sure whether we would like to introduce one for this. Also
note that both CTAS and CREATE MATERIALIZED VIEW (CMV) are handled
with the same code, so if we have CreateTable as the new node, do
we also want to have another node, or a generic node name?
The main design idea of the patch proposed in this thread is to push
the dest receiver down to the workers if the SELECT part of
the CTAS or CMV is parallelizable. Also, for CTAS or CMV we do not
do any planning as such; the planner is just made aware that there
are no tuples to transfer from the workers to the Gather node, which
may make the planner choose parallelism for the SELECT part. So, the
planner work for CTAS or CMV is very minimal.
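Concretely, the planner-side change is limited to the per-tuple transfer
cost in cost_gather(), roughly what the attached patch does:

    /* In cost_gather(): don't charge per-tuple transfer cost when the
     * SELECT feeds a CTAS, since the workers insert their tuples
     * themselves and nothing flows through the Gather's tuple queues. */
    if (!(root->parse->isForCTAS && root->query_level == 1))
        run_cost += parallel_tuple_cost * path->path.rows;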
I also have the idea of extending this design (if accepted) to REFRESH
MATERIALIZED VIEW after some analysis.
I may be wrong above, other hackers may have better opinions.
Some small things I noticed while going through the patch:
- Typo for the comment about "inintorel_startup" which should be
intorel_startup
Corrected.
- if (node->nworkers_launched == 0 && !node->need_to_scan_locally)
can be changed into
if (node->nworkers_launched == 0
because either way it'll be true.
Yes, the !node->need_to_scan_locally check is not necessary; we need to set it
to true if there are no workers launched. I removed the
!node->need_to_scan_locally check from the if clause.
On Fri, Nov 27, 2020 at 11:57 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Thanks a lot for the use case. Yes with the current patch table will lose
data related to the subplan. On analyzing further, I think we can not allow
parallel inserts in the cases when the Gather node has some projections
to do. Because the workers can not perform that projection. So, having
ps_ProjInfo in the Gather node is an indication for us to disable parallel
inserts and only the leader can do the insertions after the Gather node
does the required projections.
Thoughts?
Agreed.
Thanks! I will add/modify IsParallelInsertInCTASAllowed() to return
false in this case.
Modified.
Attaching v6 patch that has the above review comments addressed.
Please review it further.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v6-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/x-patch)
From b5f111c7dcc7efa336aa5da59b6bdbd3689a70e5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 30 Nov 2020 09:36:03 +0530
Subject: [PATCH v6] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. The leader inserts its
share of tuples if instructed to do so, and so do the workers. Each
worker atomically writes its number of inserted tuples into a
shared memory variable; the leader combines this with its own
number of inserted tuples and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 324 +++++++++++++------
src/backend/commands/explain.c | 39 +++
src/backend/executor/execMain.c | 17 +
src/backend/executor/execParallel.c | 67 +++-
src/backend/executor/nodeGather.c | 97 +++++-
src/backend/optimizer/path/costsize.c | 12 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 23 ++
src/include/executor/execParallel.h | 2 +
src/include/nodes/execnodes.h | 5 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 190 +++++++++++
src/test/regress/sql/write_parallel.sql | 84 +++++
15 files changed, 765 insertions(+), 138 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..3045c0f046 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..c76790f776 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to set the tuple transfer cost from workers to the Gather
+ * node (in case parallelism kicks in for the SELECT part of the CTAS)
+ * to zero, as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -350,6 +347,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * The SELECT part of the CTAS is parallelizable, so we can make each
+ * parallel worker insert the tuples resulting from its share of the
+ * plan execution into the target table. We need the plan state to be
+ * initialized by the executor to decide whether to allow parallel
+ * inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ ((DR_intorel *) dest)->is_parallel = true;
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +425,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +440,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
- if (lc)
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We don't skip contacting the FSM while inserting tuples in
+ * parallel mode: while extending the relation, instead of
+ * blocking on a page that another worker is inserting into, a
+ * worker can check the FSM for another page that can accommodate
+ * the tuples. This is a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are
+ * not allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +664,57 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
+ bool allowed = false;
+
+ if (ISCTAS(into))
+ {
+ RangeVar *rel = into->rel;
+
+ /* Allow parallel inserts only if the table is not temporary. */
+ if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
+
+ if (queryDesc && allowed)
+ {
+ PlanState *ps = queryDesc->planstate;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+
+ /*
+ * We allow parallel inserts by the workers only if the Gather node
+ * has no projections to perform and if the upper node is Gather.
+ * In case, the Gather node has projections, which is possible if
+ * there are any subplans in the query, the workers can not do
+ * those projections. And when the upper node is GatherMerge, then
+ * the leader has to perform the final phase i.e. merge the results
+ * by workers.
+ */
+ if (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
+ {
+ /*
+ * Since no rows are transferred from the workers to the
+ * Gather node, we set plan_rows to 0 so that this is visible
+ * in explain plans. Note that this has already been accounted
+ * for in the cost calculations in cost_gather().
+ */
+ plannedstmt->planTree->plan_rows = 0;
+ }
+ else
+ allowed = false;
+ }
+ }
+
+ return allowed;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..121c118bce 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to set the tuple transfer cost from workers to the Gather node (in
+ * case parallelism kicks in for the SELECT part of the CTAS) to zero, as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -556,6 +565,19 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ if (into)
+ {
+ /*
+ * The SELECT part of the CTAS is parallelizable, so we can make each
+ * parallel worker insert the tuples resulting from its share of the
+ * plan execution into the target table. We need the plan state to be
+ * initialized by the executor to decide whether to allow parallel
+ * inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ ((DR_intorel *) dest)->is_parallel = true;
+ }
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1753,6 +1775,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
Gather *gather = (Gather *) plan;
+ if (IsA(planstate, GatherState) &&
+ planstate->intoclause &&
+ IsA(planstate->intoclause,IntoClause) &&
+ planstate->dest &&
+ planstate->dest->mydest == DestIntoRel &&
+ ((DR_intorel *) planstate->dest)->is_parallel &&
+ planstate->intoclause->rel &&
+ planstate->intoclause->rel->relname)
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n", planstate->intoclause->rel->relname);
+ es->indent++;
+ ExplainIndentText(es);
+ }
+
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f589f9..c91741981e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_publication.h"
+#include "commands/createas.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
@@ -352,6 +353,22 @@ standard_ExecutorRun(QueryDesc *queryDesc,
if (sendTuples)
dest->rStartup(dest, operation, queryDesc->tupDesc);
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each parallel worker
+ * insert the tuples, we must send information such as the into clause (for
+ * each worker to build a separate dest receiver) and the object id (for
+ * each worker to open the table).
+ */
+ if (queryDesc->plannedstmt->parallelModeNeeded &&
+ dest && dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel &&
+ !(((DR_intorel *) dest)->is_parallel_worker))
+ {
+ queryDesc->planstate->intoclause = ((DR_intorel *) dest)->into;
+ queryDesc->planstate->objectid = ((DR_intorel *) dest)->object_id;
+ queryDesc->planstate->dest = dest;
+ }
+
/*
* run plan
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..2ea4664732 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -600,6 +605,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +719,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +745,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && planstate->objectid != InvalidOid)
+ fpes->objectid = planstate->objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +782,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1421,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1523,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader will
+ * use it to report the total to the client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..0c1283af44 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,7 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
* ExecInitGather
@@ -131,6 +132,65 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+ /*
+ * By now, the parallel workers (if any were launched) would have started
+ * their work, i.e. insertion into the target table. In case the leader is
+ * chosen to participate in parallel inserts in CTAS, it finishes its share
+ * before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally && node->ps.dest &&
+ node->ps.dest->mydest == DestIntoRel)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ /* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This will be reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -166,6 +226,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
@@ -190,13 +260,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!(ISCTAS(node->ps.intoclause)))
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,7 +281,8 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !(ISCTAS(node->ps.intoclause)))
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
}
@@ -220,6 +294,11 @@ ExecGather(PlanState *pstate)
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/*
* Get next tuple, either from one of our workers, or by running the plan
* ourselves.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..4f03db31c7 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,17 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Make the number of tuples transferred from workers to the Gather
+ * node zero, as each worker inserts, in parallel, the tuples that result
+ * from its chunk of the plan execution. This change may make the parallel
+ * plan cheaper than all other plans, and influence the planner to choose
+ * this parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..56aba06d6a 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,32 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define ISCTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +49,7 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *intoClause,
+ QueryDesc *queryDesc);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9c7d0edd40 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,6 +35,8 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..5277d66150 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -1009,6 +1010,10 @@ typedef struct PlanState
bool outeropsset;
bool inneropsset;
bool resultopsset;
+ /* Parallel inserts in CTAS related info is specified below. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d1f9ef29ca..c15b01252c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* is the SELECT query for CREATE TABLE AS */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..a86da449fd 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -75,5 +75,195 @@ explain (costs off) create table parallel_write as execute prep_stmt;
(7 rows)
create table parallel_write as execute prep_stmt;
+drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_mat_view
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ -> Create parallel_write
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
drop table parallel_write;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..e65a56b442 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,88 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
rollback;
--
2.25.1
On Mon, Nov 30, 2020 at 10:43 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Nov 27, 2020 at 1:07 PM Luc Vlaming <luc@swarm64.com> wrote:
Disclaimer: I have by no means thoroughly reviewed all the involved parts
and am probably missing quite a bit of context, so if I understood parts
wrong or they have been discussed before then I'm sorry. Most notably
the whole situation about the command-id is still elusive for me and I
really cannot yet judge anything related to that.
IMHO the patch now makes the Gather node do most of the CTAS
work, which seems unwanted. For the non-ctas insert/update case it seems
that a modifytable node exists to actually do the work. What I'm
wondering is if it is maybe not better to introduce a CreateTable node
as well?
This would have several merits:
- the rowcount of that node would be 0 for the parallel case, and
non-zero for the serial case. Then the gather node and the Query struct
don't have to know about CTAS for the most part, removing e.g. the case
distinctions in cost_gather.
- the inserted rows can now be accounted in this new node instead of the
parallel executor state, and this node can also do its own DSM
initializations
- the generation of partial variants of the CreateTable node can now
be done in the optimizer instead of in ExecCreateTableAs, which IMHO is
a more logical place to make these kinds of decisions, which then also
makes it potentially play nicer with costs and the like.
- the explain code can now be in its own place instead of part of the
gather node
- IIUC it would allow the removal of the code to only launch parallel
workers if it's not CTAS, which IMHO would be quite a big benefit.
Thoughts?
If I'm not wrong, I think currently we have no exec nodes for DDLs.
I'm not sure whether we would like to introduce one for this.
Yeah, I am also not in favor of having an executor node for CTAS but
OTOH, I also don't like the way you have jammed the relevant
information in generic PlanState. How about keeping it in GatherState
and initializing it in ExecCreateTableAs() after the executor start.
You are already doing special treatment for the Gather node in
ExecCreateTableAs (via IsParallelInsertInCTASAllowed) so we can as
well initialize the required information in GatherState in
ExecCreateTableAs. I think that might help in reducing the special
treatment for intoclause at different places.
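To be concrete, a minimal sketch of that idea (assuming GatherState grows a
field, say 'dest', to remember where the tuples must go; the names are
illustrative only, not the final code) could be, in ExecCreateTableAs()
after ExecutorStart():

	if (queryDesc->planstate && IsA(queryDesc->planstate, GatherState))
	{
		GatherState *gstate = (GatherState *) queryDesc->planstate;

		/* Remember the CTAS dest receiver so the Gather node can use it. */
		gstate->dest = queryDesc->dest;
	}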
Few other assorted comments:
=========================
1.
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
..
..
if (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
+ {
+ /*
+ * Since there are no rows that are transferred from workers to
+ * Gather node, so we set it to 0 to be visible in explain
+ * plans. Note that we would have already accounted this for
+ * cost calculations in cost_gather().
+ */
+ plannedstmt->planTree->plan_rows = 0;
This looks a bit odd. The function name
'IsParallelInsertInCTASAllowed' suggests that it just checks whether
parallelism is allowed but it is internally changing the plan_rows. It
might be better to do this separately if the parallelism is allowed.
2.
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
Spurious line removal.
3.
/* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
The comment and code appear a bit misleading as the function seems to
shut down the workers rather than waiting for them to finish. How about
using something like below:
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
* to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
int i;
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
This is how it works for parallel vacuum.
4.
+
+ /*
+ * Make the number of tuples that are transferred from workers to gather
+ * node zero as each worker parallelly insert the tuples that are resulted
+ * from its chunk of plan execution. This change may make the parallel
+ * plan cheap among all other plans, and influence the planner to consider
+ * this parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
The above comment doesn't seem to convey what it intends to convey.
How about changing it slightly as: "We don't compute the
parallel_tuple_cost for CTAS because the number of tuples that are
transferred from workers to the gather node is zero as each worker
parallelly inserts the tuples that are resulted from its chunk of plan
execution. This change may make the parallel plan cheap among all
other plans, and influence the planner to consider this parallel
plan."
Then, we can also have an Assert for path->path.rows to zero for the CTAS case.
5.
+ /* Prallel inserts in CTAS related info is specified below. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
Typo. /Prallel/Parallel
6.
Currently, it seems the plan looks like:
Gather (actual time=970.524..972.913 rows=0 loops=1)
-> Create t1_test
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
I would prefer it to be:
Gather (actual time=970.524..972.913 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Create t1_test
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
This way it looks like the writing part is done below the Gather node
and also it will match the Parallel Insert patch of Greg.
--
With Regards,
Amit Kapila.
Thanks Amit for the review comments.
On Sat, Dec 5, 2020 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
If I'm not wrong, I think currently we have no exec nodes for DDLs.
I'm not sure whether we would like to introduce one for this.
Yeah, I am also not in favor of having an executor node for CTAS but
OTOH, I also don't like the way you have jammed the relevant
information in generic PlanState. How about keeping it in GatherState
and initializing it in ExecCreateTableAs() after the executor start.
You are already doing special treatment for the Gather node in
ExecCreateTableAs (via IsParallelInsertInCTASAllowed) so we can as
well initialize the required information in GatherState in
ExecCreateTableAs. I think that might help in reducing the special
treatment for intoclause at different places.
Done. Added the required info to the GatherState node. While this reduced
the changes at many other places, I had to pass the into clause and
object id to ExecInitParallelPlan() as we do not send the GatherState node
to it. Hope that's okay.
Few other assorted comments:
=========================
1.
This looks a bit odd. The function name
'IsParallelInsertInCTASAllowed' suggests that it just checks whether
parallelism is allowed but it is internally changing the plan_rows. It
might be better to do this separately if the parallelism is allowed.
Changed.
2.
static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);
Spurious line removal.
Corrected.
3.
The comment and code appear a bit misleading as the function seems to
shut down the workers rather than waiting for them to finish. How about
using something like below:
/*
 * Next, accumulate buffer and WAL usage. (This must wait for the workers
 * to finish, or we might get incomplete data.)
 */
if (nworkers > 0)
{
int i;
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
This is how it works for parallel vacuum.
Done.
4.
The above comment doesn't seem to convey what it intends to convey.
How about changing it slightly as: "We don't compute the
parallel_tuple_cost for CTAS because the number of tuples that are
transferred from workers to the gather node is zero as each worker
parallelly inserts the tuples that are resulted from its chunk of plan
execution. This change may make the parallel plan cheap among all
other plans, and influence the planner to consider this parallel
plan."
Changed.
Then, we can also have an Assert for path->path.rows to zero for the CTAS case.
We cannot have Assert(path->path.rows == 0), because we are not
changing this parameter upstream in or before the planning phase; we
are just skipping it for CTAS when accounting the cost. We would have to
add extra checks in several places if we wanted the planner to set
path->path.rows to 0 for CTAS. IMHO, that's not necessary; we can just
skip taking this value into account in cost_gather. Thoughts?
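To be concrete, the assertion would have to sit next to the skipped cost
term in cost_gather(), roughly like below (a sketch only, based on the
isForCTAS flag the attached patch adds; it is not in the patch), and it
would fail because path->path.rows still holds the SELECT's row estimate
at that point:

	if (root->parse->isForCTAS && root->query_level == 1)
		Assert(path->path.rows == 0);	/* would fail as-is */
	else
		run_cost += parallel_tuple_cost * path->path.rows;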
5.
+ /* Prallel inserts in CTAS related info is specified below. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
} PlanState;
Typo. /Prallel/Parallel
Corrected.
6.
Currently, it seems the plan looks like:
Gather (actual time=970.524..972.913 rows=0 loops=1)
-> Create t1_test
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
I would prefer it to be:
Gather (actual time=970.524..972.913 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Create t1_test
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
This way it looks like the writing part is done below the Gather node
and also it will match the Parallel Insert patch of Greg.
Done.
Attaching v7 patch. Please review it further.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v7-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/x-patch)
From 7eff47d971c0ac5722ddc7de5a4c7e7550bb71aa Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sun, 6 Dec 2020 08:21:42 +0530
Subject: [PATCH v7] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. The leader inserts its
share of tuples if instructed to do so, and so do the workers. Each
worker atomically writes its number of inserted tuples into a
shared memory variable; the leader combines this with its own
number of inserted tuples and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 341 +++++++++++++------
src/backend/commands/explain.c | 41 +++
src/backend/executor/execParallel.c | 70 +++-
src/backend/executor/nodeGather.c | 114 ++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/backend/optimizer/path/costsize.c | 13 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 28 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 190 +++++++++++
src/test/regress/sql/write_parallel.sql | 84 +++++
15 files changed, 796 insertions(+), 141 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..3045c0f046 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases, i.e.
+ * INSERT INTO SELECT, CTAS, COPY FROM; to be changed later. In a parallel
+ * worker, allow currentCommandIdUsed to be set to true only if it was
+ * already marked as used at the start of the parallel operation (by way
+ * of SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere within a parallel
+ * worker, especially for parallel inserts.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..5d546464dc 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to set the tuple transfer cost from workers to the Gather
+ * node (in case parallelism kicks in for the SELECT part of the CTAS)
+ * to zero, as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -350,6 +347,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples resulting from its execution
+ * into the target table. We need the plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +424,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +439,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into the
+ * parallel DSM; they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
+
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
- if (lc)
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We don't skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, instead of
+ * blocking on a page while another worker is inserting, workers
+ * can check the FSM for another page that can accommodate the
+ * tuples. This results in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are
+ * not allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +663,75 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
+ bool allowed = false;
+
+ if (IS_CTAS(into))
+ {
+ RangeVar *rel = into->rel;
+
+ /* Allow parallel inserts only if the table is not temporary. */
+ if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
+
+ if (queryDesc && allowed)
+ {
+ PlanState *ps = queryDesc->planstate;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+
+ /*
+ * We allow parallel inserts by the workers only if the Gather node
+ * has no projections to perform and the upper node is Gather. In
+ * case the Gather node has projections, which is possible if there
+ * are any subplans in the query, the workers cannot do those
+ * projections. And when the upper node is GatherMerge, the leader
+ * has to perform the final phase, i.e. merge the results produced
+ * by the workers.
+ */
+ if (!(ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe))
+ {
+ allowed = false;
+ }
+ }
+ }
+
+ return allowed;
+}
+
+/*
+ * SetCTASParallelInsertState --- set the info for parallel inserts
+ * that is required during plan execution.
+ */
+void SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each parallel worker
+ * insert the tuples, we must send information such as the into clause (for
+ * each worker to build a separate dest receiver) and the object id (for
+ * each worker to open the created table).
+ */
+ ((DR_intorel *) queryDesc->dest)->is_parallel = true;
+ gstate->dest = queryDesc->dest;
+
+ /*
+ * Since no rows are transferred from workers to the Gather node, set
+ * plan_rows to 0 so that it is visible in explain plans. Note that we
+ * have already accounted for this in the cost calculations in
+ * cost_gather().
+ */
+ queryDesc->plannedstmt->planTree->plan_rows = 0;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..57875e1505 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to set the tuple transfer cost from workers to the Gather node (in
+ * case parallelism kicks in for the SELECT part of the CTAS) to zero, as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -556,6 +565,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples resulting from its execution into the
+ * target table. We need the plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1793,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under the Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..330978d018 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* object id for workers to open the relation/table */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader will
+ * use it to report to the client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..9cef6bdd61 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,73 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) would have started
+ * their work, i.e. inserting into the target table. If the leader is chosen
+ * to participate in the parallel inserts for CTAS, it finishes its share
+ * before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ int i;
+
+ /* Wait for the parallel workers to finish. */
+ WaitForParallelWorkersToFinish(node->pei->pcxt);
+
+ for (i = 0; i < node->nworkers_launched; i++)
+ {
+ InstrAccumParallelQuery(&node->pei->buffer_usage[i],
+ &node->pei->wal_usage[i]);
+ }
+
+ /*
+ * Add the total tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This will be reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +226,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +235,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +254,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +275,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +296,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..225f3cbf01 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,18 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * We do not compute the parallel_tuple_cost for CTAS because the number of
+ * tuples that are transferred from workers to the gather node is zero as
+ * each worker, in parallel, inserts the tuples that result from its
+ * chunk of plan execution. This change may make the parallel plan cheap
+ * among all other plans, and influence the planner to consider this
+ * parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab3aab58c5 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,9 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *into,
+ QueryDesc *queryDesc);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Parallel inserts in CTAS related info is specified below. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ec14fc2036..eb267b1a6c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* is the SELECT query for CREATE TABLE AS */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..635d24e76d 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -75,5 +75,195 @@ explain (costs off) create table parallel_write as execute prep_stmt;
(7 rows)
create table parallel_write as execute prep_stmt;
+drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
drop table parallel_write;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..e65a56b442 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,88 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
rollback;
--
2.25.1
Hi, Bharath:
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
SetCurrentCommandIdUsedForWorker already has void as return type. The
'(void)' is not needed.
+ * rd_createSubid is marked invalid, otherwise, the table is
+ * not allowed to extend by the workers.
nit: to extend by the workers -> to be extended by the workers
For IsParallelInsertInCTASAllowed, the logic is inside the 'if (IS_CTAS(into))' block.
You can return false when (!IS_CTAS(into)) - this would save some
indentation for the body.
+ if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;
Similarly, when the above condition doesn't hold, you can return false
directly - reducing the next if condition to 'if (queryDesc)'.
+ if (!(ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe))
The composite condition is negated. Maybe you can write without negation:
+ return (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
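Putting the three points above together, the whole check could be shaped roughly like this (just a sketch, untested, using the names from the patch):

bool
IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
{
    if (!IS_CTAS(into))
        return false;

    /* Temporary tables are backend-local, so workers cannot insert into them. */
    if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
        return false;

    if (queryDesc)
    {
        PlanState   *ps = queryDesc->planstate;
        PlannedStmt *plannedstmt = queryDesc->plannedstmt;

        return (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
                plannedstmt->parallelModeNeeded &&
                plannedstmt->planTree &&
                IsA(plannedstmt->planTree, Gather) &&
                plannedstmt->planTree->lefttree &&
                plannedstmt->planTree->lefttree->parallel_aware &&
                plannedstmt->planTree->lefttree->parallel_safe);
    }

    return false;
}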
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform to the end client.
'inform to the end client' -> 'inform the end client' (without to)
Cheers
On Sun, Dec 6, 2020 at 4:37 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks Amit for the review comments.
On Sat, Dec 5, 2020 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

If I'm not wrong, I think currently we have no exec nodes for DDLs. I'm not sure whether we would like to introduce one for this.

Yeah, I am also not in favor of having an executor node for CTAS but
OTOH, I also don't like the way you have jammed the relevant
information in generic PlanState. How about keeping it in GatherState
and initializing it in ExecCreateTableAs() after the executor start.
You are already doing special treatment for the Gather node in
ExecCreateTableAs (via IsParallelInsertInCTASAllowed) so we can as
well initialize the required information in GatherState in
ExecCreateTableAs. I think that might help in reducing the special
treatment for intoclause at different places.Done. Added required info to GatherState node. While this reduced the
changes at many other places, but had to pass the into clause and
object id to ExecInitParallelPlan() as we do not send GatherState node
to it. Hope that's okay.Few other assorted comments:
=========================
1.
This looks a bit odd. The function name
'IsParallelInsertInCTASAllowed' suggests that it just checks whether
parallelism is allowed but it is internally changing the plan_rows. It
might be better to do this separately if the parallelism is allowed.

Changed.
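The plan_rows adjustment is now done separately, after the parallelism check, in SetCTASParallelInsertState(). Abridged, that function looks roughly like:

void
SetCTASParallelInsertState(QueryDesc *queryDesc)
{
    GatherState *gstate = (GatherState *) queryDesc->planstate;

    /* Tell the dest receiver and the Gather node that parallel inserts are on. */
    ((DR_intorel *) queryDesc->dest)->is_parallel = true;
    gstate->dest = queryDesc->dest;

    /*
     * No rows are transferred from workers to the Gather node, so show 0 in
     * explain plans; the cost side is already handled in cost_gather().
     */
    queryDesc->plannedstmt->planTree->plan_rows = 0;
}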
2.
 static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);

Spurious line removal.
Corrected.
3.
The comment and code appear a bit misleading as the function seems to
shut down the workers rather than waiting for them to finish. How about
using something like below:

/*
 * Next, accumulate buffer and WAL usage. (This must wait for the workers
 * to finish, or we might get incomplete data.)
 */
if (nworkers > 0)
{
    int         i;

    /* Wait for all vacuum workers to finish */
    WaitForParallelWorkersToFinish(lps->pcxt);

    for (i = 0; i < lps->pcxt->nworkers_launched; i++)
        InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}

This is how it works for parallel vacuum.
Done.
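The CTAS gather code now follows the same pattern; abridged from the attached patch, it looks roughly like:

    if (node->nworkers_launched > 0)
    {
        int         i;

        /* Wait for the parallel workers to finish. */
        WaitForParallelWorkersToFinish(node->pei->pcxt);

        for (i = 0; i < node->nworkers_launched; i++)
            InstrAccumParallelQuery(&node->pei->buffer_usage[i],
                                    &node->pei->wal_usage[i]);

        /*
         * Add the tuples inserted by all workers to the tuples inserted by
         * the leader (if any); the total is what gets reported to the client.
         */
        node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
    }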
4.
The above comment doesn't seem to convey what it intends to convey.
How about changing it slightly as: "We don't compute the
parallel_tuple_cost for CTAS because the number of tuples that are
transferred from workers to the gather node is zero as each worker
parallelly inserts the tuples that are resulted from its chunk of plan
execution. This change may make the parallel plan cheap among all
other plans, and influence the planner to consider this parallel
plan."

Changed.
Then, we can also have an Assert that path->path.rows is zero for the
CTAS case.
We cannot have Assert(path->path.rows == 0), because we are not
changing this parameter upstream in or before the planning phase. We
are just not taking it into account for CTAS. We may have to add
extra checks in different places in case we have to make the planner
set path->path.rows to 0 for CTAS. IMHO, that's not necessary. We can
just skip taking this value in cost_gather. Thoughts?

5.
+   /* Prallel inserts in CTAS related info is specified below. */
+   IntoClause *intoclause;
+   Oid objectid;
+   DestReceiver *dest;
 } PlanState;

Typo. /Prallel/Parallel
Corrected.
6.
Currently, it seems the plan looks like:
Gather (actual time=970.524..972.913 rows=0 loops=1)
-> Create t1_test
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
I would prefer it to be:
Gather (actual time=970.524..972.913 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Create t1_test
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
This way it looks like the writing part is done below the Gather node
and also it will match the Parallel Insert patch of Greg.

Done.
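The extra "Create <table>" line is emitted for the Gather node in ExplainNode(); abridged from the attached patch, roughly:

    if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
        ((DR_intorel *) gstate->dest)->into->rel &&
        ((DR_intorel *) gstate->dest)->into->rel->relname)
    {
        /* Print the Create line one level below the Gather node. */
        es->indent--;
        ExplainIndentText(es);
        appendStringInfo(es->str, "-> Create %s\n",
                         ((DR_intorel *) gstate->dest)->into->rel->relname);
        ExplainIndentText(es);
        es->indent++;
    }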
Attaching v7 patch. Please review it further.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks for the comments.
On Mon, Dec 7, 2020 at 8:56 AM Zhihong Yu <zyu@yugabyte.com> wrote:
+ (void) SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);

SetCurrentCommandIdUsedForWorker already has void as return type. The '(void)' is not needed.
Removed.
+ * rd_createSubid is marked invalid, otherwise, the table is
+ * not allowed to extend by the workers.

nit: to extend by the workers -> to be extended by the workers
Changed.
For IsParallelInsertInCTASAllowed, logic is inside 'if (IS_CTAS(into))' block.
You can return false when (!IS_CTAS(into)) - this would save some indentation for the body.
Done.
+ if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
+ allowed = true;

Similarly, when the above condition doesn't hold, you can return false directly - reducing the next if condition to 'if (queryDesc)'.
Done.
The composite condition is negated. Maybe you can write without negation:
Done.
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform to the end client.

'inform to the end client' -> 'inform the end client' (without to)
Changed.
Attaching v8 patch. Consider this for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v8-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From 295ba5ba91c51655c1f92c3821912d6daffd1ab1 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 7 Dec 2020 10:15:56 +0530
Subject: [PATCH v8] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts its
share of tuples if instructed to do, and so are workers. Each
worker writes atomically its number of inserted tuples into a
shared memory variable, the leader combines this with its own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 342 +++++++++++++------
src/backend/commands/explain.c | 41 +++
src/backend/executor/execParallel.c | 70 +++-
src/backend/executor/nodeGather.c | 114 ++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/backend/optimizer/path/costsize.c | 13 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 28 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 190 +++++++++++
src/test/regress/sql/write_parallel.sql | 84 +++++
15 files changed, 797 insertions(+), 141 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..3045c0f046 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..093557cf86 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to calculate the tuple transfer cost from workers to gather
+ * node(in case parallelism kicks in for the SELECT part of the CTAS),
+ * to zero as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -350,6 +347,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that are resulted in its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +424,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +439,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
+
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
- if (lc)
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We don't need to skip contacting FSM while inserting tuples
+ * for parallel mode, while extending the relations, workers
+ * instead of blocking on a page while another worker is inserting,
+ * can check the FSM for another page that can accommodate the
+ * tuples. This results in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +663,76 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return false;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. As the
+ * temporary tables are backend local, workers can not know about them.
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ if (queryDesc)
+ {
+ PlanState *ps = queryDesc->planstate;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ bool allow;
+
+ /*
+ * We allow parallel inserts by the workers only if the Gather node has
+ * no projections to perform and if the upper node is Gather. In case,
+ * the Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers can not do those projections. And
+ * when the upper node is GatherMerge, then the leader has to perform
+ * the final phase i.e. merge the results by workers.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
+
+ return allow;
+ }
+
+ return false;
+}
+
+/*
+ * SetCTASParallelInsertState --- set the required info for the parallel
+ * inserts, that is required in the plan exection.
+ */
+void SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as intoclause(for
+ * eachworker to build separate dest receiver), object id(for each worker
+ * to open the created table).
+ */
+ ((DR_intorel *) queryDesc->dest)->is_parallel = true;
+ gstate->dest = queryDesc->dest;
+
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in explain plans. Note that we
+ * would have already accounted this for cost calculations in
+ * cost_gather().
+ */
+ queryDesc->plannedstmt->planTree->plan_rows = 0;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..57875e1505 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to calculate the tuple transfer cost from workers to gather node(in
+ * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -556,6 +565,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that are resulted in its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1793,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..9cef6bdd61 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,73 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ int i;
+
+ /* Wait for the parallel workers to finish. */
+ WaitForParallelWorkersToFinish(node->pei->pcxt);
+
+ for (i = 0; i < node->nworkers_launched; i++)
+ {
+ InstrAccumParallelQuery(&node->pei->buffer_usage[i],
+ &node->pei->wal_usage[i]);
+ }
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +226,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +235,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +254,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +275,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +296,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..225f3cbf01 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,18 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * We do not compute the parallel_tuple_cost for CTAS because the number of
+ * tuples that are transferred from workers to the gather node is zero as
+ * each worker, in parallel, inserts the tuples that are resulted from its
+ * chunk of plan execution. This change may make the parallel plan cheap
+ * among all other plans, and influence the planner to consider this
+ * parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab3aab58c5 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,9 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *into,
+ QueryDesc *queryDesc);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Parallel inserts in CTAS related info is specified below. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ec14fc2036..eb267b1a6c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* is the SELECT query for CREATE TABLE AS */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..635d24e76d 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -75,5 +75,195 @@ explain (costs off) create table parallel_write as execute prep_stmt;
(7 rows)
create table parallel_write as execute prep_stmt;
+drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
drop table parallel_write;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..e65a56b442 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,88 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
rollback;
--
2.25.1
Hi
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to calculate the tuple transfer cost from workers to gather node(in
+ * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+ /*
+ * We do not compute the parallel_tuple_cost for CTAS because the number of
+ * tuples that are transferred from workers to the gather node is zero as
+ * each worker, in parallel, inserts the tuples that are resulted from its
+ * chunk of plan execution. This change may make the parallel plan cheap
+ * among all other plans, and influence the planner to consider this
+ * parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
I noticed that the parallel_tuple_cost will still be ignored
when Gather is not the top node.

Example:
Create table test(i int);
insert into test values(generate_series(1,10000000,1));
explain create table ntest3 as select * from test where i < 200 limit 10000;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)

The isForCTAS will be true because of [create table as], and the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel tuple cost will still be ignored.

Does that work as expected?
Best regards,
houzj
On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to calculate the tuple transfer cost from workers to gather node(in
+ * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+ /*
+ * We do not compute the parallel_tuple_cost for CTAS because the number of
+ * tuples that are transferred from workers to the gather node is zero as
+ * each worker, in parallel, inserts the tuples that are resulted from its
+ * chunk of plan execution. This change may make the parallel plan cheap
+ * among all other plans, and influence the planner to consider this
+ * parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;

I noticed that the parallel_tuple_cost will still be ignored
when Gather is not the top node.

Example:
Create table test(i int);
insert into test values(generate_series(1,10000000,1));
explain create table ntest3 as select * from test where i < 200 limit 10000;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)

The isForCTAS will be true because of [create table as], and the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel tuple cost will still be ignored.

Does that work as expected?
I don't think that is expected and is not the case without this patch.
The cost shouldn't be changed for existing cases where the write is
not pushed to workers.
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to calculate the tuple transfer cost from workers to gather node(in
+ * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+ /*
+ * We do not compute the parallel_tuple_cost for CTAS because the number of
+ * tuples that are transferred from workers to the gather node is zero as
+ * each worker, in parallel, inserts the tuples that are resulted from its
+ * chunk of plan execution. This change may make the parallel plan cheap
+ * among all other plans, and influence the planner to consider this
+ * parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;

I noticed that the parallel_tuple_cost will still be ignored
when Gather is not the top node.

Example:
Create table test(i int);
insert into test values(generate_series(1,10000000,1));
explain create table ntest3 as select * from test where i < 200 limit 10000;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)

The isForCTAS will be true because of [create table as], and the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel tuple cost will still be ignored.

Does that work as expected?
I don't think that is expected and is not the case without this patch.
The cost shouldn't be changed for existing cases where the write is
not pushed to workers.
Thanks for pointing that out. Yes, it should not change for the cases
where parallel inserts will not be picked later.
Any better suggestions on how to make the planner consider that CTAS
might choose parallel inserts later, while at the same time avoiding
the above issue in case it doesn't?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 7, 2020 at 3:44 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 7, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;

I noticed that the parallel_tuple_cost will still be ignored
when Gather is not the top node.

Example:
Create table test(i int);
insert into test values(generate_series(1,10000000,1));
explain create table ntest3 as select * from test where i < 200 limit 10000;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)

The isForCTAS will be true because of [create table as], and the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel tuple cost will still be ignored.

Does that work as expected?

I don't think that is expected and is not the case without this patch.
The cost shouldn't be changed for existing cases where the write is
not pushed to workers.

Thanks for pointing that out. Yes, it should not change for the cases
where parallel inserts will not be picked later.

Any better suggestions on how to make the planner consider that CTAS
might choose parallel inserts later, while at the same time avoiding
the above issue in case it doesn't?
What is the need of checking query_level when 'isForCTAS' is set only
when Gather is a top-node?
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
What is the need of checking query_level when 'isForCTAS' is set only
when Gather is a top-node?
isForCTAS is set before pg_plan_query(), and it is then used in
cost_gather(). We will not have a Gather node by then, which is why
queryDesc is passed as NULL to IsParallelInsertInCTASAllowed(into, NULL)
while setting isForCTAS to true. The intention of checking
query_level == 1 in cost_gather() is to consider only the top-level
query, not the subqueries.
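To make the interaction concrete, below is a condensed sketch of the two
pieces involved, taken from the patch hunks quoted upthread; the comments
added here only summarize why the Limit-over-Gather example above also gets
the exemption:

/* createas.c, before planning: only IntoClause-level checks are possible. */
if (IsParallelInsertInCTASAllowed(into, NULL))
    query->isForCTAS = true;

/*
 * costsize.c, cost_gather(): the flag is query-wide, so it also applies to
 * a Gather that ends up below a Limit node; query_level == 1 does not rule
 * that case out because the whole statement is still one query level.
 */
if (!(root->parse->isForCTAS && root->query_level == 1))
    run_cost += parallel_tuple_cost * path->path.rows;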
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
What is the need of checking query_level when 'isForCTAS' is set only
when Gather is a top-node?

isForCTAS is set before pg_plan_query(), and it is then used in
cost_gather(). We will not have a Gather node by then, which is why
queryDesc is passed as NULL to IsParallelInsertInCTASAllowed(into, NULL)
while setting isForCTAS to true.

IsParallelInsertInCTASAllowed() seems to be returning false if
queryDesc is NULL, so won't isForCTAS always be set to false? I think
I am missing something here.
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
What is the need of checking query_level when 'isForCTAS' is set only
when Gather is a top-node?

isForCTAS is set before pg_plan_query(), and it is then used in
cost_gather(). We will not have a Gather node by then, which is why
queryDesc is passed as NULL to IsParallelInsertInCTASAllowed(into, NULL)
while setting isForCTAS to true.

IsParallelInsertInCTASAllowed() seems to be returning false if
queryDesc is NULL, so won't isForCTAS always be set to false? I think
I am missing something here.
My bad. I utterly missed this, sorry for the confusion.
My intention behind IsParallelInsertInCTASAllowed() is twofold:
1. When called before planning, without a queryDesc, it should return
true if IS_CTAS(into) is true and the target is not a temporary table.
2. When called after planning, with a valid queryDesc, it should also
perform the Gather state checks on top of the checks in 1) and return
accordingly.
I have corrected this in the v9 patch. Please have a look.
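For reference, the v9 patch calls the function from ExecCreateTableAs() at
the two places sketched below (surrounding code elided):

/* 1. Before planning: no queryDesc yet, only the IntoClause-level checks. */
if (IsParallelInsertInCTASAllowed(into, NULL))
    query->isForCTAS = true;

plan = pg_plan_query(query, pstate->p_sourcetext,
                     CURSOR_OPT_PARALLEL_OK, params);
...
ExecutorStart(queryDesc, GetIntoRelEFlags(into));

/* 2. After ExecutorStart: the plan state exists, so do the Gather checks. */
if (IsParallelInsertInCTASAllowed(into, queryDesc))
    SetCTASParallelInsertState(queryDesc);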
The isForCTAS will be true because of [create table as], and the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel tuple cost will
still be ignored. Does that work as expected?

I don't think that is expected and is not the case without this patch.
The cost shouldn't be changed for existing cases where the write is
not pushed to workers.

Thanks for pointing that out. Yes, it should not change for the cases
where parallel inserts will not be picked later. Any better suggestions
on how to make the planner consider that CTAS might choose parallel
inserts later, while at the same time avoiding the above issue in case
it doesn't?
I'm not quite sure how to address this. Can we simply not let the
planner know that the SELECT is for CTAS, and perform the Gather node
and other checks only after planning is done? This is simple to do, but
we might miss some parallel plans for the SELECTs, because the planner
would have charged the tuple transfer cost from workers to Gather,
making those parallel plans look costlier than the non-parallel plans.
IMO we can do this, since it also keeps the existing behaviour of the
planner, i.e. while planning a SELECT it does not know that it is doing
so for CTAS. Thoughts?
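To illustrate, that alternative would look roughly like the hypothetical
sketch below: the isForCTAS flag and the cost_gather() change are dropped
entirely and the decision is made only after planning. This is only an
illustration of the idea, not what the attached v9 patch does:

/* The planner costs the SELECT exactly as it does today: no isForCTAS. */
plan = pg_plan_query(query, pstate->p_sourcetext,
                     CURSOR_OPT_PARALLEL_OK, params);
...
ExecutorStart(queryDesc, GetIntoRelEFlags(into));

/*
 * Decide about parallel inserts only now, once the plan state exists. If
 * a suitable Gather happens to sit on top, push the inserts to the
 * workers; otherwise the leader inserts all the tuples as today. The
 * downside: Gather was costed with parallel_tuple_cost included, so some
 * parallel plans are never chosen in the first place.
 */
if (IsParallelInsertInCTASAllowed(into, queryDesc))
    SetCTASParallelInsertState(queryDesc);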
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v9-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From 21f8da6da53ef35ec076202f1f13ee96e01f5de1 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 7 Dec 2020 18:39:13 +0530
Subject: [PATCH v9] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts its
share of tuples if instructed to do, and so are workers. Each
worker writes atomically its number of inserted tuples into a
shared memory variable, the leader combines this with its own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 349 +++++++++++++------
src/backend/commands/explain.c | 41 +++
src/backend/executor/execParallel.c | 70 +++-
src/backend/executor/nodeGather.c | 114 +++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/backend/optimizer/path/costsize.c | 13 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 28 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/parsenodes.h | 1 +
src/test/regress/expected/write_parallel.out | 190 ++++++++++
src/test/regress/sql/write_parallel.sql | 84 +++++
15 files changed, 804 insertions(+), 141 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..3045c0f046 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..c455567a17 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to calculate the tuple transfer cost from workers to gather
+ * node(in case parallelism kicks in for the SELECT part of the CTAS),
+ * to zero as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -350,6 +347,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that are resulted in its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +424,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +439,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
+
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
- if (lc)
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We don't need to skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, instead of blocking on
+ * a page that another worker is inserting into, a worker can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +663,83 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return false;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. As
+ * temporary tables are backend local, workers cannot know about them.
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables. It does not support foreign or partitioned table
+ * creation. Hence the check for a temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ /*
+ * We intend to call IsParallelInsertInCTASAllowed() before and after
+ * planning with queryDesc NULL and non-NULL/valid respectively. Hence
+ * return true for before planning i.e. queryDesc NULL case when we reach
+ * here. However, for after planning case i.e. queryDesc not NULL/valid,
+ * proceed with Gather node checks and return accordingly.
+ */
+ if (!queryDesc)
+ return true;
+ else
+ {
+ PlanState *ps = queryDesc->planstate;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ bool allow;
+
+ /*
+ * We allow parallel inserts by the workers only if the upper node is
+ * Gather and the Gather node has no projections to perform. In case the
+ * Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers cannot do those projections. And
+ * when the upper node is GatherMerge, the leader has to perform the
+ * final phase i.e. merge the results produced by the workers.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
+
+ return allow;
+ }
+}
+
+/*
+ * SetCTASParallelInsertState --- set the information required for the
+ * parallel inserts, which is needed during plan execution.
+ */
+void SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as the into clause (for
+ * each worker to build a separate dest receiver) and the object id (for
+ * each worker to open the created table).
+ */
+ ((DR_intorel *) queryDesc->dest)->is_parallel = true;
+ gstate->dest = queryDesc->dest;
+
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in explain plans. Note that we
+ * would have already accounted this for cost calculations in
+ * cost_gather().
+ */
+ queryDesc->plannedstmt->planTree->plan_rows = 0;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..57875e1505 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -371,6 +371,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
return;
}
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This is
+ * used to calculate the tuple transfer cost from workers to gather node(in
+ * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+ * each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* if an advisor plugin is present, let it manage things */
if (ExplainOneQuery_hook)
(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
@@ -556,6 +565,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that are resulted in its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1793,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..9cef6bdd61 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,73 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ int i;
+
+ /* Wait for the parallel workers to finish. */
+ WaitForParallelWorkersToFinish(node->pei->pcxt);
+
+ for (i = 0; i < node->nworkers_launched; i++)
+ {
+ InstrAccumParallelQuery(&node->pei->buffer_usage[i],
+ &node->pei->wal_usage[i]);
+ }
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +226,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +235,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +254,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +275,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +296,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..225f3cbf01 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -393,7 +393,18 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * We do not compute the parallel_tuple_cost for CTAS because the number of
+ * tuples that are transferred from workers to the gather node is zero as
+ * each worker, in parallel, inserts the tuples that are resulted from its
+ * chunk of plan execution. This change may make the parallel plan cheap
+ * among all other plans, and influence the planner to consider this
+ * parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab3aab58c5 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,9 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *into,
+ QueryDesc *queryDesc);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Parallel inserts in CTAS related info is specified below. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ec14fc2036..eb267b1a6c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ bool isForCTAS; /* is the SELECT query for CREATE TABLE AS */
} Query;
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..635d24e76d 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -75,5 +75,195 @@ explain (costs off) create table parallel_write as execute prep_stmt;
(7 rows)
create table parallel_write as execute prep_stmt;
+drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
drop table parallel_write;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..e65a56b442 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,88 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+select length(stringu1) from tenk1;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+select nextval('parallel_write_sequence'), four from tenk1;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+drop table parallel_write;
+
rollback;
--
2.25.1
On Mon, Dec 7, 2020 at 7:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 7, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
What is the need of checking query_level when 'isForCTAS' is set only
when Gather is a top-node?
isForCTAS is getting set before pg_plan_query() and is used in
cost_gather(). We will not have a Gather node by then, and hence will
not pass queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while
setting isForCTAS to true.
IsParallelInsertInCTASAllowed() seems to be returning false if
queryDesc is NULL, so won't isForCTAS always be set to false? I think
I am missing something here.
My bad. I utterly missed this, sorry for the confusion.
My intention to have IsParallelInsertInCTASAllowed() is for two
purposes. 1. when called before planning without queryDesc, it should
return true if IS_CTAS(into) is true and is not a temporary table. 2.
when called after planning with a non-null queryDesc, along with 1)
checks, it should also perform the Gather State checks and return
accordingly.
I have corrected it in the v9 patch. Please have a look.
The isForCTAS will be true because of [create table as]; the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel cost will still be ignored.
Does that work as expected?
I don't think that is expected, and it is not the case without this patch.
The cost shouldn't be changed for existing cases where the write is
not pushed to workers.
Thanks for pointing that out. Yes, it should not change for the cases
where parallel inserts will not be picked later.
Any better suggestions on how to make the planner consider that the
CTAS might choose parallel inserts later, while at the same time avoiding
the above issue in case it doesn't?
I'm not quite sure how to address this. Can we not allow the planner
to consider that the select is for CTAS, and check only after the
planning is done for the Gather node and the other conditions? This is
simple to do, but we might miss some parallel plans for the SELECTs,
because the planner would have already wrongly included the tuple
transfer cost from workers to Gather, making such a parallel plan
costlier than the non-parallel plans. IMO, we can do this since it also
keeps the existing behaviour of the planner, i.e. when the planner is
planning a SELECT it doesn't know that it is doing so for a CTAS. Thoughts?
I have done some initial review and I have a few comments.
@@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate,
CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Flag to let the planner know that the SELECT query is for CTAS. This
+ * is used to calculate the tuple transfer cost from workers to gather
+ * node(in case parallelism kicks in for the SELECT part of the CTAS),
+ * to zero as each worker will insert its share of tuples in parallel.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->isForCTAS = true;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -350,6 +347,15 @@ ExecCreateTableAs(ParseState *pstate,
CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that are resulted in its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
Once we have called IsParallelInsertInCTASAllowed() and set the
query->isForCTAS flag, why are we calling it again?
---
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;
From this check, it appears that a lower-level Gather will also get
influenced by this. Consider this plan shape:
-> NLJ
-> Gather
-> Parallel Seq Scan
-> Index Scan
This condition only checks that it is a top-level query and that it is
under a CTAS, so it will impact all the Gather nodes, as shown in the
above example. A sketch of a tightened check follows.
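To make that concrete, here is a minimal, hypothetical sketch (not taken
from any posted patch; the helper name and its use in cost_gather() are
assumptions for illustration only) of how the condition could be
tightened so that only a Gather covering the whole scan/join tree of a
top-level CTAS query skips the tuple transfer cost:

/*
 * Illustrative helper only: return true when the Gather being costed
 * sits on top of the final scan/join relation of a top-level CTAS
 * query. Gather nodes buried under joins (as in the NLJ shape above)
 * then keep the normal parallel_tuple_cost.
 */
static bool
ctas_gather_can_skip_tuple_cost(PlannerInfo *root, RelOptInfo *rel)
{
	return root->parse->isForCTAS &&
		   root->query_level == 1 &&
		   bms_equal(rel->relids, root->all_baserels);
}

In cost_gather(), the check quoted above could then become:

	if (!ctas_gather_can_skip_tuple_cost(root, rel))
		run_cost += parallel_tuple_cost * path->path.rows;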
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 7, 2020 at 7:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I'm not quite sure how to address this. Can we not allow the planner
to consider that the select is for CTAS and check only after the
planning is done for the Gather node and other checks?
IIUC, you are saying that we should not influence the cost of gather
node even when the insertion would be done by workers? I think that
should be our fallback option anyway but that might miss some paths to
be considered parallel where the cost becomes more due to
parallel_tuple_cost (aka tuple transfer cost). I think the idea is we
can avoid the tuple transfer cost only when Gather is the top node
because only at that time we can push insertion down, right? How about
if we have some way to detect the same before calling
generate_useful_gather_paths()? I think that, when we are calling
apply_scanjoin_target_to_paths() in grouping_planner(), if the
query_level is 1, it is for CTAS, and it doesn't have a chance to
create an UPPER_REL (no grouping, order, limit, etc. clauses),
then we can probably assume that the Gather will be the top node. I am
not sure about this, but I think it is worth exploring.
--
With Regards,
Amit Kapila.
I took a look at the parallel insert patch and have the same idea.
https://commitfest.postgresql.org/31/2844/
* Consider generating Gather or Gather Merge paths. We must only do this
* if the relation is parallel safe, and we don't do it for child rels to
* avoid creating multiple Gather nodes within the same plan. We must do
* this after all paths have been generated and before set_cheapest, since
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_useful_gather_paths(root, rel, false);
IMO the Gather path created here seems the right one that can possibly ignore the parallel tuple cost if we are in CTAS.
But we need to check the following parse options, which would create a path on top of the Gather path here (a rough combined sketch follows the list):
if (root->parse->rowMarks)
if (limit_needed(root->parse))
if (root->parse->sortClause)
if (root->parse->distinctClause)
if (root->parse->hasWindowFuncs)
if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)
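Putting Amit's idea and the checks above together, here is a rough,
purely illustrative sketch (the helper name and its placement are
assumptions, not something taken from the attached patches) of a test
that grouping_planner()/apply_scanjoin_target_to_paths() could run
before generate_useful_gather_paths() to guess whether the Gather will
remain the top plan node:

/*
 * Illustrative sketch only: true when no upper-level processing (row
 * locking, limit, sort, distinct, window functions, grouping or
 * aggregation) will be placed on top of the Gather generated for the
 * final scan/join rel of a top-level CTAS query.
 */
static bool
gather_will_be_top_node_for_ctas(PlannerInfo *root)
{
	Query	   *parse = root->parse;

	return parse->isForCTAS &&
		   root->query_level == 1 &&
		   parse->rowMarks == NIL &&
		   !limit_needed(parse) &&
		   parse->sortClause == NIL &&
		   parse->distinctClause == NIL &&
		   !parse->hasWindowFuncs &&
		   parse->groupClause == NIL &&
		   parse->groupingSets == NIL &&
		   !parse->hasAggs &&
		   !root->hasHavingQual;
}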
Best regards,
houzj
Thanks Amit and Hou. I will look into these areas and get back soon.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
It might be better to split the patch for this such that in the base
patch, we won't consider anything special for gather costing w.r.t
CTAS and in the next patch, we consider all the checks discussed
above.
--
With Regards,
Amit Kapila.
Yeah, and as I pointed out earlier, along with those checks we also need to
consider that the RelOptInfo must be the final target (top-level rel).
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attaching the v10 patch set, which includes the change suggested above
for ignoring the parallel tuple cost, and also a few more test cases. I
split the patch as per Amit's suggestion: v10-0001 contains the parallel
inserts code (without the planner tuple cost changes) and the test cases;
v10-0002 has the changes required for ignoring the planner tuple cost calculations.
Please review it further.
After the review and addressing all the comments, I plan to make some
code common so that it can be used for Parallel Inserts in REFRESH
MATERIALIZED VIEW. Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v10-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchapplication/x-patch; name=v10-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchDownload
From 2939b2c51bff3548ea15c9054f9e2ab661dddd9b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 10 Dec 2020 05:38:35 +0530
Subject: [PATCH v10] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts its
share of tuples if instructed to do, and so are workers. Each
worker writes atomically its number of inserted tuples into a
shared memory variable, the leader combines this with its own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 332 ++++++++----
src/backend/commands/explain.c | 32 ++
src/backend/executor/execParallel.c | 70 ++-
src/backend/executor/nodeGather.c | 113 ++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 28 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/test/regress/expected/write_parallel.out | 504 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 180 +++++++
13 files changed, 1174 insertions(+), 140 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for the common parallel insert cases, i.e.
+ * INSERT INTO, CTAS and COPY FROM; to be changed later. In a parallel
+ * worker, currentCommandIdUsed must already have been set to true at
+ * the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to allow this because
+ * GetCurrentCommandId(true) may be called from anywhere within the
+ * parallel worker, especially for parallel inserts.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..20f59cc2b8 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that are resulted in its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +415,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +430,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
- Assert(into != NULL); /* else somebody forgot to set it */
-
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
+
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
- if (lc)
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * In parallel mode, we don't skip contacting the FSM while inserting
+ * tuples: while extending the relation, instead of blocking on a page
+ * that another worker is inserting into, a worker can check the FSM
+ * for another page that can accommodate the tuples. This results in a
+ * major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers would not
+ * be allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +654,75 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return false;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. Since
+ * temporary tables are backend-local, workers cannot access them.
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables. It does not support foreign or partitioned table
+ * creation. Hence the check for a temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ if (queryDesc)
+ {
+ PlanState *ps = queryDesc->planstate;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ bool allow;
+
+ /*
+ * We allow parallel inserts by the workers only if the upper node is
+ * Gather and that Gather node has no projections to perform. If the
+ * Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers cannot do those projections. And
+ * when the upper node is GatherMerge, the leader has to perform the
+ * final phase, i.e. merge the results from the workers.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
+
+ return allow;
+ }
+
+ return true;
+}
+
+/*
+ * SetCTASParallelInsertState --- set the information for parallel
+ * inserts that is required during plan execution.
+ */
+void SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each parallel worker
+ * insert the tuples, we must send information such as the into clause
+ * (for each worker to build a separate dest receiver) and the object id
+ * (for each worker to open the created table).
+ */
+ ((DR_intorel *) queryDesc->dest)->is_parallel = true;
+ gstate->dest = queryDesc->dest;
+
+ /*
+ * Since no rows are transferred from the workers to the Gather node,
+ * set the plan rows to 0 so that the estimated row count shown in
+ * explain plans reflects that.
+ */
+ queryDesc->plannedstmt->planTree->plan_rows = 0;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..03ac29cd64 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that are resulted in its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1784,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..e7c588c66a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) would have started
+ * their work, i.e. insertion into the target table. If the leader is
+ * chosen to participate in the parallel inserts for CTAS, it finishes its
+ * share before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This total is reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab3aab58c5 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,9 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *into,
+ QueryDesc *queryDesc);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Parallel inserts in CTAS related info is specified below. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..8831597d54 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,508 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+ QUERY PLAN
+----------------------------------------------------------------
+ Limit (actual rows=4 loops=1)
+ -> Gather (actual rows=4 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+ QUERY PLAN
+----------------------------------------------------------------
+ HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(7 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Partial HashAggregate (actual rows=1 loops=4)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(13 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=1 loops=1)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp3 (actual rows=0 loops=4)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Partial GroupAggregate (actual rows=0 loops=4)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(21 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ WindowAgg (actual rows=5 loops=1)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(8 rows)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Nested Loop (actual rows=5 loops=1)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Materialize (actual rows=5 loops=5)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- even though the top node is Gather, there exists a parallel unsafe merge
+-- join node under it, so parallel inserts must not occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Merge Join (actual rows=1 loops=4)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+(16 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..6ee80ebd78 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,184 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- even though the top node is Gather, there exists a parallel unsafe merge
+-- join node under it, so parallel inserts must not occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
--
2.25.1
Attachment: v10-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From 3bf4f3aee0f6fd9d0f7b29602113e97f35ef6a6f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 9 Dec 2020 18:31:55 +0530
Subject: [PATCH v10] Tuple Cost Adjustment for Parallel Inserts in CTAS
---
src/backend/commands/createas.c | 9 ++++
src/backend/commands/explain.c | 10 +++++
src/backend/optimizer/path/costsize.c | 19 ++++++++-
src/backend/optimizer/plan/planner.c | 61 +++++++++++++++++++++++++++
src/include/commands/createas.h | 16 +++++++
src/include/nodes/parsenodes.h | 1 +
6 files changed, 115 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 20f59cc2b8..eee8d19259 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -316,10 +316,19 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Indication to the planner that the SELECT is from CTAS so that it
+ * can adjust the parallel tuple cost if possible.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
+ query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
+
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 03ac29cd64..8bd231a0c3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -387,9 +387,19 @@ ExplainOneQuery(Query *query, int cursorOptions,
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Indication to the planner that the SELECT is from CTAS so that it
+ * can adjust the parallel tuple cost if possible.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
+ if (into)
+ query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
+
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..3a316f25f1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,22 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider tuple cost in case of parallel inserts by workers. We
+ * would have set ignore flag in apply_scanjoin_target_to_paths before
+ * generating Gather path for the upper level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST))
+ {
+ ignore_tuple_cost = true;
+ /* Reset the ignore flag. */
+ root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_IGN_TUP_COST;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..1041593237 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,45 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * Gather node will not receive any tuples from the workers in case each worker
+ * inserts them in parallel. So, we set a flag to ignore parallel tuple cost by
+ * the Gather path in cost_gather if the SELECT is for CTAS and we are
+ * generating an upper level Gather path.
+*/
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of following cases, a parent path will be generated for the
+ * upper Gather path(in grouping_planner), in which case we can not
+ * let parallel inserts happen. So we do not set ignore tuple cost
+ * flag.
+ */
+ if (root->parse->rowMarks ||
+ limit_needed(root->parse) ||
+ root->parse->sortClause ||
+ root->parse->distinctClause ||
+ root->parse->hasWindowFuncs ||
+ root->parse->groupClause ||
+ root->parse->groupingSets ||
+ root->parse->hasAggs ||
+ root->hasHavingQual)
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_IGN_TUP_COST;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7597,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ab3aab58c5..6e722f0ac0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,22 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_IGN_TUP_COST = 1 << 1
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ec14fc2036..b140a42551 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
--
2.25.1
Hi,
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
...
+ attrList = lappend(attrList, col);
Should attrList be freed when ereport is called ?
+ query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning
the value of 0 ?
Cheers
On Wed, Dec 9, 2020 at 5:43 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
wrote:
I'm not quite sure how to address this. Can we not allow the planner
to consider that the select is for CTAS and check only after the
planning is done for the Gather node and other checks?

IIUC, you are saying that we should not influence the cost of gather
node even when the insertion would be done by workers? I think that
should be our fallback option anyway but that might miss some paths to
be considered parallel where the cost becomes more due to
parallel_tuple_cost (aka tuple transfer cost). I think the idea is we
can avoid the tuple transfer cost only when Gather is the top node
because only at that time we can push insertion down, right? How about
if we have some way to detect the same before calling
generate_useful_gather_paths()? I think when we are calling
apply_scanjoin_target_to_paths() in grouping_planner(), if the
query_level is 1, it is for CTAS, and it doesn't have a chance to
create UPPER_REL (doesn't have grouping, order, limit, etc clause) then
we can probably assume that the Gather will be top_node. I am not sure
about this but I think it is worth exploring.
I took a look at the parallel insert patch and have the same idea.
https://commitfest.postgresql.org/31/2844/

    /*
     * Consider generating Gather or Gather Merge paths. We must only do this
     * if the relation is parallel safe, and we don't do it for child rels to
     * avoid creating multiple Gather nodes within the same plan. We must do
     * this after all paths have been generated and before set_cheapest, since
     * one of the generated paths may turn out to be the cheapest one.
     */
    if (rel->consider_parallel && !IS_OTHER_REL(rel))
        generate_useful_gather_paths(root, rel, false);

IMO the Gather path created here seems the right one which can possibly
ignore the parallel cost if in CTAS.

But we need to check the following parse options which will create a path
to be the parent of the Gather path here:

    if (root->parse->rowMarks)
    if (limit_needed(root->parse))
    if (root->parse->sortClause)
    if (root->parse->distinctClause)
    if (root->parse->hasWindowFuncs)
    if (root->parse->groupClause || root->parse->groupingSets ||
        root->parse->hasAggs || root->hasHavingQual)
Yeah, and as I pointed earlier, along with this we also need to
consider that the RelOptInfo must be the final target (top level rel).

Attaching v10 patch set that includes the change suggested above for
ignoring parallel tuple cost and also a few more test cases. I split the
patch as per Amit's suggestion. v10-0001 contains the parallel inserts
code without planner tuple cost changes and test cases. v10-0002 has the
required changes for ignoring planner tuple cost calculations.

Please review it further.

After the review and addressing all the comments, I plan to make some
code common so that it can be used for Parallel Inserts in REFRESH
MATERIALIZED VIEW. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 7:48 AM Zhihong Yu <zyu@yugabyte.com> wrote:
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
...
+ attrList = lappend(attrList, col);

Should attrList be freed when ereport is called ?
I think that's not necessary since we are going to throw an error
anyway. Also, this is not new code added as part of this feature; it is
existing code adjusted for parallel inserts. Looking further into the
code base, there are many places where we don't free up the lists before
throwing errors, for example:
errmsg("column privileges are only valid for relations")));
errmsg("check constraint \"%s\" already exists",
errmsg("name or argument lists may not contain nulls")));
elog(ERROR, "no tlist entry for key %d", keyresno);
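ereport(ERROR) performs a non-local exit, and the statement's memory
context is reset during error cleanup, so the list storage is reclaimed
anyway. A minimal sketch of that pattern (hypothetical code, not from the
patch; the helper name, condition and message are illustrative only):

#include "postgres.h"

#include "nodes/pg_list.h"
#include "nodes/value.h"

/* Hypothetical helper: list memory belongs to the current memory context,
 * so the ereport(ERROR) path reclaims it when the context is reset. */
static void
collect_column_names(bool fail)
{
	List	   *names = NIL;

	names = lappend(names, makeString(pstrdup("c1")));

	if (fail)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
				 errmsg("illustrative error, list is reclaimed by context reset")));

	/* Normal path: the caller would consume 'names' here. */
	list_free(names);
}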
+ query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning the value of 0 ?
Yeah both are equivalent. For now I will keep it that way, I will
change it in the next version of the patch.
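Since the macro is zero, the bitwise AND simply clears every bit. A tiny
standalone illustration (hypothetical toy code, not PostgreSQL code):

#include <assert.h>
#include <stdint.h>

enum
{
	CTAS_PARALLEL_INS_UNDEF = 0,
	CTAS_PARALLEL_INS_SELECT = 1 << 0
};

int
main(void)
{
	uint8_t		info = CTAS_PARALLEL_INS_SELECT;

	/* ANDing with 0 clears all bits, so this is equivalent to info = 0. */
	info &= CTAS_PARALLEL_INS_UNDEF;
	assert(info == 0);
	return 0;
}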
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?
I did some tests but did not find a case like that.
Best regards,
houzj
On Thu, Dec 10, 2020 at 3:59 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?

I did some tests but did not find a case like that.
This seems like an extra check. Apart from that if we combine 0001
and 0002 there should be an additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 4:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and
IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but
IsA(plannedstmt->planTree, Gather) is false ?

I did some tests but did not find a case like that.
This seems like an extra check. Apart from that if we combine 0001
and 0002 there should be an additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.
Yeah it's an extra check. I don't think we need that extra check
IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
it as follows: the gatherstate will be allocated and initialized with the
plan tree in ExecInitGather which are the ones we are checking here. So,
there is no chance that the plan state is GatherState and the plan tree
will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
in the next version of the patch set.
Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>,
estate=0x1ca8, eflags=730035099) at nodeGather.c:61
(gdb) p gatherstate
$10 = (GatherState *) 0x5647fac83850
(gdb) p gatherstate->ps.plan
$11 = (Plan *) 0x5647fac918a0
Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580,
queryDesc=0x5647fac835e0) at createas.c:663
663 {
(gdb) p ps
$13 = (PlanState *) 0x5647fac83850
(gdb) p ps->plan
$14 = (Plan *) 0x5647fac918a0
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 5:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 10, 2020 at 4:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?

I did some tests but did not find a case like that.
This seems like an extra check. Apart from that if we combine 0001
and 0002 there should be an additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.

Yeah it's an extra check. I don't think we need that extra check
IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
it as follows: the gatherstate will be allocated and initialized with the
plan tree in ExecInitGather which are the ones we are checking here. So,
there is no chance that the plan state is GatherState and the plan tree
will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
in the next version of the patch set.
Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
(gdb) p gatherstate
$10 = (GatherState *) 0x5647fac83850
(gdb) p gatherstate->ps.plan
$11 = (Plan *) 0x5647fac918a0

Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
663 {
(gdb) p ps
$13 = (PlanState *) 0x5647fac83850
(gdb) p ps->plan
$14 = (Plan *) 0x5647fac918a0
Hope you did not miss the second part of my comment
"
Apart from that if we combine 0001
and 0002 there should be additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.
"
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?

I did some tests but did not find a case like that.
This seems like an extra check. Apart from that if we combine 0001
and 0002 there should be an additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.

Yeah it's an extra check. I don't think we need that extra check
IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
it as follows: the gatherstate will be allocated and initialized with the
plan tree in ExecInitGather which are the ones we are checking here. So,
there is no chance that the plan state is GatherState and the plan tree
will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
in the next version of the patch set.
Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
(gdb) p gatherstate
$10 = (GatherState *) 0x5647fac83850
(gdb) p gatherstate->ps.plan
$11 = (Plan *) 0x5647fac918a0

Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
663 {
(gdb) p ps
$13 = (PlanState *) 0x5647fac83850
(gdb) p ps->plan
$14 = (Plan *) 0x5647fac918a0

Hope you did not miss the second part of my comment
"Apart from that if we combine 0001
and 0002 there should be additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert."
IIUC, we need to set a flag in cost_gather(in 0002 patch) whenever we
ignore the parallel tuple cost and while checking to allow or disallow
parallel inserts in IsParallelInsertInCTASAllowed(), we need to add an
assert something like Assert(cost_ignored_in_cost_gather && allow)
before return allow;
This assertion fails 1) either if we have not ignored the cost but
allowing parallel inserts 2) or we ignored the cost but not allowing
parallel inserts.
1) seems to be fine, we can go ahead and perform parallel inserts. 2)
is the concern that the planner would have wrongly chosen the parallel
plan, but in this case also isn't it better to go ahead with the
parallel plan instead of failing the query?
+ /*
+ * We allow parallel inserts by the workers only if the Gather node has
+ * no projections to perform and if the upper node is Gather. In case,
+ * the Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers can not do those projections. And
+ * when the upper node is GatherMerge, then the leader has to perform
+ * the final phase i.e. merge the results by workers.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
+
+ return allow;
+ }
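To make the intent concrete, here is a rough standalone model of the
flag-and-assert coupling being proposed (hypothetical toy code, not the
PostgreSQL implementation; the flag names are only borrowed from the patch,
and the real wiring would live in cost_gather and
IsParallelInsertInCTASAllowed):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum
{
	CTAS_PARALLEL_INS_UNDEF = 0,
	CTAS_PARALLEL_INS_SELECT = 1 << 0,
	CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 1
};

/* Planner side: if the query was marked as a CTAS SELECT, skip the tuple
 * transfer cost and remember that we did so. */
static void
plan_gather(uint8_t *flags)
{
	if (*flags & CTAS_PARALLEL_INS_SELECT)
		*flags |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}

/* Executor side: "cost ignored" must imply "parallel inserts allowed". */
static bool
allow_parallel_inserts(uint8_t *flags, bool gather_on_top, bool has_projection)
{
	bool		allow = gather_on_top && !has_projection;

	assert(!(*flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED) || allow);
	*flags = CTAS_PARALLEL_INS_UNDEF;	/* reset for the next statement */
	return allow;
}

int
main(void)
{
	uint8_t		flags = CTAS_PARALLEL_INS_SELECT;

	plan_gather(&flags);
	return allow_parallel_inserts(&flags, true, false) ? 0 : 1;
}

The toy only shows the invariant the assertion enforces; whether hitting it
should abort the query or merely fall back is the open question above.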
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 7:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?

I did some tests but did not find a case like that.
This seems like an extra check. Apart from that if we combine 0001
and 0002 there should be an additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.

Yeah it's an extra check. I don't think we need that extra check
IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
it as follows: the gatherstate will be allocated and initialized with the
plan tree in ExecInitGather which are the ones we are checking here. So,
there is no chance that the plan state is GatherState and the plan tree
will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
in the next version of the patch set.
Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
(gdb) p gatherstate
$10 = (GatherState *) 0x5647fac83850
(gdb) p gatherstate->ps.plan
$11 = (Plan *) 0x5647fac918a0

Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
663 {
(gdb) p ps
$13 = (PlanState *) 0x5647fac83850
(gdb) p ps->plan
$14 = (Plan *) 0x5647fac918a0

Hope you did not miss the second part of my comment
"Apart from that if we combine 0001
and 0002 there should be additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert."
IIUC, we need to set a flag in cost_gather(in 0002 patch) whenever we
ignore the parallel tuple cost and while checking to allow or disallow
parallel inserts in IsParallelInsertInCTASAllowed(), we need to add an
assert something like Assert(cost_ignored_in_cost_gather && allow)
before return allow;

This assertion fails 1) either if we have not ignored the cost but
allowing parallel inserts 2) or we ignored the cost but not allowing
parallel inserts.

1) seems to be fine, we can go ahead and perform parallel inserts. 2)
is the concern that the planner would have wrongly chosen the parallel
plan, but in this case also isn't it better to go ahead with the
parallel plan instead of failing the query?

+ /*
+ * We allow parallel inserts by the workers only if the Gather node has
+ * no projections to perform and if the upper node is Gather. In case,
+ * the Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers can not do those projections. And
+ * when the upper node is GatherMerge, then the leader has to perform
+ * the final phase i.e. merge the results by workers.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
+
+ return allow;
+ }
I added the assertion into the 0002 patch so that it fails when the
planner ignores the parallel tuple cost and may choose a parallel plan but
later we don't allow parallel inserts. make check and make check-world
pass without any assertion failures.
Attaching v11 patch set. Please review it further.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
Attachment: v11-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 90ee038cec103a85307711b861b431317f1cd5bf Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 14 Dec 2020 15:16:49 +0530
Subject: [PATCH v11] Tuple Cost Adjustment for Parallel Inserts in CTAS
---
src/backend/commands/createas.c | 42 +++++++++++++++++-
src/backend/commands/explain.c | 14 ++++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 +++++++++-
src/backend/optimizer/plan/planner.c | 61 +++++++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
8 files changed, 158 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9e6c8fb2ba..3ffea41ea6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -316,6 +316,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Indication to the planner that the SELECT is from CTAS so that it
+ * can adjust the parallel tuple cost if possible.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL, NULL))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -344,7 +351,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ if (IsParallelInsertInCTASAllowed(into, queryDesc,
+ &query->CTASParallelInsInfo))
SetCTASParallelInsertState(queryDesc);
/* run the plan to completion */
@@ -659,7 +667,8 @@ intorel_destroy(DestReceiver *self)
* IsParallelInsertInCTASAllowed --- determine whether or not parallel
* insertion is possible.
*/
-bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags)
{
if (!IS_CTAS(into))
return false;
@@ -678,6 +687,7 @@ bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
{
PlanState *ps = queryDesc->planstate;
bool allow;
+ bool need_to_assert = false;
/*
* We allow parallel inserts by the workers only if the Gather node has
@@ -690,6 +700,34 @@ bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
*/
allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo;
+ /*
+ * It should not happen that in cost_gather we have ignored the
+ * parallel tuple cost and now we are not allowing the parallel
+ * inserts. And also we might need assertion only if the top node is
+ * GatherState. Because the main intention of assertion is to check if
+ * we enforced planner to ignore the parallel tuple cost (with the
+ * intention of choosing parallel inserts) due to which
+ * the parallel plan was chosen, but we do not allow the parallel
+ * inserts now.
+ */
+ if (!allow && tuple_cost_flags && ps && IsA(ps, GatherState))
+ need_to_assert = true;
+
+ if (need_to_assert)
+ {
+ /*
+ * If we have correctly ignored parallel tuple cost in planner
+ * while creating Gather path, then this assertion failure should
+ * not occur. If it occurs, that means the planner may have chosen
+ * this parallel plan because of our enforcement to ignore the
+ * parallel tuple cost.
+ */
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
+
return allow;
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 03ac29cd64..d0152deba7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -387,6 +387,13 @@ ExplainOneQuery(Query *query, int cursorOptions,
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Indication to the planner that the SELECT is from CTAS so that it
+ * can adjust the parallel tuple cost if possible.
+ */
+ if (IsParallelInsertInCTASAllowed(into, NULL, NULL))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -402,7 +409,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +504,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +570,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ if (IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags))
SetCTASParallelInsertState(queryDesc);
/* Execute the plan for statistics if asked for */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 4b18be5b27..12227b6e79 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -674,7 +674,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..800f25903d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider tuple cost in case of we intend to perform parallel
+ * inserts by workers. We would have set ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..d287b6bfbb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,45 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * Gather node will not receive any tuples from the workers in case each worker
+ * inserts them in parallel. So, we set a flag to ignore parallel tuple cost by
+ * the Gather path in cost_gather if the SELECT is for CTAS and we are
+ * generating an upper level Gather path.
+*/
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of following cases, a parent path will be generated for the
+ * upper Gather path(in grouping_planner), in which case we can not
+ * let parallel inserts happen. So we do not set ignore tuple cost
+ * flag.
+ */
+ if (root->parse->rowMarks ||
+ limit_needed(root->parse) ||
+ root->parse->sortClause ||
+ root->parse->distinctClause ||
+ root->parse->hasWindowFuncs ||
+ root->parse->groupClause ||
+ root->parse->groupingSets ||
+ root->parse->hasAggs ||
+ root->hasHavingQual)
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7597,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ab3aab58c5..e01a6152ce 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -53,7 +71,8 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern bool IsParallelInsertInCTASAllowed(IntoClause *into,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
--
2.25.1
v11-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
From 84fd0237cb9b3b2650d9a9d3c139d54dc86e085c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 14 Dec 2020 14:59:26 +0530
Subject: [PATCH v11] Parallel Inserts in CREATE TABLE AS
The idea of this patch is to allow the leader and each worker
insert the tuples in parallel if the SELECT part of the CTAS is
parallelizable.
The design:
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan. After the planning,
check if the upper plan node is Gather in createas.c and mark a
parallelism flag in the CTAS dest receiver. Pass the into clause,
object id, command id from the leader to workers, so that each
worker can create its own CTAS dest receiver. Leader inserts its
share of tuples if instructed to do, and so are workers. Each
worker writes atomically its number of inserted tuples into a
shared memory variable, the leader combines this with its own
number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 +-
src/backend/commands/createas.c | 326 ++++++++----
src/backend/commands/explain.c | 32 ++
src/backend/executor/execParallel.c | 70 ++-
src/backend/executor/nodeGather.c | 113 ++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 28 +
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/test/regress/expected/write_parallel.out | 505 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 180 +++++++
13 files changed, 1169 insertions(+), 140 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..9e6c8fb2ba 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that are resulted in its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +415,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +430,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
+
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
- if (lc)
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We don't need to skip contacting FSM while inserting tuples
+ * for parallel mode, while extending the relations, workers
+ * instead of blocking on a page while another worker is inserting,
+ * can check the FSM for another page that can accommodate the
+ * tuples. This results in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +654,69 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return false;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. As the
+ * temporary tables are backend local, workers can not know about them.
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ if (queryDesc)
+ {
+ PlanState *ps = queryDesc->planstate;
+ bool allow;
+
+ /*
+ * We allow parallel inserts by the workers only if the Gather node has
+ * no projections to perform and if the upper node is Gather. In case,
+ * the Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers can not do those projections.
+ * If the upper node is GatherMerge, then the leader has to perform the
+ * final phase i.e. merge the results by workers, so we do not allow
+ * parallel inserts.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo;
+
+ return allow;
+ }
+
+ return true;
+}
+
+/*
+ * SetCTASParallelInsertState --- set the required info for the parallel
+ * inserts, that is required in the plan execution.
+ */
+void SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as intoclause(for
+ * each worker to build separate dest receiver), object id(for each worker
+ * to open the created table).
+ */
+ ((DR_intorel *) queryDesc->dest)->is_parallel = true;
+ gstate->dest = queryDesc->dest;
+
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in estimated row count of explain
+ * plans.
+ */
+ queryDesc->plannedstmt->planTree->plan_rows = 0;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..03ac29cd64 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that are resulted in its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertInCTASAllowed(into, queryDesc))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1784,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..e7c588c66a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab3aab58c5 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,9 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertInCTASAllowed(IntoClause *into,
+ QueryDesc *queryDesc);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Parallel inserts in CTAS related info is specified below. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..84e8f981e1 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,509 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+ QUERY PLAN
+----------------------------------------------------------------
+ Limit (actual rows=4 loops=1)
+ -> Gather (actual rows=4 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+ QUERY PLAN
+----------------------------------------------------------------
+ HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(7 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Partial HashAggregate (actual rows=1 loops=4)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(13 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=1 loops=1)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp3 (actual rows=0 loops=4)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Partial GroupAggregate (actual rows=0 loops=4)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(21 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ WindowAgg (actual rows=5 loops=1)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(8 rows)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Nested Loop (actual rows=5 loops=1)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Materialize (actual rows=5 loops=5)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Merge Join (actual rows=1 loops=4)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..bff3fcc6b5 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,184 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
--
2.25.1
Hi
Attaching v11 patch set. Please review it further.
Currently with the patch, we can allow parallel CTAS when the top node is Gather.
When the top node is Append and Gather is a sub-node of Append, I think we can still enable
parallel CTAS by pushing the parallel CTAS down to the sub-node Gather, such as:
Append
------>Gather
--------->Create table
------------->Seqscan
------>Gather
--------->create table
------------->Seqscan
And the use case seems common to me, such as:
select * from A where xxx union all select * from B where xxx;
I attach a WIP patch which just shows the possibility of this feature.
The patch is based on the latest v11-patch.
What do you think?
Best regards,
houzj
Attachments:
0001-patch-for-pctas-in-append.patch
From 23a6b88d2d42913d559ca61b0050588cdd0db0d4 Mon Sep 17 00:00:00 2001
From: root <root@localhost.localdomain>
Date: Mon, 14 Dec 2020 06:31:12 -0500
Subject: [PATCH] patch for pinsert in append
---
src/backend/commands/createas.c | 35 +++++++++++++++++++++++++++++++----
src/backend/commands/explain.c | 3 +--
src/backend/optimizer/path/allpaths.c | 34 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 2 +-
src/include/commands/createas.h | 3 ++-
5 files changed, 69 insertions(+), 8 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3ffea41..6e6c467 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -351,9 +351,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc,
- &query->CTASParallelInsInfo))
- SetCTASParallelInsertState(queryDesc);
+ IsParallelInsertInCTASAllowed(into, queryDesc, &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -663,6 +661,35 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if(ps == NULL)
+ return parallel;
+
+ if(IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+ for(int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+ }
+ }
+ else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
+
/*
* IsParallelInsertInCTASAllowed --- determine whether or not parallel
* insertion is possible.
@@ -698,7 +725,7 @@ bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc,
* final phase i.e. merge the results by workers, so we do not allow
* parallel inserts.
*/
- allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo;
+ allow = PushDownCTASParallelInsertState(queryDesc->dest, ps);
/*
* It should not happen that in cost_gather we have ignored the
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d0152de..136f6f4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -570,8 +570,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags))
- SetCTASParallelInsertState(queryDesc);
+ IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b0..fe3332e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,11 +1104,44 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ if(childrel->rtekind == RTE_SUBQUERY)
+ {
+ if(root->query_level != 1)
+ {
+ if (root->parent_root &&
+ (root->parent_root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(root->parse->rowMarks ||
+ limit_needed(root->parse) ||
+ root->parse->sortClause ||
+ root->parse->distinctClause ||
+ root->parse->hasWindowFuncs ||
+ root->parse->groupClause ||
+ root->parse->groupingSets ||
+ root->parse->hasAggs ||
+ root->hasHavingQual))
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ else
+ {
+ if (!(root->parse->rowMarks ||
+ limit_needed(root->parse) ||
+ root->parse->sortClause ||
+ root->parse->distinctClause ||
+ root->parse->hasWindowFuncs ||
+ root->parse->groupClause ||
+ root->parse->groupingSets ||
+ root->parse->hasAggs ||
+ root->hasHavingQual))
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
/*
* Compute the child's size.
*/
set_rel_size(root, childrel, childRTindex, childRTE);
+ if(root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
/*
* It is possible that constraint exclusion detected a contradiction
* within a child subquery, even though we didn't prove one above. If
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d287b6b..da5ce1f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,7 +7350,7 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
+ if ((root->query_level == 1 || (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)) &&
(root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
{
/*
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index e01a615..7a50b18 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,8 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
--
1.8.3.1
On Mon, Dec 14, 2020 at 4:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 10, 2020 at 7:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false? I did some tests but did not find such a case.
This seems like an extra check. Apart from that, if we combine 0001
and 0002 there should be additional protection so that it does not
happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.
Yeah, it's an extra check. I don't think we need the extra IsA(plannedstmt->planTree, Gather) check; the GatherState check is enough. I verified it as follows: the gatherstate is allocated and initialized with the plan tree in ExecInitGather, and those are the ones we are checking here. So there is no chance that the plan state is GatherState while the plan tree is not Gather. I will remove the IsA(plannedstmt->planTree, Gather) check in the next version of the patch set.
Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
(gdb) p gatherstate
$10 = (GatherState *) 0x5647fac83850
(gdb) p gatherstate->ps.plan
$11 = (Plan *) 0x5647fac918a0
Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
663 {
(gdb) p ps
$13 = (PlanState *) 0x5647fac83850
(gdb) p ps->plan
$14 = (Plan *) 0x5647fac918a0
Hope you did not miss the second part of my comment:
"Apart from that if we combine 0001
and 0002 there should be additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert."
IIUC, we need to set a flag in cost_gather(in 0002 patch) whenever we
ignore the parallel tuple cost and while checking to allow or disallow
parallel inserts in IsParallelInsertInCTASAllowed(), we need to add an
assert something like Assert(cost_ignored_in_cost_gather && allow)
before return allow;
This assertion fails 1) either if we have not ignored the cost but are
allowing parallel inserts, 2) or if we ignored the cost but are not allowing
parallel inserts.
1) seems to be fine, we can go ahead and perform parallel inserts. 2)
is the concern that the planner would have wrongly chosen the parallel
plan, but in this case also isn't it better to go ahead with the
parallel plan instead of failing the query?
+ /*
+ * We allow parallel inserts by the workers only if the Gather node has
+ * no projections to perform and if the upper node is Gather. In case,
+ * the Gather node has projections, which is possible if there are any
+ * subplans in the query, the workers can not do those projections. And
+ * when the upper node is GatherMerge, then the leader has to perform
+ * the final phase i.e. merge the results by workers.
+ */
+ allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe;
+
+ return allow;
+ }
I added the assertion into the 0002 patch so that it fails when the
planner ignores parallel tuple cost and may choose a parallel plan but
later we don't allow parallel inserts. make check and make check-world
pass without any assertion failures.
Attaching v11 patch set. Please review it further.
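(For reference, the flag-plus-assert interplay described above could look roughly like the sketch below, placed at the end of IsParallelInsertInCTASAllowed(). The flag name CTAS_PARALLEL_INS_TUP_COST_IGNORED is from the patch's createas.h, but the parameter name tuple_cost_flags and the exact placement are only illustrative, not the patch's actual code.)

	/*
	 * Illustrative sketch only: if cost_gather ignored the parallel tuple
	 * cost for this query, we should never end up refusing parallel
	 * inserts here.
	 */
	if (*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED)
		Assert(allow);

	return allow;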
I can see a lot of unrelated changes in 0002; it looks like you have done a lot
of code refactoring, especially in createas.c. If that refactoring is intended,
please move it to a separate patch so that the patch is readable.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
For set_append_rel_size(), it seems this is the difference
between query_level != 1 and query_level == 1:
+ (root->parent_root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
Maybe extract the common conditions into their own expression / variable so
that the code is easier to read, e.g. along the lines of the sketch below.
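For illustration only, the extraction could look something like this inside set_append_rel_size(); the local variable names here are invented, not taken from the patch:

	/* Illustrative sketch; variable names are made up. */
	bool		no_parallel_blocking_clause =
		!(root->parse->rowMarks ||
		  limit_needed(root->parse) ||
		  root->parse->sortClause ||
		  root->parse->distinctClause ||
		  root->parse->hasWindowFuncs ||
		  root->parse->groupClause ||
		  root->parse->groupingSets ||
		  root->parse->hasAggs ||
		  root->hasHavingQual);
	bool		parent_ignores_tuple_cost =
		root->query_level == 1 ||
		(root->parent_root &&
		 (root->parent_root->parse->CTASParallelInsInfo &
		  CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND));

	if (childrel->rtekind == RTE_SUBQUERY &&
		parent_ignores_tuple_cost &&
		no_parallel_blocking_clause)
		root->parse->CTASParallelInsInfo |=
			CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;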
Cheers
On Mon, Dec 14, 2020 at 4:50 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
wrote:
Hi
Attaching v11 patch set. Please review it further.
Currently with the patch, we can allow parallel CTAS when topnode is
Gather.
When top-node is Append and Gather is the sub-node of Append, I think we
can still enable
Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such
as:
Append
------>Gather
--------->Create table
------------->Seqscan
------>Gather
--------->create table
------------->Seqscan
And the use case seems common to me, such as:
select * from A where xxx union all select * from B where xxx;
I attach a WIP patch which just shows the possibility of this feature.
The patch is based on the latest v11-patch.
What do you think?
Best regards,
houzj
On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Currently with the patch, we can allow parallel CTAS when topnode is Gather.
When top-node is Append and Gather is the sub-node of Append, I think we can still enable
Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
Append
------>Gather
--------->Create table
------------->Seqscan
------>Gather
--------->create table
------------->Seqscan
And the use case seems common to me, such as:
select * from A where xxx union all select * from B where xxx;
Thanks for the append use case.
Here's my analysis on pushing parallel inserts down even in case the
top node is Append.
For union cases which need to remove duplicate tuples, we can't push
the inserts or the CTAS dest receiver down. If I'm not wrong, the Append node
does not do the duplicate removal(??); I saw that it's the HashAggregate
node (the top node) that removes the duplicate tuples. And
also for except/except all/intersect/intersect all cases we receive
HashSetOp nodes on top of Append. So for both cases, our check for
Gather or Append at the top node is enough to detect this and not allow
parallel inserts.
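(For reference, that kind of top-node check could look roughly like the sketch below; ps, allow and queryDesc are from IsParallelInsertInCTASAllowed() in the patch, but this snippet itself is only illustrative, not the patch's exact code.)

	/*
	 * Illustrative sketch: only try to push the CTAS dest receiver down when
	 * the plan's top node is Gather or Append. For plain UNION / INTERSECT /
	 * EXCEPT, a HashAggregate or HashSetOp node sits on top, so this test
	 * fails and parallel inserts are not attempted.
	 */
	if (ps && (IsA(ps, GatherState) || IsA(ps, AppendState)))
		allow = PushDownCTASParallelInsertState(queryDesc->dest, ps);
	else
		allow = false;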
For union all:
case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
case 2: We can still push the CTAS dest receiver to each Gather node.
Non-Gather nodes will do inserts as they do now i.e. by sending tuples
to Append and from there to CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
Please let me know if I'm missing any other possible use case.
Thoughts?
I attach a WIP patch which just shows the possibility of this feature.
The patch is based on the latest v11-patch.
What do you think?
As suggested by Amit earlier, I kept the 0001 patch (so far) such that
it doesn't have the code to influence the planner to consider parallel
tuple cost as 0. It works on whatever plan gets generated and
decides to allow parallel inserts or not. And in the 0002 patch, I
added the code for influencing the planner to consider parallel tuple
cost as 0. Maybe we can have a 0003 patch for tests alone.
Once we are okay with the above analysis and use cases, we can
incorporate the Append changes to respective patches.
Hope that's okay.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Currently with the patch, we can allow parallel CTAS when topnode is Gather.
When top-node is Append and Gather is the sub-node of Append, I think we can still enable
Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
Append
------>Gather
--------->Create table
------------->Seqscan
------>Gather
--------->create table
------------->Seqscan
And the use case seems common to me, such as:
select * from A where xxx union all select * from B where xxx;
Thanks for the append use case.
Here's my analysis on pushing parallel inserts down even in case the
top node is Append.
For union cases which need to remove duplicate tuples, we can't push
the inserts or CTAS dest receiver down. If I'm not wrong, Append node
is not doing duplicate removal(??), I saw that it's the HashAggregate
node (which is the top node that removes the duplicate tuples). And
also for except/except all/intersect/intersect all cases we receive
HashSetOp nodes on top of Append. So for both cases, our check for
Gather or Append at the top node is enough to detect this to not allow
parallel inserts.
For union all:
case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
case 2: We can still push the CTAS dest receiver to each Gather node.
Non-Gather nodes will do inserts as they do now i.e. by sending tuples
to Append and from there to CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
Please let me know if I'm missing any other possible use case.
Thoughts?
Your analysis looks right to me.
I attach a WIP patch which just shows the possibility of this feature.
The patch is based on the latest v11-patch.
What do you think?
As suggested by Amit earlier, I kept the 0001 patch(so far) such that
it doesn't have the code to influence the planner to consider parallel
tuple cost as 0. It works on the plan whatever gets generated and
decides to allow parallel inserts or not. And in the 0002 patch, I
added the code for influencing the planner to consider parallel tuple
cost as 0. Maybe we can have a 0003 patch for tests alone.
Yeah, that makes sense and it will be easy for the review.
Once we are okay with the above analysis and use cases, we can
incorporate the Append changes to respective patches.
Hope that's okay.
Make sense to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Thanks for the append use case.
Here's my analysis on pushing parallel inserts down even in case the top
node is Append.
For union cases which need to remove duplicate tuples, we can't push the
inserts or CTAS dest receiver down. If I'm not wrong, Append node is not
doing duplicate removal(??), I saw that it's the HashAggregate node (which
is the top node that removes the duplicate tuples). And also for
except/except all/intersect/intersect all cases we receive HashSetOp nodes
on top of Append. So for both cases, our check for Gather or Append at the
top node is enough to detect this to not allow parallel inserts.
For union all:
case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
case 2: We can still push the CTAS dest receiver to each Gather node.
Non-Gather nodes will do inserts as they do now i.e. by sending tuples to
Append and from there to CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
Please let me know if I'm missing any other possible use case.
Thoughts?
Yes, the analysis looks right to me.
As suggested by Amit earlier, I kept the 0001 patch(so far) such that it
doesn't have the code to influence the planner to consider parallel tuple
cost as 0. It works on the plan whatever gets generated and decides to allow
parallel inserts or not. And in the 0002 patch, I added the code for
influencing the planner to consider parallel tuple cost as 0. Maybe we can
have a 0003 patch for tests alone.
Once we are okay with the above analysis and use cases, we can incorporate
the Append changes to respective patches.
Hope that's okay.
A little explanation about how the ctas info is pushed down in the append case.
1. about how to ignore the tuple cost in this case.
IMO, it creates the gather path under the append like the following:
query_planner
-make_one_rel
--set_base_rel_sizes
---set_rel_size
----set_append_rel_size (*)
-----set_rel_size
------set_subquery_pathlist
-------subquery_planner
--------grouping_planner
---------apply_scanjoin_target_to_paths
----------generate_useful_gather_paths
set_append_rel_size seems the right place where we can check and set a flag to ignore the tuple cost later.
We can set the flag in two cases, when no parent path will be created (such as limit, sort, distinct ...):
i) query_level is 1
ii) query_level > 1 and we have set the flag in the parent_root.
Case ii) is to handle append under append:
Append
->Append
->Gather
->Other plan
2. about how to push the ctas info down.
We traverse the whole plan tree, and we only care about the Append and Gather node types.
Gather: it sets the ctas dest info and returns true at once if the Gather node does not have a projection.
Append: it recursively traverses the subplans of the Append node and returns true if one of the subplans can be parallel.
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if(ps == NULL)
+ return parallel;
+
+ if(IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+ for(int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+ }
+ }
+ else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
Best regards,
houzj
Attachments:
0001-support-pctas-in-append-parallel-inserts.patch
From 0b411712b8b931d32e3d836ccda173e6077692c5 Mon Sep 17 00:00:00 2001
From: root <root@localhost.localdomain>
Date: Tue, 15 Dec 2020 05:58:43 -0500
Subject: [PATCH 1/2] support pctas in append parallel inserts
---
src/backend/commands/createas.c | 35 +++++++++++++++++++++++++++++++----
src/backend/commands/explain.c | 3 +--
2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3ffea41..6e6c467 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -351,9 +351,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc,
- &query->CTASParallelInsInfo))
- SetCTASParallelInsertState(queryDesc);
+ IsParallelInsertInCTASAllowed(into, queryDesc, &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -663,6 +661,35 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if(ps == NULL)
+ return parallel;
+
+ if(IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+ for(int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+ }
+ }
+ else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
+
/*
* IsParallelInsertInCTASAllowed --- determine whether or not parallel
* insertion is possible.
@@ -698,7 +725,7 @@ bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc,
* final phase i.e. merge the results by workers, so we do not allow
* parallel inserts.
*/
- allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo;
+ allow = PushDownCTASParallelInsertState(queryDesc->dest, ps);
/*
* It should not happen that in cost_gather we have ignored the
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d0152de..136f6f4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -570,8 +570,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags))
- SetCTASParallelInsertState(queryDesc);
+ IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
--
1.8.3.1
0002-support-pctas-in-append-tuple-cost-adjustment.patch
From 6daa1cf2c4f724603f64c0c89e915769c9334857 Mon Sep 17 00:00:00 2001
From: root <root@localhost.localdomain>
Date: Tue, 15 Dec 2020 06:01:10 -0500
Subject: [PATCH 2/2] support pctas in append tuple cost adjustment
---
src/backend/optimizer/path/allpaths.c | 29 +++++++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 5 +++--
src/include/commands/createas.h | 3 ++-
3 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b0..82f85b8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1104,10 +1105,38 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
set_rel_consider_parallel(root, childrel, childRTE);
/*
+ * When subplan is subquery, It's possible to do parallel insert
+ * if top-node of subquery is Gather, so we set the flag to
+ * ignore parallel tuple cost by the Gather path in cost_gather
+ * if the SELECT is for CTAS.
+ */
+ if(childrel->rtekind == RTE_SUBQUERY)
+ {
+ if((root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ &&
+ !(root->parse->rowMarks ||
+ limit_needed(root->parse) ||
+ root->parse->sortClause ||
+ root->parse->distinctClause ||
+ root->parse->hasWindowFuncs ||
+ root->parse->groupClause ||
+ root->parse->groupingSets ||
+ root->parse->hasAggs ||
+ root->hasHavingQual))
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
+ /*
* Compute the child's size.
*/
set_rel_size(root, childrel, childRTindex, childRTE);
+ if(root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+
/*
* It is possible that constraint exclusion detected a contradiction
* within a child subquery, even though we didn't prove one above. If
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d287b6b..7842d71 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,9 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if ((root->query_level == 1 ||
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)) &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
{
/*
* In each of following cases, a parent path will be generated for the
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index e01a615..7a50b18 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,8 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
--
1.8.3.1
From: Hou, Zhijie [mailto:houzj.fnst@cn.fujitsu.com]
Sent: Tuesday, December 15, 2020 7:30 PM
To: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>; Luc Vlaming <luc@swarm64.com>;
PostgreSQL-development <pgsql-hackers@postgresql.org>; Zhihong Yu
<zyu@yugabyte.com>; Dilip Kumar <dilipbalaut@gmail.com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Thanks for the append use case.
Here's my analysis on pushing parallel inserts down even in case the
top node is Append.For union cases which need to remove duplicate tuples, we can't push
the inserts or CTAS dest receiver down. If I'm not wrong, Append node
is not doing duplicate removal(??), I saw that it's the HashAggregate
node (which is the top node that removes the duplicate tuples). And
also for except/except all/intersect/intersect all cases we receive
HashSetOp nodes on top of Append. So for both cases, our check for
Gather or Append at the top node is enough to detect this to not allow parallel inserts.
For union all:
case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
case 2: We can still push the CTAS dest receiver to each Gather node.
Non-Gather nodes will do inserts as they do now i.e. by sending tuples
to Append and from there to CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
Please let me know if I'm missing any other possible use case.
Thoughts?
Yes, the analysis looks right to me.
As suggested by Amit earlier, I kept the 0001 patch(so far) such that
it doesn't have the code to influence the planner to consider parallel
tuple cost as 0. It works on the plan whatever gets generated and
decides to allow parallel inserts or not. And in the 0002 patch, I
added the code for influencing the planner to consider parallel tuple
cost as 0. Maybe we can have a 0003 patch for tests alone.
Once we are okay with the above analysis and use cases, we can
incorporate the Append changes to respective patches.
Hope that's okay.
A little explanation about how to push down the ctas info in append.
1. about how to ignore tuple cost in this case.
IMO, it create gather path under append like the following:
query_planner
-make_one_rel
--set_base_rel_sizes
---set_rel_size
----set_append_rel_size (*)
-----set_rel_size
------set_subquery_pathlist
-------subquery_planner
--------grouping_planner
---------apply_scanjoin_target_to_paths
----------generate_useful_gather_paths
set_append_rel_size seems the right place where we can check and set a flag
to ignore tuple cost later.
We can set the flag for two cases when there is no parent path will be
created(such as : limit,sort,distinct...):
i) query_level is 1
ii) query_level > 1 and we have set the flag in the parent_root.
The case ii) is to check append under append:
Append
->Append
->Gather
->Other plan
2. about how to push ctas info down.
We traverse the whole plan tree, and we only care about the Append and Gather node types.
Gather: it sets the ctas dest info and returns true at once if the Gather node
does not have a projection.
Append: it recursively traverses the subplans of the Append node and
returns true if one of the subplans can be parallel.
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if(ps == NULL)
+ return parallel;
+
+ if(IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+ for(int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+ }
+ }
+ else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
So sorry, my last patch had some mistakes.
Attaching the new one.
Best regards,
houzj
Attachments:
0001-support-pctas-in-append-parallel-inserts.patch
From e5cb1c34706ef21c8942087511726d9c06e1fd65 Mon Sep 17 00:00:00 2001
From: root <root@localhost.localdomain>
Date: Tue, 15 Dec 2020 07:14:21 -0500
Subject: [PATCH 1/2] support pctas in append parallel inserts
---
src/backend/commands/createas.c | 35 +++++++++++++++++++++++++++++++----
src/backend/commands/explain.c | 3 +--
2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3ffea41..6e6c467 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -351,9 +351,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc,
- &query->CTASParallelInsInfo))
- SetCTASParallelInsertState(queryDesc);
+ IsParallelInsertInCTASAllowed(into, queryDesc, &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -663,6 +661,35 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if(ps == NULL)
+ return parallel;
+
+ if(IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+ for(int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+ }
+ }
+ else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
+
/*
* IsParallelInsertInCTASAllowed --- determine whether or not parallel
* insertion is possible.
@@ -698,7 +725,7 @@ bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc,
* final phase i.e. merge the results by workers, so we do not allow
* parallel inserts.
*/
- allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo;
+ allow = PushDownCTASParallelInsertState(queryDesc->dest, ps);
/*
* It should not happen that in cost_gather we have ignored the
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d0152de..136f6f4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -570,8 +570,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- if (IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags))
- SetCTASParallelInsertState(queryDesc);
+ IsParallelInsertInCTASAllowed(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
--
1.8.3.1
0002-support-pctas-in-append-tuple-cost-adjustment.patch
From 30d1e78ce73e07c463b5c946b571ee964caeb3eb Mon Sep 17 00:00:00 2001
From: root <root@localhost.localdomain>
Date: Tue, 15 Dec 2020 07:14:58 -0500
Subject: [PATCH 2/2] support pctas in append tuple cost adjustment
---
src/backend/optimizer/path/allpaths.c | 29 +++++++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 9 +++++++--
src/include/commands/createas.h | 3 ++-
3 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b0..82f85b8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1104,10 +1105,38 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
set_rel_consider_parallel(root, childrel, childRTE);
/*
+ * When subplan is subquery, It's possible to do parallel insert
+ * if top-node of subquery is Gather, so we set the flag to
+ * ignore parallel tuple cost by the Gather path in cost_gather
+ * if the SELECT is for CTAS.
+ */
+ if(childrel->rtekind == RTE_SUBQUERY)
+ {
+ if((root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ &&
+ !(root->parse->rowMarks ||
+ limit_needed(root->parse) ||
+ root->parse->sortClause ||
+ root->parse->distinctClause ||
+ root->parse->hasWindowFuncs ||
+ root->parse->groupClause ||
+ root->parse->groupingSets ||
+ root->parse->hasAggs ||
+ root->hasHavingQual))
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
+ /*
* Compute the child's size.
*/
set_rel_size(root, childrel, childRTindex, childRTE);
+ if(root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+
/*
* It is possible that constraint exclusion detected a contradiction
* within a child subquery, even though we didn't prove one above. If
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d287b6b..86d42e1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,13 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if(root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of following cases, a parent path will be generated for the
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index e01a615..7a50b18 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,8 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
--
1.8.3.1
On Tue, Dec 15, 2020 at 5:48 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
A little explanation about how to push down the ctas info in append.
1. about how to ignore tuple cost in this case.
IMO, it create gather path under append like the following:
query_planner
-make_one_rel
--set_base_rel_sizes
---set_rel_size
----set_append_rel_size (*)
-----set_rel_size
------set_subquery_pathlist
-------subquery_planner
--------grouping_planner
---------apply_scanjoin_target_to_paths
----------generate_useful_gather_paths
set_append_rel_size seems the right place where we can check and set a flag
to ignore tuple cost later.
We can set the flag for two cases when there is no parent path will be
created(such as : limit,sort,distinct...):
i) query_level is 1
ii) query_level > 1 and we have set the flag in the parent_root.
The case ii) is to check append under append:
Append
->Append
->Gather
->Other plan
2. about how to push ctas info down.
We traverse the whole plan tree, and we only care about the Append and Gather node types.
Gather: it sets the ctas dest info and returns true at once if the Gather node
does not have a projection.
Append: it recursively traverses the subplans of the Append node and
returns true if one of the subplans can be parallel.
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if(ps == NULL)
+ return parallel;
+
+ if(IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+ for(int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+ }
+ }
+ else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
So sorry, my last patch had some mistakes.
Attaching the new one.
Thanks for the append patches. Basically your changes look good to me.
I'm merging them to the original patch set and adding the test cases
to cover these cases. I will post the updated patch set soon.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 15, 2020 at 5:53 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I'm merging them to the original patch set and adding the test cases
to cover these cases. I will post the updated patch set soon.
Attaching v12 patch set.
0001 - parallel inserts without tuple cost enforcement.
0002 - enforce planner for parallel tuple cost
0003 - test cases
Please review it further.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v12-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
From c55b3e3ca342253a4d6b881a16c410bbb1a41678 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 16 Dec 2020 11:41:37 +0530
Subject: [PATCH v12 1/3] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather or Append in createas.c and mark a parallelism
flag in the CTAS dest receiver and push it down to Gather node. Each
worker can create its own CTAS dest receiver with the information
passed from the leader. Leader inserts its share of tuples if
instructed to do so, and so do the workers. Each worker atomically writes
its number of inserted tuples into a shared memory variable, the
leader combines this with its own number of inserted tuples and
shares to the client.
Authors: Bharath Rupireddy, Hou, Zhijie
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 ++-
src/backend/commands/createas.c | 345 +++++++++++++++++--------
src/backend/commands/explain.c | 31 +++
src/backend/executor/execParallel.c | 70 ++++-
src/backend/executor/nodeGather.c | 113 +++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 25 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 499 insertions(+), 140 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, allow setting currentCommandIdUsed to true only if it was
+ * already set to true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within a parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..be381f9748 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +429,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
-
- Assert(into != NULL); /* else somebody forgot to set it */
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
+
+ Assert(into != NULL); /* else somebody forgot to set it */
- if (lc)
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * In parallel mode we do not skip contacting the FSM while inserting
+ * tuples. While extending the relation, instead of blocking on a page
+ * that another worker is inserting into, a worker can check the FSM
+ * for another page that can accommodate the tuples. This is a major
+ * benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +653,89 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver to the Gather node when it is either at the top of
+ * the plan or directly under the top Append node, provided it does not have
+ * any projections to perform. The required information from the pushed dest
+ * receiver is sent to the workers so that they can perform parallel
+ * insertions into the target table.
+ *
+ * If the top node is an Append, then this function recursively checks the
+ * subplans for Gather nodes; when one is found (and it has no projections),
+ * the dest receiver information is set on it.
+ *
+ * In any case, this function returns true if at least one Gather node can allow
+ * parallel insertions by the workers. Otherwise returns false.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i]);
+ }
+ }
+ else if (IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as into clause (for
+ * each worker to build separate dest receiver), object id (for each
+ * worker to open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in estimated row count of
+ * explain plans.
+ */
+ ps->plan->plan_rows = 0;
+ }
+
+ return parallel;
+}
+
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. Since
+ * temporary tables are backend-local, workers cannot access them.
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables; it does not support foreign or partitioned table
+ * creation. Hence the check for a temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return;
+
+ if (queryDesc)
+ (void) PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate);
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..fbd0bc5a81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that result from its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1783,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..e7c588c66a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) will have started
+ * their work, i.e. inserting into the target table. If the leader is
+ * chosen to participate in the parallel inserts for CTAS, it finishes its
+ * share before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ed4690305b 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern void ChooseParallelInsertsInCTAS(IntoClause *into,
+ QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Parallel inserts in CTAS related info is specified below. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v12-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 20aa2cede64793a9f9b2b7f9b6c7574279df9c73 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 16 Dec 2020 11:45:58 +0530
Subject: [PATCH v12 2/3] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to the Gather node to 0. With this change, the planner is
more likely to choose a parallel plan.
Authors: Bharath Rupireddy, Hou, Zhijie
---
src/backend/commands/createas.c | 88 ++++++++++++++++++++-------
src/backend/commands/explain.c | 7 ++-
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/allpaths.c | 39 ++++++++++++
src/backend/optimizer/path/costsize.c | 22 ++++++-
src/backend/optimizer/plan/planner.c | 59 ++++++++++++++++++
src/include/commands/createas.h | 23 ++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++
10 files changed, 225 insertions(+), 30 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index be381f9748..8401e80185 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -344,7 +344,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc,
+ &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -673,7 +674,8 @@ intorel_destroy(DestReceiver *self)
* parallel insertions by the workers. Otherwise returns false.
*/
static bool
-PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
{
bool parallel = false;
@@ -687,29 +689,39 @@ PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
for (int i = 0; i < aps->as_nplans; i++)
{
parallel |= PushDownCTASParallelInsertState(dest,
- aps->appendplans[i]);
+ aps->appendplans[i],
+ gather_exists);
}
}
- else if (IsA(ps, GatherState) && !ps->ps_ProjInfo)
+ else if (IsA(ps, GatherState))
{
- GatherState *gstate = (GatherState *) ps;
- parallel = true;
-
/*
- * For parallelizing inserts in CTAS i.e. making each parallel worker
- * insert the tuples, we must send information such as into clause (for
- * each worker to build separate dest receiver), object id (for each
- * worker to open the created table).
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
*/
- ((DR_intorel *) dest)->is_parallel = true;
- gstate->dest = dest;
+ *gather_exists |= true;
- /*
- * Since there are no rows that are transferred from workers to Gather
- * node, so we set it to 0 to be visible in estimated row count of
- * explain plans.
- */
- ps->plan->plan_rows = 0;
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel
+ * worker insert the tuples, we must send information such as into
+ * clause (for each worker to build separate dest receiver), object
+ * id (for each worker to open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since there are no rows that are transferred from workers to
+ * Gather node, so we set it to 0 to be visible in estimated row
+ * count of explain plans.
+ */
+ ps->plan->plan_rows = 0;
+ }
}
return parallel;
@@ -720,8 +732,12 @@ PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
* insertion is possible, if yes set the parallel insert state i.e. push down
* the dest receiver to the Gather nodes.
*/
-void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags)
{
+ bool gather_exists = false;
+ bool allow = false;
+
if (!IS_CTAS(into))
return;
@@ -735,7 +751,33 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
return;
- if (queryDesc)
- (void) PushDownCTASParallelInsertState(queryDesc->dest,
- queryDesc->planstate);
+ if (!queryDesc)
+ return;
+
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
+
+ /*
+ * It should not happen that cost_gather ignored the parallel tuple cost
+ * and yet we now do not allow the parallel inserts. Also, the assertion is
+ * only needed if the top node is a Gather, or an Append with a Gather
+ * under it. The main intention of the assertion is to catch the case where
+ * we forced the planner to ignore the parallel tuple cost (with the
+ * intention of choosing parallel inserts), because of which the parallel
+ * plan may have been chosen, but we do not allow the parallel inserts now.
+ */
+ if (!allow && tuple_cost_flags && gather_exists)
+ {
+ /*
+ * If we have correctly ignored parallel tuple cost in planner while
+ * creating Gather path, then this assertion failure should not occur.
+ * If it occurs, that means the planner may have chosen this parallel
+ * plan because of our enforcement to ignore the parallel tuple cost.
+ */
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbd0bc5a81..efdb34d1f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -402,7 +402,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +497,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +563,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b064a..07c9d0f3d7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,11 +1104,49 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it is possible to do parallel insert if
+ * the top node of the subquery is a Gather, so we set the flag to make
+ * cost_gather ignore the parallel tuple cost for the Gather path if the
+ * SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * We set the flag in two cases, provided no parent path (such as
+ * Limit, Sort, Distinct ...) will be created:
+ * i) the query_level is 1;
+ * ii) the query_level is greater than 1 and the flag is set in the
+ * parent_root.
+ * Case ii) is to handle Append under Append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
set_rel_size(root, childrel, childRTindex, childRTE);
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* It is possible that constraint exclusion detected a contradiction
* within a child subquery, even though we didn't prove one above. If
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..800f25903d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost when we intend to perform parallel
+ * inserts by workers. The ignore flag would have been set in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..7555cde61a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,43 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers when each
+ * worker inserts them in parallel. So we set a flag to make cost_gather
+ * ignore the parallel tuple cost for the Gather path if the SELECT is for
+ * CTAS and we are generating an upper level Gather path.
+ */
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated for the upper Gather path(in
+ * grouping_planner), in which case we can not let parallel inserts
+ * happen. So we do not set ignore tuple cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7595,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ed4690305b..4103ac65f0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,26 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Set to this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Set to this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -53,5 +73,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern void ChooseParallelInsertsInCTAS(IntoClause *into,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
v12-0003-Tests-For-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 07aa36fb45d34ac170df2048c1ad224813b7f659 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 16 Dec 2020 09:27:59 +0530
Subject: [PATCH v12 3/3] Tests For Parallel Inserts in CTAS
---
src/test/regress/expected/write_parallel.out | 1226 ++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 402 ++++++
2 files changed, 1628 insertions(+)
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..f8f1890228 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,1230 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+ QUERY PLAN
+----------------------------------------------------------------
+ Limit (actual rows=4 loops=1)
+ -> Gather (actual rows=4 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+ QUERY PLAN
+----------------------------------------------------------------
+ HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(7 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Partial HashAggregate (actual rows=1 loops=4)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(13 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=1 loops=1)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp3 (actual rows=0 loops=4)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Partial GroupAggregate (actual rows=0 loops=4)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(21 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ WindowAgg (actual rows=5 loops=1)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(8 rows)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Nested Loop (actual rows=5 loops=1)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Materialize (actual rows=5 loops=5)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Merge Join (actual rows=1 loops=4)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=4)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 1
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Append (actual rows=4 loops=4)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=5 loops=1)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: 1
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=4)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=0 loops=4)
+ -> Parallel Append (actual rows=8 loops=2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;
+ QUERY PLAN
+-------------------------------------------------------------------------------
+ Append (actual rows=5 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Seq Scan on temp1 (actual rows=5 loops=1)
+ SubPlan 1
+ -> Limit (actual rows=1 loops=5)
+ -> Gather (actual rows=1 loops=5)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=20)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 where col1 = (select 1) union all select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);
+ QUERY PLAN
+------------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: 1
+ -> Seq Scan on temp2 (actual rows=1 loops=1)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: 4
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Append (actual rows=15 loops=1)
+ -> Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+ -> Seq Scan on temp1 temp1_1 (actual rows=5 loops=1)
+ SubPlan 1
+ -> Limit (actual rows=1 loops=5)
+ -> Gather (actual rows=1 loops=5)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=20)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Append (actual rows=4 loops=4)
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=5 loops=1)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=4)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=0 loops=4)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: 1
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=0 loops=4)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 1
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Result (actual rows=0 loops=1)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=1 loops=1)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: 4
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: 1
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Result (actual rows=0 loops=1)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: 1
+ -> Result (actual rows=0 loops=1)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Seq Scan on temp2 temp2_2 (actual rows=1 loops=1)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 4
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union
+ select * from temp2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ HashAggregate (actual rows=5 loops=1)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=10 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=3 loops=1)
+ -> Append (actual rows=7 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=2 loops=1)
+ -> Gather (actual rows=2 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: 1
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=3 loops=1)
+ -> Append (actual rows=7 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=2 loops=1)
+ -> Gather (actual rows=2 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: 1
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=5 loops=1)
+ -> Append (actual rows=10 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=5 loops=1)
+ -> Append (actual rows=10 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..67de31fb91 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,406 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 where col1 = (select 1) union all select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
--
2.25.1
On Wed, Dec 16, 2020 at 12:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Dec 15, 2020 at 5:53 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I'm merging them to the original patch set and adding the test cases
to cover these cases. I will post the updated patch set soon.
Attaching v12 patch set.
0001 - parallel inserts without tuple cost enforcement.
0002 - enforce planner for parallel tuple cost
0003 - test cases
Please review it further.
I think it will be cleaner to implement parallel CTAS when the
top-level node is a Gather node. Basically, the idea is that whenever
we get a Gather on top that doesn't have any projection, we can push
the dest receiver directly down to the workers. I agree that Append is
an exception that doesn't do any extra processing other than appending
the results, so IMHO it would be better to first parallelize the plans
that have a Gather node on top. I see that we have already worked on
the case where the Append node is on top, so I would suggest we keep
that part in a separate patch.
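Something along these lines, perhaps (untested, the helper name is made
up, just to make the idea concrete; the posted patch drives this from
ChooseParallelInsertsInCTAS() after ExecutorStart(), and a real
implementation would need more safety checks):

#include "postgres.h"

#include "executor/executor.h"
#include "nodes/execnodes.h"

/*
 * Hypothetical helper: return true if the top plan node is a Gather that
 * performs no projection of its own, i.e. worker tuples could be handed
 * straight to a pushed-down CTAS dest receiver instead of being funnelled
 * through the leader.
 */
static bool
gather_on_top_without_projection(QueryDesc *queryDesc)
{
	PlanState  *ps = queryDesc->planstate;

	/* Top node must be a Gather. */
	if (!IsA(ps, GatherState))
		return false;

	/*
	 * ps_ProjInfo is left NULL by the executor when the node can return
	 * its child's tuples as-is; a non-NULL projection means the leader
	 * still has per-tuple work to do, so the dest receiver cannot be
	 * pushed down.
	 */
	return ps->ps_ProjInfo == NULL;
}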
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi
The cfbot seems to complain about the test case:
Command exited with code 1
perl dumpregr.pl
=== $path ===
diff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out C:/projects/postgresql/src/test/regress/results/write_parallel.out
--- C:/projects/postgresql/src/test/regress/expected/write_parallel.out 2020-12-21 01:41:17.745091500 +0000
+++ C:/projects/postgresql/src/test/regress/results/write_parallel.out 2020-12-21 01:47:20.375514800 +0000
@@ -1204,7 +1204,7 @@
-> Gather (actual rows=2 loops=1)
Workers Planned: 3
Workers Launched: 3
- -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
Filter: (col2 < 3)
Rows Removed by Filter: 1
(14 rows)
@@ -1233,7 +1233,7 @@
-> Gather (actual rows=2 loops=1)
Workers Planned: 3
Workers Launched: 3
- -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
Filter: (col2 < 3)
Rows Removed by Filter: 1
(14 rows)
Best regards,
houzj
On Fri, Dec 18, 2020 at 10:08 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think it will be cleaner to implement parallel CTAS when the
top-level node is a Gather node. Basically, the idea is that whenever
we get a Gather on top that doesn't have any projection, we can push
the dest receiver directly down to the workers. I agree that Append is
an exception that doesn't do any extra processing other than appending
the results, so IMHO it would be better to first parallelize the plans
that have a Gather node on top. I see that we have already worked on
the case where the Append node is on top, so I would suggest we keep
that part in a separate patch.
Thanks! I rearranged the patches to keep the append part separate in
the 0004 patch.
Attaching v13 patch set:
0001 - parallel inserts in ctas without planner enforcement for tuple
cost calculation
0002 - planner enforcement for tuple cost calculation
0003 - tests
0004 - enabling parallel inserts for Append cases, related planner
enforcement code and tests.
Please consider these patches for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v13-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From e2c7dea3f1f0171b1d42b99f493860ccf1f6ccaa Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 14:35:24 +0530
Subject: [PATCH v13 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. Leader inserts its share of tuples if instructed
to do, and so are workers. Each worker writes atomically its number
of inserted tuples into a shared memory variable, the leader combines
this with its own number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 ++-
src/backend/commands/createas.c | 303 ++++++++++++++++---------
src/backend/commands/explain.c | 31 +++
src/backend/executor/execParallel.c | 70 +++++-
src/backend/executor/nodeGather.c | 113 ++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 25 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 457 insertions(+), 140 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..10f4f2b4d7 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that are resulted in its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +429,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
- Assert(into != NULL); /* else somebody forgot to set it */
-
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), and then parallel mode is
+ * entered. The command id and transaction id are serialized into the
+ * parallel DSM and are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
- if (lc)
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * We do not skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, a worker can then
+ * check the FSM for another page that can accommodate the tuples,
+ * instead of blocking on a page into which another worker is
+ * inserting. This is a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are not
+ * allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +653,47 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether parallel insertion is
+ * possible; if so, set up the parallel insert state, i.e. push the dest
+ * receiver down to the Gather node.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. Since
+ * temporary tables are backend-local, workers cannot access them.
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables. It does not support foreign or partitioned table
+ * creation. Hence the check for a temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return;
+
+ if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
+ !queryDesc->planstate->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+ DestReceiver *dest = queryDesc->dest;
+
+ /*
+ * For parallelizing inserts in CTAS, i.e. making each parallel worker
+ * insert the tuples, we must pass the into clause (so that each worker
+ * can build its own dest receiver) and the object id (so that each
+ * worker can open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since no rows are transferred from the workers to the Gather node,
+ * set the plan's row estimate to 0 so that it is visible in the
+ * estimated row count of explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;
+ }
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..fbd0bc5a81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that result from its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1783,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under the Gather node in case
+ * the parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* object id for workers to open the target table */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *) receiver)->is_parallel_worker = true;
+ ((DR_intorel *) receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader
+ * will use it to report the row count to the client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..e7c588c66a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) would have started
+ * their work, i.e. inserting into the target table. If the leader is
+ * chosen to participate in the parallel inserts for CTAS, let it finish
+ * its share before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for (;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * Wait here for the parallel workers to finish their work, and
+ * accumulate the number of tuples they inserted as well as their
+ * buffer/WAL usage. We do not destroy the parallel context here; it
+ * will be done in ExecShutdownGather at the end of the plan. Note
+ * that the ExecShutdownGatherWorkers call from ExecShutdownGather
+ * will be a no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This will be reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Collect the information that must be passed to the workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ed4690305b 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by a parallel worker to open the created table. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern void ChooseParallelInsertsInCTAS(IntoClause *into,
+ QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Info related to parallel inserts in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
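To make the tuple-count accounting in the patch above easier to follow, here is a small standalone sketch (not part of the patch) that models it with C11 atomics and pthreads: each worker atomically adds its own inserted-tuple count to a shared counter, and the leader adds the accumulated value to its own count before reporting it to the client. The thread setup and the example counts are illustrative only; the patch itself uses pg_atomic_add_fetch_u64/pg_atomic_read_u64 on the fpes->processed counter in the parallel DSM.
-----------------------------------------------------------------------------
/*
 * Standalone sketch: models the shared inserted-tuple counter of the
 * parallel CTAS patch with C11 atomics and pthreads. Not PostgreSQL code;
 * build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static atomic_uint_least64_t processed;	/* stands in for fpes->processed */

static void *
worker_main(void *arg)
{
	uint64_t	ntuples = *(uint64_t *) arg;	/* tuples this worker inserted */

	/* mirrors pg_atomic_add_fetch_u64(&fpes->processed, es_processed) */
	atomic_fetch_add(&processed, ntuples);
	return NULL;
}

int
main(void)
{
	pthread_t	workers[3];
	uint64_t	counts[3] = {2000, 2500, 1500};	/* made-up worker shares */
	uint64_t	leader_count = 4000;	/* leader's own share, if it participates */

	atomic_init(&processed, 0);

	for (int i = 0; i < 3; i++)
		pthread_create(&workers[i], NULL, worker_main, &counts[i]);
	for (int i = 0; i < 3; i++)
		pthread_join(workers[i], NULL);

	/* mirrors es_processed += pg_atomic_read_u64(node->pei->processed) */
	leader_count += atomic_load(&processed);

	printf("total rows reported to client: %llu\n",
		   (unsigned long long) leader_count);
	return 0;
}
-----------------------------------------------------------------------------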
Attachment: v13-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 2484cfb3e57c79788448b04bfc1760b59fd20c4d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 15:19:23 +0530
Subject: [PATCH v13 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to the Gather node to 0. With this change, the planner is
more likely to choose a parallel plan.
---
src/backend/commands/createas.c | 35 +++++++++++++++++-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 ++++++++++-
src/backend/optimizer/plan/planner.c | 53 +++++++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++++
9 files changed, 146 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 10f4f2b4d7..210927d4f4 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -344,7 +344,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc,
+ &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -659,8 +660,11 @@ intorel_destroy(DestReceiver *self)
 * possible; if so, set up the parallel insert state, i.e. push the dest
 * receiver down to the Gather node.
*/
-void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags)
{
+ bool allow = false;
+
if (!IS_CTAS(into))
return;
@@ -695,5 +699,32 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
* explain plans.
*/
queryDesc->planstate->plan->plan_rows = 0;
+
+ allow = true;
}
+
+ /*
+ * It should not happen that cost_gather ignored the parallel tuple cost
+ * but we now end up not allowing the parallel inserts. The assertion is
+ * needed only if the top node is a Gather. Its intention is to catch the
+ * case where we forced the planner to ignore the parallel tuple cost
+ * (with the intention of choosing parallel inserts), due to which the
+ * parallel plan may have been chosen, but the parallel inserts are not
+ * allowed after all.
+ */
+ if (!allow && tuple_cost_flags && queryDesc &&
+ IsA(queryDesc->planstate, GatherState))
+ {
+ /*
+ * If the parallel tuple cost was correctly ignored in the planner while
+ * creating the Gather path, then this assertion should not fail. If it
+ * does, the planner may have chosen this parallel plan only because we
+ * forced it to ignore the parallel tuple cost.
+ */
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
+
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbd0bc5a81..efdb34d1f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -402,7 +402,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +497,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +563,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..800f25903d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost if we intend to perform the inserts in
+ * parallel by the workers. The ignore flag would have been set in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper-level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..f1134711b0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,37 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each worker
+ * inserts them in parallel. So, set a flag telling cost_gather to ignore the
+ * parallel tuple cost for the Gather path if the SELECT is for CTAS and we
+ * are generating an upper-level Gather path.
+ */
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated on top of the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not set the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7589,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag telling cost_gather to ignore the parallel tuple cost for
+ * the Gather path if the SELECT is for CTAS and we are generating an
+ * upper-level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ed4690305b..0ae2f49e0c 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Set to this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -53,5 +71,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern void ChooseParallelInsertsInCTAS(IntoClause *into,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
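For the tuple-cost adjustment in 0002, here is a small standalone sketch (not part of the patch) of the CTASParallelInsInfo flag transitions: the SELECT flag is set before planning, ignore_parallel_tuple_cost marks the cost as ignorable for the upper-level Gather path, cost_gather consumes that flag and records that the cost was ignored, and ChooseParallelInsertsInCTAS asserts that the cost was not ignored if parallel inserts end up disallowed. Only the flag names come from the patch; the toy planner/executor functions and the driver below are illustrative.
-----------------------------------------------------------------------------
/*
 * Standalone sketch: models the CTASParallelInsInfo flag transitions of
 * patch 0002. Not PostgreSQL code; the "model" functions are toy stand-ins.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum
{
	CTAS_PARALLEL_INS_UNDEF = 0,
	CTAS_PARALLEL_INS_SELECT = 1 << 0,
	CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
	CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
};

/* Models ignore_parallel_tuple_cost(): upper-level Gather for a CTAS SELECT. */
static bool
mark_tuple_cost_ignorable(uint8_t *flags, bool has_parent_path_clause)
{
	if ((*flags & CTAS_PARALLEL_INS_SELECT) && !has_parent_path_clause)
	{
		*flags |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
		return true;
	}
	return false;
}

/* Models cost_gather(): consume CAN_IGN and remember the cost was ignored. */
static void
cost_gather_model(uint8_t *flags, double *run_cost, double tuple_cost)
{
	if ((*flags & CTAS_PARALLEL_INS_SELECT) &&
		(*flags & CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
	{
		*flags &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
		*flags |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
	}
	else
		*run_cost += tuple_cost;
}

/*
 * Models the assertion in ChooseParallelInsertsInCTAS(): if parallel inserts
 * are disallowed after all, the cost must not have been ignored.
 */
static void
choose_parallel_inserts_model(uint8_t *flags, bool allow_parallel_inserts)
{
	if (!allow_parallel_inserts)
		assert(!(*flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
	*flags = CTAS_PARALLEL_INS_UNDEF;
}

int
main(void)
{
	uint8_t		flags = CTAS_PARALLEL_INS_SELECT;	/* set before planning */
	double		run_cost = 0.0;

	mark_tuple_cost_ignorable(&flags, false);
	cost_gather_model(&flags, &run_cost, 0.1 * 10000);	/* made-up tuple cost */
	choose_parallel_inserts_model(&flags, true);

	printf("run_cost = %.1f (tuple cost ignored for parallel inserts)\n",
		   run_cost);
	return 0;
}
-----------------------------------------------------------------------------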
Attachment: v13-0003-Tests-For-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From f7035babca2529cbb227ac871a678861c9c7197a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 14:40:49 +0530
Subject: [PATCH v13 3/4] Tests For Parallel Inserts in CTAS
---
src/test/regress/expected/write_parallel.out | 505 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 181 +++++++
2 files changed, 686 insertions(+)
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..84e8f981e1 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,509 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=10000 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+ QUERY PLAN
+-----------------------------------------------------
+ LockRows (actual rows=10000 loops=1)
+ -> Seq Scan on tenk1 (actual rows=10000 loops=1)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+ QUERY PLAN
+-------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=2000 loops=5)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on tenk1 (actual rows=10000 loops=1)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk2 (actual rows=1 loops=5)
+ -> Parallel Seq Scan on tenk1 (actual rows=625 loops=4)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1875
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather (actual rows=2500 loops=1)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=500 loops=5)
+ Filter: (four = 3)
+ Rows Removed by Filter: 1500
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=1 loops=2500)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+ QUERY PLAN
+----------------------------------------------------------------
+ Limit (actual rows=4 loops=1)
+ -> Gather (actual rows=4 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(5 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+ QUERY PLAN
+----------------------------------------------------------------
+ Gather Merge (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(10 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+ QUERY PLAN
+----------------------------------------------------------------
+ HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(7 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=5 loops=1)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Partial HashAggregate (actual rows=1 loops=4)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(13 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=1 loops=1)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp3 (actual rows=0 loops=4)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Partial GroupAggregate (actual rows=0 loops=4)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(21 rows)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+ QUERY PLAN
+----------------------------------------------------------------------
+ WindowAgg (actual rows=5 loops=1)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+(8 rows)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Nested Loop (actual rows=5 loops=1)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Materialize (actual rows=5 loops=5)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Merge Join (actual rows=1 loops=4)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=1 loops=4)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Sort (actual rows=5 loops=1)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=1 loops=4)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Hash (actual rows=1 loops=4)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..7245cb97c6 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,185 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+create temporary table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create unlogged table parallel_write as select length(stringu1) from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into temporary parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+select length(stringu1) into unlogged parallel_write from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select length(stringu1) from tenk1 for update;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as execute parallel_write_prep;
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select now(), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select nextval('parallel_write_sequence'), four from tenk1;
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 limit 4;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 order by 1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select distinct * from temp1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select count(*) from temp1 group by col1;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
rollback;
--
2.25.1
v13-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
From de3edce67c45e2104d0789bff3a18bb3ab9562da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 15:32:33 +0530
Subject: [PATCH v13 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if there
exists a Gather node under the top Append node. It also adds code to
influence the planner to consider the parallel tuple cost as zero, and
asserts that the cost was not wrongly ignored if parallel insertion
later turns out not to be possible. Test cases are also included in
this patch.
---
src/backend/commands/createas.c | 104 ++-
src/backend/optimizer/path/allpaths.c | 39 +
src/backend/optimizer/plan/planner.c | 10 +-
src/include/commands/createas.h | 4 +-
src/test/regress/expected/write_parallel.out | 721 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 221 ++++++
6 files changed, 1071 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 210927d4f4..20d4f805d0 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -655,6 +655,78 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver down to a Gather node when it is either at the top
+ * of the plan or directly under the top Append node, provided it has no
+ * projections to do. Required information from the pushed dest receiver is
+ * sent to the workers so that they can perform parallel insertions.
+ *
+ * If the top node is an Append, this function recursively checks its sub
+ * plans for Gather nodes; when one is found (and it has no projections), the
+ * dest receiver information is set on it.
+ *
+ * Returns true if at least one Gather node can allow parallel insertions by
+ * the workers, otherwise false.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node of the Append node.
+ */
+ *gather_exists |= true;
+
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel
+ * worker insert the tuples, we must send information such as into
+ * clause (for each worker to build separate dest receiver), object
+ * id (for each worker to open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since no rows are transferred from the workers to the Gather
+ * node, set it to 0 so that it is visible in the estimated row
+ * count of explain plans.
+ */
+ ps->plan->plan_rows = 0;
+ }
+ }
+
+ return parallel;
+}
+
/*
* ChooseParallelInsertsInCTAS --- determine whether or not parallel
* insertion is possible, if yes set the parallel insert state i.e. push down
@@ -664,6 +736,7 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
uint8 *tuple_cost_flags)
{
bool allow = false;
+ bool gather_exists = false;
if (!IS_CTAS(into))
return;
@@ -678,30 +751,12 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
return;
- if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
- !queryDesc->planstate->ps_ProjInfo)
- {
- GatherState *gstate = (GatherState *) queryDesc->planstate;
- DestReceiver *dest = queryDesc->dest;
-
- /*
- * For parallelizing inserts in CTAS i.e. making each parallel worker
- * insert the tuples, we must send information such as into clause (for
- * each worker to build separate dest receiver), object id (for each
- * worker to open the created table).
- */
- ((DR_intorel *) dest)->is_parallel = true;
- gstate->dest = dest;
-
- /*
- * Since there are no rows that are transferred from workers to Gather
- * node, so we set it to 0 to be visible in estimated row count of
- * explain plans.
- */
- queryDesc->planstate->plan->plan_rows = 0;
+ if (!queryDesc)
+ return;
- allow = true;
- }
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
/*
* It should not happen that in cost_gather we have ignored the parallel
@@ -712,8 +767,7 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
* which the parallel plan may have been chosen, but we do not allow the
* parallel inserts now.
*/
- if (!allow && tuple_cost_flags && queryDesc &&
- IsA(queryDesc->planstate, GatherState))
+ if (!allow && tuple_cost_flags && gather_exists)
{
/*
* If we have correctly ignored parallel tuple cost in planner while
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b064a..0e6e2df9cc 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,44 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it is possible to do parallel insert
+ * if the top node of the subquery is a Gather, so we set the flag to
+ * make cost_gather ignore the parallel tuple cost for the Gather path
+ * if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * We set the flag for two cases when no parent path (such as Limit,
+ * Sort, Distinct) will be created:
+ * i) query_level is 1
+ * ii) query_level > 1, in which case the flag is set in the parent_root.
+ * Case ii) is to handle an Append under an Append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f1134711b0..7555cde61a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,14 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 0ae2f49e0c..4103ac65f0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,9 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
/* Set to this after the cost is ignored. */
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Set to this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 84e8f981e1..09c9c63c2e 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -578,6 +578,727 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=4)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 1
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Append (actual rows=4 loops=4)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=5 loops=1)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: 1
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=4)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=1 loops=1)
+ -> Gather (actual rows=1 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=0 loops=4)
+ -> Parallel Append (actual rows=8 loops=2)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp2 (actual rows=5 loops=1)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;
+ QUERY PLAN
+-------------------------------------------------------------------------------
+ Append (actual rows=5 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Seq Scan on temp1 (actual rows=5 loops=1)
+ SubPlan 1
+ -> Limit (actual rows=1 loops=5)
+ -> Gather (actual rows=1 loops=5)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=20)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 where col1 = (select 1) union all select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);
+ QUERY PLAN
+------------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: 1
+ -> Seq Scan on temp2 (actual rows=1 loops=1)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: 4
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Append (actual rows=15 loops=1)
+ -> Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+ -> Seq Scan on temp1 temp1_1 (actual rows=5 loops=1)
+ SubPlan 1
+ -> Limit (actual rows=1 loops=5)
+ -> Gather (actual rows=1 loops=5)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=20)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Append (actual rows=4 loops=4)
+ -> Seq Scan on temp2 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp1 (actual rows=5 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=5 loops=1)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=0 loops=4)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=0 loops=4)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: 1
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=0 loops=4)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 1
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Result (actual rows=0 loops=1)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=1 loops=1)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: 4
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: 1
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=0 loops=4)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: 1
+ -> Result (actual rows=0 loops=1)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=0 loops=1)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=1 loops=1)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: 3
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=1 loops=1)
+ -> Result (actual rows=0 loops=4)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=0 loops=4)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: 1
+ -> Result (actual rows=0 loops=1)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=0 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=0 loops=4)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: 1
+ -> Seq Scan on temp2 temp2_2 (actual rows=1 loops=1)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: 4
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union
+ select * from temp2;
+ QUERY PLAN
+----------------------------------------------------------------------
+ HashAggregate (actual rows=5 loops=1)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=10 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=3 loops=1)
+ -> Append (actual rows=7 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=2 loops=1)
+ -> Gather (actual rows=2 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: 1
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=3 loops=1)
+ -> Append (actual rows=7 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=2 loops=1)
+ -> Gather (actual rows=2 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: 1
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=5 loops=1)
+ -> Append (actual rows=10 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;
+ QUERY PLAN
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=5 loops=1)
+ -> Append (actual rows=10 loops=1)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp1 (actual rows=1 loops=4)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=5 loops=1)
+ -> Gather (actual rows=5 loops=1)
+ Workers Planned: 3
+ Workers Launched: 3
+ -> Parallel Seq Scan on temp2 (actual rows=1 loops=4)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 7245cb97c6..937077d79c 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -217,6 +217,227 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as select * from temp1 where col1 = (select 1) union all select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 union
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
+explain (costs off, analyze on, timing off, summary off)
+create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Mon, Dec 21, 2020 at 8:16 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
The cfbot seems to complain about the testcase:
Command exited with code 1 perl dumpregr.pl === $path ===\ndiff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out C:/projects/postgresql/src/test/regress/results/write_parallel.out --- C:/projects/postgresql/src/test/regress/expected/write_parallel.out 2020-12-21 01:41:17.745091500 +0000 +++ C:/projects/postgresql/src/test/regress/results/write_parallel.out 2020-12-21 01:47:20.375514800 +0000 @@ -1204,7 +1204,7 @@ -> Gather (actual rows=2 loops=1) Workers Planned: 3 Workers Launched: 3 - -> Parallel Seq Scan on temp2 (actual rows=0 loops=4) + -> Parallel Seq Scan on temp2 (actual rows=1 loops=4) Filter: (col2 < 3) Rows Removed by Filter: 1 (14 rows) @@ -1233,7 +1233,7 @@ -> Gather (actual rows=2 loops=1) Workers Planned: 3 Workers Launched: 3 - -> Parallel Seq Scan on temp2 (actual rows=0 loops=4) + -> Parallel Seq Scan on temp2 (actual rows=1 loops=4) Filter: (col2 < 3) Rows Removed by Filter: 1 (14 rows)
Thanks! Looks like the explain analyze test case outputs can be
unstable because we may not get the requested number of workers
always. The comment before explain_parallel_append function in
partition_prune.sql explains it well.
Solution is to have a function similar to explain_parallel_append, say
explain_parallel_inserts in write_parallel.sql and use that for all
explain analyze cases. This will make the results consistent.
Thoughts? If okay, I will update the test cases and post new patches.
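For reference, here is a minimal sketch of what such a helper could look like, modeled on explain_parallel_append in partition_prune.sql. The function name explain_parallel_inserts and the exact fields masked below are assumptions for illustration, not necessarily what the updated patches will use:

-- hypothetical helper, assumed name; filters out fields that depend on the
-- number of workers actually launched
create function explain_parallel_inserts(text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s', $1)
    loop
        -- mask the values that vary from run to run
        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
        return next ln;
    end loop;
end;
$$;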
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 21, 2020 at 8:16 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
The cfbot seems to complain about the testcase: [same regression diff as quoted above]
Thanks! Looks like the explain analyze test case outputs can be
unstable because we may not get the requested number of workers
always. The comment before explain_parallel_append function in
partition_prune.sql explains it well.
Solution is to have a function similar to explain_parallel_append, say
explain_parallel_inserts in write_parallel.sql and use that for all
explain analyze cases. This will make the results consistent.
Thoughts? If okay, I will update the test cases and post new patches.
Attaching v14 patch set that has above changes. Please consider this
for further review.
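As an illustration of the intended rewrite (using the assumed helper name from the sketch earlier in the thread), a test case that currently reads

explain (costs off, analyze on, timing off, summary off)
create table parallel_write as select length(stringu1) from tenk1;

could instead be written as

select explain_parallel_inserts('create table parallel_write as select length(stringu1) from tenk1');

so that the number of workers launched and the per-worker row counts no longer affect the expected output.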
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v14-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
From e2c7dea3f1f0171b1d42b99f493860ccf1f6ccaa Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 14:35:24 +0530
Subject: [PATCH v14 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to the Gather node,
and from there the required information is shared with the workers so
that they can perform parallel insertions. The leader also participates
in the insertions. After planning, createas.c checks whether the upper
plan node is a Gather, marks a parallelism flag in the CTAS dest
receiver, and pushes it down to the Gather node. Each worker creates
its own CTAS dest receiver from the information passed by the leader.
The leader inserts its share of tuples if instructed to do so, and so
do the workers. Each worker atomically writes its number of inserted
tuples into a shared memory variable; the leader combines this with its
own count and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 ++-
src/backend/commands/createas.c | 303 ++++++++++++++++---------
src/backend/commands/explain.c | 31 +++
src/backend/executor/execParallel.c | 70 +++++-
src/backend/executor/nodeGather.c | 113 ++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 25 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 457 insertions(+), 140 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for the common parallel insert cases, i.e.
+ * INSERT INTO ... SELECT, CTAS and COPY FROM; to be changed later. In a
+ * parallel worker, allow currentCommandIdUsed to be set to true only if
+ * it was already set to true at the start of the parallel operation (by
+ * way of SetCurrentCommandIdUsedForWorker()). We have to do this
+ * because GetCurrentCommandId(true) may be called from anywhere within
+ * a parallel worker, especially for parallel inserts.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..10f4f2b4d7 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table. We need the plan state to be initialized by
+ * the executor to decide whether or not to allow parallel inserts.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +429,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
- Assert(into != NULL); /* else somebody forgot to set it */
-
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into the
+ * parallel DSM; they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
- if (lc)
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * In parallel mode we don't skip contacting the FSM while inserting
+ * tuples. While extending the relation, instead of blocking on a page
+ * that another worker is inserting into, a worker can check the FSM
+ * for another page that can accommodate the tuples. This is a major
+ * benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * Mark rd_createSubid invalid; otherwise, the workers are not
+ * allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +653,47 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible; if so, set the parallel insert state, i.e. push
+ * the dest receiver down to the Gather node.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. Since
+ * temporary tables are backend-local, workers cannot know about them.
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables, but not foreign or partitioned tables, so checking
+ * for a temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return;
+
+ if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
+ !queryDesc->planstate->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+ DestReceiver *dest = queryDesc->dest;
+
+ /*
+ * To parallelize inserts in CTAS, i.e. make each parallel worker
+ * insert the tuples, we must pass the into clause (so that each
+ * worker can build its own dest receiver) and the object id (so that
+ * each worker can open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since no rows are transferred from the workers to the Gather node,
+ * set the row estimate to 0 so that this is visible in the estimated
+ * row count of explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;
+ }
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..fbd0bc5a81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples produced by its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1783,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under the Gather node when
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..9ef33eee54 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* object id of the table for workers to open */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len + 1);
+ memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..e7c588c66a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) have started their
+ * work, i.e. inserting into the target table. If the leader is chosen to
+ * participate in the parallel inserts in CTAS, let it finish its share
+ * before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for (;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This will be reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Collect the information that must be passed to the workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ed4690305b 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers to open the target table. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern void ChooseParallelInsertsInCTAS(IntoClause *into,
+ QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Info related to parallel inserts in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
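
To make the intended usage of the 0001 patch above concrete, here is a minimal hand-run sketch (the table names and the GUC setting are made up for illustration and are not part of the patch). When the planner puts a Gather on top of the SELECT part, each worker writes its share of rows straight into the new table, and the leader only adds its own count to the per-worker counts before reporting to the client:

    SET max_parallel_workers_per_gather = 4;

    -- any reasonably large table works; source_tbl is a placeholder name
    CREATE TABLE source_tbl AS SELECT generate_series(1, 1000000) AS a;

    -- with the patch, workers insert directly into target_tbl instead of
    -- funnelling tuples through the Gather node to the leader
    CREATE TABLE target_tbl AS SELECT a FROM source_tbl WHERE a % 2 = 0;
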
Attachment: v14-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From 2484cfb3e57c79788448b04bfc1760b59fd20c4d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 15:19:23 +0530
Subject: [PATCH v14 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
---
src/backend/commands/createas.c | 35 +++++++++++++++++-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 ++++++++++-
src/backend/optimizer/plan/planner.c | 53 +++++++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++++
9 files changed, 146 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 10f4f2b4d7..210927d4f4 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -344,7 +344,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc,
+ &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -659,8 +660,11 @@ intorel_destroy(DestReceiver *self)
 * insertion is possible; if so, set the parallel insert state, i.e. push
 * the dest receiver down to the Gather node.
*/
-void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags)
{
+ bool allow = false;
+
if (!IS_CTAS(into))
return;
@@ -695,5 +699,32 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
* explain plans.
*/
queryDesc->planstate->plan->plan_rows = 0;
+
+ allow = true;
}
+
+ /*
+ * It should not happen that in cost_gather we have ignored the parallel
+ * tuple cost and now we are not allowing the parallel inserts. And also we
+ * might need assertion only if the top node is Gather. The main intention
+ * of assertion is to check if we enforced planner to ignore the parallel
+ * tuple cost (with the intention of choosing parallel inserts) due to
+ * which the parallel plan may have been chosen, but we do not allow the
+ * parallel inserts now.
+ */
+ if (!allow && tuple_cost_flags && queryDesc &&
+ IsA(queryDesc->planstate, GatherState))
+ {
+ /*
+ * If the planner has correctly ignored the parallel tuple cost while
+ * creating the Gather path, this assertion failure should not occur.
+ * If it does occur, the planner may have chosen this parallel plan only
+ * because we forced it to ignore the parallel tuple cost.
+ */
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
+
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbd0bc5a81..efdb34d1f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -402,7 +402,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +497,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +563,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..800f25903d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost when we intend to perform parallel
+ * inserts by the workers. The ignore flag would have been set in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper-level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..f1134711b0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,37 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each
+ * worker inserts them in parallel. So, set a flag telling cost_gather to
+ * ignore the parallel tuple cost for the Gather path if the SELECT is for
+ * CTAS and we are generating an upper-level Gather path.
+ */
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated on top of the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not set the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7589,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag telling cost_gather to ignore the parallel tuple cost for
+ * the Gather path if the SELECT is for CTAS and we are generating an
+ * upper-level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ed4690305b..0ae2f49e0c 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Set to this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -53,5 +71,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern void ChooseParallelInsertsInCTAS(IntoClause *into,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
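
A rough sense of what the 0002 patch buys, with numbers that are only illustrative: cost_gather normally charges parallel_tuple_cost (0.1 by default) per row transferred to the leader, so a Gather over 1 million rows picks up an extra 100000 in run cost, which can easily make the serial plan win. Since a parallel-insert CTAS transfers no rows through the Gather node, 0002 skips that charge when costing the upper Gather, and (from 0001) the row estimate shown on the Gather node becomes 0. Assuming the regression-test table tenk1 used by the 0003 tests, something like the following can be used to eyeball it:

    SHOW parallel_tuple_cost;    -- 0.1 unless changed

    -- with 0001 + 0002 applied, the Gather node of the plan is expected to
    -- report rows=0, since no tuples travel from the workers to the leader
    EXPLAIN CREATE TABLE parallel_write AS SELECT length(stringu1) FROM tenk1;
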
Attachment: v14-0003-Tests-For-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From d5183d9f831ac5ddc5e3b8417e24e6c29db11011 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 22 Dec 2020 13:19:22 +0530
Subject: [PATCH v14 3/4] Tests For Parallel Inserts in CTAS
---
src/test/regress/expected/write_parallel.out | 577 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 216 +++++++
2 files changed, 793 insertions(+)
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..357fbbbe8d 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,581 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, the only way is that
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select now(), four from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel-unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..46b958014a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,220 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is
+-- whether Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select now(), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed
+-- by the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
v14-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
From 7f3cc3f322a971cdd23221b45810ceb47e50be84 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 22 Dec 2020 13:30:22 +0530
Subject: [PATCH v14 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if the
Gather node sits under a top-level Append node. It also adds the code
that makes the planner consider the parallel tuple cost as zero, and
asserts if the cost was ignored but parallel insertion later turns out
not to be possible. Test cases are also included in this patch.
---
src/backend/commands/createas.c | 104 ++-
src/backend/optimizer/path/allpaths.c | 39 +
src/backend/optimizer/plan/planner.c | 10 +-
src/include/commands/createas.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1073 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 210927d4f4..20d4f805d0 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -655,6 +655,78 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver to a Gather node when it is either at the top of the
+ * plan or directly under a top Append node, provided it does not have any
+ * projections to do. Required information from the pushed dest receiver is
+ * sent to workers so that they can perform parallel insertions into the
+ * target table.
+ *
+ * If the top node is Append, this function recursively checks the sub plans
+ * for Gather nodes; when one is found (and it has no projections), it sets
+ * the dest receiver information there.
+ *
+ * In any case, this function returns true if at least one Gather node allows
+ * parallel insertions by the workers. Otherwise it returns false.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel
+ * worker insert the tuples, we must send information such as into
+ * clause (for each worker to build separate dest receiver), object
+ * id (for each worker to open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since no rows are transferred from workers to the Gather node, we
+ * set it to 0 so that it is visible in the estimated row count of
+ * explain plans.
+ */
+ ps->plan->plan_rows = 0;
+ }
+ }
+
+ return parallel;
+}
+
/*
* ChooseParallelInsertsInCTAS --- determine whether or not parallel
* insertion is possible, if yes set the parallel insert state i.e. push down
@@ -664,6 +736,7 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
uint8 *tuple_cost_flags)
{
bool allow = false;
+ bool gather_exists = false;
if (!IS_CTAS(into))
return;
@@ -678,30 +751,12 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
return;
- if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
- !queryDesc->planstate->ps_ProjInfo)
- {
- GatherState *gstate = (GatherState *) queryDesc->planstate;
- DestReceiver *dest = queryDesc->dest;
-
- /*
- * For parallelizing inserts in CTAS i.e. making each parallel worker
- * insert the tuples, we must send information such as into clause (for
- * each worker to build separate dest receiver), object id (for each
- * worker to open the created table).
- */
- ((DR_intorel *) dest)->is_parallel = true;
- gstate->dest = dest;
-
- /*
- * Since there are no rows that are transferred from workers to Gather
- * node, so we set it to 0 to be visible in estimated row count of
- * explain plans.
- */
- queryDesc->planstate->plan->plan_rows = 0;
+ if (!queryDesc)
+ return;
- allow = true;
- }
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
/*
* It should not happen that in cost_gather we have ignored the parallel
@@ -712,8 +767,7 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
* which the parallel plan may have been chosen, but we do not allow the
* parallel inserts now.
*/
- if (!allow && tuple_cost_flags && queryDesc &&
- IsA(queryDesc->planstate, GatherState))
+ if (!allow && tuple_cost_flags && gather_exists)
{
/*
* If we have correctly ignored parallel tuple cost in planner while
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 627d08b78a..00819eea3f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,44 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do parallel insert
+ * if the top node of the subquery is Gather, so we set the flag to make
+ * cost_gather ignore the parallel tuple cost for the Gather path if the
+ * SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * We set the flag in the following two cases, provided no parent path
+ * (such as limit, sort, distinct ...) will be created:
+ * i) query_level is 1
+ * ii) query_level > 1, in which case the flag is also set in the
+ * parent_root. Case ii) handles Append under Append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f1134711b0..7555cde61a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,14 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 0ae2f49e0c..4103ac65f0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,9 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
/* Set to this after the cost is ignored. */
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Set to this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 357fbbbe8d..e0296e88a3 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -649,6 +649,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 46b958014a..4e03d0ab6b 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -251,6 +251,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
Attaching v14 patch set that has above changes. Please consider this
for further review.
Few comments:
In the below case, should create be above Gather?
postgres=# explain create table t7 as select * from t6;
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Create t7
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
Can we change it to something like:
-------------------------------------------------------------------
Create t7
-> Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
You could change intoclause_len = strlen(intoclausestr) to
strlen(intoclausestr) + 1 and use intoclause_len in the remaining
places. We can avoid the +1 in the other places.
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
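To make the suggestion concrete, a rough sketch (not tested, reusing the
same variables as in the snippet above) could look like the following,
with the later shm_toc_allocate()/memcpy() then using intoclause_len
as-is instead of adding 1 again:

    /* Estimate space for into clause for CTAS; length includes the trailing NUL. */
    if (IS_CTAS(intoclause) && OidIsValid(objectid))
    {
        intoclausestr = nodeToString(intoclause);
        intoclause_len = strlen(intoclausestr) + 1;
        shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
        shm_toc_estimate_keys(&pcxt->estimator, 1);
    }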
Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally? That way the setting and resetting of
node->need_to_scan_locally can be removed, unless need_to_scan_locally
is needed in any of the functions that get called.
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
Attaching v14 patch set that has above changes. Please consider this
for further review.
Few comments:
In the below case, should create be above Gather?
postgres=# explain create table t7 as select * from t6;
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Create t7
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
Can we change it to something like:
-------------------------------------------------------------------
Create t7
-> Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
I think it is better to have it the way it is in the current patch,
because that reflects that we are performing the insert/create below
Gather, which is the purpose of this patch. I think this is similar to
what the Parallel Insert patch [1] has for a similar plan.
[1]: https://commitfest.postgresql.org/31/2844/
--
With Regards,
Amit Kapila.
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
You could change intoclause_len = strlen(intoclausestr) to
strlen(intoclausestr) + 1 and use intoclause_len in the remaining
places. We can avoid the +1 in the other places.
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
Done.
Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally? That way the setting and resetting of
node->need_to_scan_locally can be removed, unless need_to_scan_locally
is needed in any of the functions that get called.
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
need_to_scan_locally is being set in ExecGather(), and it can still be
true even if nworkers_launched > 0, so I think we cannot remove
need_to_scan_locally in ExecParallelInsertInCTAS.
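For reference, ExecGather() sets it along these lines (paraphrasing
nodeGather.c), so it can still be true even when workers were launched,
for instance when the leader participates:

    /* Run the plan locally if there are no readers, or if the leader
     * may participate (and this is not a single-copy Gather). */
    node->need_to_scan_locally = (node->nreaders == 0)
        || (!gather->single_copy && parallel_leader_participation);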
Attaching v15 patch set for further review. Note that the change is
only in 0001 patch, other patches remain unchanged from v14.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v15-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
From a73d0e0e4c300e28d3e5e659eacd6e23141012ec Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 24 Dec 2020 12:58:26 +0530
Subject: [PATCH v15 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. The leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its
number of inserted tuples into a shared memory variable; the leader
combines this with its own count and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 30 ++-
src/backend/commands/createas.c | 303 ++++++++++++++++---------
src/backend/commands/explain.c | 31 +++
src/backend/executor/execParallel.c | 70 +++++-
src/backend/executor/nodeGather.c | 113 ++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 25 ++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 457 insertions(+), 140 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..db6eedd635 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for the common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, currentCommandIdUsed must already have been set to true at
+ * the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()); the assertion checks this. We
+ * have to allow this because GetCurrentCommandId(true) may be called
+ * from anywhere, especially for parallel inserts, within a parallel
+ * worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..10f4f2b4d7 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table. We need the plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -430,121 +429,169 @@ static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
- IntoClause *into = myState->into;
- bool is_matview;
- List *attrList;
ObjectAddress intoRelationAddr;
Relation intoRelationDesc;
- ListCell *lc;
- int attnum;
- Assert(into != NULL); /* else somebody forgot to set it */
-
- /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
- is_matview = (into->viewQuery != NULL);
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
- /*
- * Build column definitions using "pre-cooked" type and collation info. If
- * a column name list was specified in CREATE TABLE AS, override the
- * column names derived from the query. (Too few column names are OK, too
- * many are not.)
- */
- attrList = NIL;
- lc = list_head(into->colNames);
- for (attnum = 0; attnum < typeinfo->natts; attnum++)
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * the parallel DSM and are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
- Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
- ColumnDef *col;
- char *colname;
+ IntoClause *into = myState->into;
+ bool is_matview;
+ List *attrList;
+ ListCell *lc;
+ int attnum;
- if (lc)
+ Assert(into != NULL); /* else somebody forgot to set it */
+
+ /*
+ * This code supports both CREATE TABLE AS and CREATE MATERIALIZED
+ * VIEW.
+ */
+ is_matview = (into->viewQuery != NULL);
+
+ /*
+ * Build column definitions using "pre-cooked" type and collation info.
+ * If a column name list was specified in CREATE TABLE AS, override the
+ * column names derived from the query. (Too few column names are OK,
+ * too many are not.)
+ */
+ attrList = NIL;
+ lc = list_head(into->colNames);
+ for (attnum = 0; attnum < typeinfo->natts; attnum++)
{
- colname = strVal(lfirst(lc));
- lc = lnext(into->colNames, lc);
+ Form_pg_attribute attribute = TupleDescAttr(typeinfo, attnum);
+ ColumnDef *col;
+ char *colname;
+
+ if (lc)
+ {
+ colname = strVal(lfirst(lc));
+ lc = lnext(into->colNames, lc);
+ }
+ else
+ colname = NameStr(attribute->attname);
+
+ col = makeColumnDef(colname,
+ attribute->atttypid,
+ attribute->atttypmod,
+ attribute->attcollation);
+
+ /*
+ * It's possible that the column is of a collatable type but the
+ * collation could not be resolved, so double-check. (We must
+ * check this here because DefineRelation would adopt the type's
+ * default collation rather than complaining.)
+ */
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDETERMINATE_COLLATION),
+ errmsg("no collation was derived for column \"%s\" with collatable type %s",
+ col->colname,
+ format_type_be(col->typeName->typeOid)),
+ errhint("Use the COLLATE clause to set the collation explicitly.")));
+
+ attrList = lappend(attrList, col);
}
- else
- colname = NameStr(attribute->attname);
- col = makeColumnDef(colname,
- attribute->atttypid,
- attribute->atttypmod,
- attribute->attcollation);
+ if (lc != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("too many column names were specified")));
/*
- * It's possible that the column is of a collatable type but the
- * collation could not be resolved, so double-check. (We must check
- * this here because DefineRelation would adopt the type's default
- * collation rather than complaining.)
+ * Actually create the target table
*/
- if (!OidIsValid(col->collOid) &&
- type_is_collatable(col->typeName->typeOid))
- ereport(ERROR,
- (errcode(ERRCODE_INDETERMINATE_COLLATION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
- col->colname,
- format_type_be(col->typeName->typeOid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+ intoRelationAddr = create_ctas_internal(attrList, into);
- attrList = lappend(attrList, col);
- }
+ /*
+ * Finally we can open the target table
+ */
+ intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
- if (lc != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("too many column names were specified")));
+ /*
+ * Make sure the constructed table does not have RLS enabled.
+ *
+ * check_enable_rls() will ereport(ERROR) itself if the user has
+ * requested something invalid, and otherwise will return RLS_ENABLED
+ * if RLS should be enabled here. We don't actually support that
+ * currently, so throw our own ereport(ERROR) if that happens.
+ */
+ if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("policies not yet implemented for this command")));
- /*
- * Actually create the target table
- */
- intoRelationAddr = create_ctas_internal(attrList, into);
+ /*
+ * Tentatively mark the target as populated, if it's a matview and
+ * we're going to fill it; otherwise, no change needed.
+ */
+ if (is_matview && !into->skipData)
+ SetMatViewPopulatedState(intoRelationDesc, true);
- /*
- * Finally we can open the target table
- */
- intoRelationDesc = table_open(intoRelationAddr.objectId, AccessExclusiveLock);
+ /*
+ * Fill private fields of myState for use by later routines
+ */
+ myState->rel = intoRelationDesc;
+ myState->reladdr = intoRelationAddr;
+ myState->output_cid = GetCurrentCommandId(true);
+ myState->ti_options = TABLE_INSERT_SKIP_FSM;
- /*
- * Make sure the constructed table does not have RLS enabled.
- *
- * check_enable_rls() will ereport(ERROR) itself if the user has requested
- * something invalid, and otherwise will return RLS_ENABLED if RLS should
- * be enabled here. We don't actually support that currently, so throw
- * our own ereport(ERROR) if that happens.
- */
- if (check_enable_rls(intoRelationAddr.objectId, InvalidOid, false) == RLS_ENABLED)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("policies not yet implemented for this command")));
+ /*
+ * If WITH NO DATA is specified, there is no need to set up the state
+ * for bulk inserts as there are no tuples to insert.
+ */
+ if (!into->skipData)
+ myState->bistate = GetBulkInsertState();
+ else
+ myState->bistate = NULL;
- /*
- * Tentatively mark the target as populated, if it's a matview and we're
- * going to fill it; otherwise, no change needed.
- */
- if (is_matview && !into->skipData)
- SetMatViewPopulatedState(intoRelationDesc, true);
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
- /*
- * Fill private fields of myState for use by later routines
- */
- myState->rel = intoRelationDesc;
- myState->reladdr = intoRelationAddr;
- myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ /*
+ * In parallel mode we do not skip contacting the FSM while inserting
+ * tuples: while extending the relation, a worker can check the FSM for
+ * another page that can accommodate the tuples instead of blocking on a
+ * page on which another worker is inserting. This is a major benefit
+ * for parallel inserts.
+ */
+ myState->ti_options = 0;
- /*
- * If WITH NO DATA is specified, there is no need to set up the state for
- * bulk inserts as there are no tuples to insert.
- */
- if (!into->skipData)
- myState->bistate = GetBulkInsertState();
- else
- myState->bistate = NULL;
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are not
+ * allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
- /*
- * Valid smgr_targblock implies something already wrote to the relation.
- * This may be harmless, but this function hasn't planned for it.
- */
- Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ /*
+ * Valid smgr_targblock implies something already wrote to the
+ * relation. This may be harmless, but this function hasn't planned for
+ * it.
+ */
+ Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
+ }
}
/*
@@ -606,3 +653,47 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible; if so, set the parallel insert state, i.e. push down
+ * the dest receiver to the Gather node.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;
+
+ /*
+ * Do not allow parallel inserts if the table is temporary. Temporary
+ * tables are backend-local, so workers cannot know about them.
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables; it does not support foreign or partitioned table
+ * creation. Hence the check for a temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return;
+
+ if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
+ !queryDesc->planstate->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+ DestReceiver *dest = queryDesc->dest;
+
+ /*
+ * To parallelize inserts in CTAS, i.e. to make each parallel worker
+ * insert the tuples, we must pass information such as the into clause
+ * (so each worker can build its own dest receiver) and the object id
+ * (so each worker can open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since no rows are transferred from the workers to the Gather node,
+ * we set it to 0 to be visible in the estimated row count of
+ * explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;
+ }
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..fbd0bc5a81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples resulting from its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ ChooseParallelInsertsInCTAS(into, queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1783,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under the Gather node when the
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..5aa4bde54d 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* OID of the target relation, for workers to open it. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len);
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader will
+ * use it to report the total to the client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..e7c588c66a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) will have started
+ * their work, i.e. inserting into the target table. If the leader is chosen
+ * to participate in the parallel inserts for CTAS, let it finish its share
+ * before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here; that is done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This total is reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Collect the information that needs to be passed to the workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,17 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ed4690305b 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used for table open by parallel worker. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern void ChooseParallelInsertsInCTAS(IntoClause *into,
+ QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..e9c4442c22 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Info related to parallel inserts in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
Attachment: v15-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 2484cfb3e57c79788448b04bfc1760b59fd20c4d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 21 Dec 2020 15:19:23 +0530
Subject: [PATCH v15 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to the Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
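
As a rough illustration only (this sketch is not part of the patch or its
tests, and the table and column names below are made up), the intended effect
is that the Gather node on top of the CTAS SELECT is costed as if no tuples
are sent back to the leader, so its estimated row count shows up as 0 and the
parallel plan becomes cheaper relative to the serial one:

-- hypothetical example; whether a parallel plan is actually chosen depends
-- on the table size and the usual parallel_* settings
create table source_data as select generate_series(1, 1000000) as col1;
explain create table target_data as
    select col1 from source_data where col1 % 2 = 0;
-- expected plan shape with this patch series applied (costs and worker
-- counts vary):
--  Gather  (cost=... rows=0 width=...)
--    Workers Planned: 2
--    ->  Parallel Seq Scan on source_data  (cost=... rows=... width=...)
--          Filter: ((col1 % 2) = 0)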
---
src/backend/commands/createas.c | 35 +++++++++++++++++-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 ++++++++++-
src/backend/optimizer/plan/planner.c | 53 +++++++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++++
9 files changed, 146 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 10f4f2b4d7..210927d4f4 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -344,7 +344,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc,
+ &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -659,8 +660,11 @@ intorel_destroy(DestReceiver *self)
* insertion is possible; if so, set the parallel insert state, i.e. push down
* the dest receiver to the Gather node.
*/
-void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags)
{
+ bool allow = false;
+
if (!IS_CTAS(into))
return;
@@ -695,5 +699,32 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
* explain plans.
*/
queryDesc->planstate->plan->plan_rows = 0;
+
+ allow = true;
}
+
+ /*
+ * It should not happen that we ignored the parallel tuple cost in
+ * cost_gather and are now not allowing the parallel inserts. The assertion
+ * is needed only when the top node is Gather. Its intention is to catch the
+ * case where we forced the planner to ignore the parallel tuple cost (with
+ * the intention of choosing parallel inserts), the parallel plan may have
+ * been chosen because of that, and yet we do not allow the parallel
+ * inserts now.
+ */
+ if (!allow && tuple_cost_flags && queryDesc &&
+ IsA(queryDesc->planstate, GatherState))
+ {
+ /*
+ * If we correctly ignored the parallel tuple cost in the planner while
+ * creating the Gather path, this assertion should not fail. If it does,
+ * the planner may have chosen this parallel plan only because we forced
+ * it to ignore the parallel tuple cost.
+ */
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
+
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbd0bc5a81..efdb34d1f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -402,7 +402,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +497,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +563,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* target table. We need plan state to be initialized by the executor to
* decide whether to allow parallel inserts or not.
*/
- ChooseParallelInsertsInCTAS(into, queryDesc);
+ ChooseParallelInsertsInCTAS(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..800f25903d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost when we intend to perform parallel
+ * inserts by the workers. The ignore flag would have been set in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper-level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..f1134711b0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,37 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each
+ * worker inserts them in parallel. So, if the SELECT is for CTAS and we are
+ * generating an upper-level Gather path, set a flag so that cost_gather
+ * ignores the parallel tuple cost for the Gather path.
+ */
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated for the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not set the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7589,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag so that cost_gather ignores the parallel tuple cost for the
+ * Gather path if the SELECT is for CTAS and we are generating an
+ * upper-level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ed4690305b..0ae2f49e0c 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Set to this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -53,5 +71,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern void ChooseParallelInsertsInCTAS(IntoClause *into,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
Attachment: v15-0003-Tests-For-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From d5183d9f831ac5ddc5e3b8417e24e6c29db11011 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 22 Dec 2020 13:19:22 +0530
Subject: [PATCH v15 3/4] Tests For Parallel Inserts in CTAS
---
src/test/regress/expected/write_parallel.out | 577 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 216 +++++++
2 files changed, 793 insertions(+)
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..357fbbbe8d 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,581 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is
+-- whether Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select now(), four from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed
+-- by each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..46b958014a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,220 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is
+-- whether Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select now(), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
Attachment: v15-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/octet-stream)
From 7f3cc3f322a971cdd23221b45810ceb47e50be84 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 22 Dec 2020 13:30:22 +0530
Subject: [PATCH v15 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if the
Gather nodes are under a top-level Append node. It also adds the code
for influencing the planner to consider the parallel tuple cost as zero,
and asserts if that enforcement turns out to be wrong because parallel
insertion is later found not to be possible. Test cases are also
included in this patch.
---
src/backend/commands/createas.c | 104 ++-
src/backend/optimizer/path/allpaths.c | 39 +
src/backend/optimizer/plan/planner.c | 10 +-
src/include/commands/createas.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1073 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 210927d4f4..20d4f805d0 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -655,6 +655,78 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver to a Gather node when it is either at the top of
+ * the plan or directly under a top-level Append node, provided it does not
+ * have any projections to do. Required information from the pushed dest
+ * receiver is sent to the workers so that they can perform parallel
+ * insertions into the target table.
+ *
+ * If the top node is Append, this function recursively checks its sub plans
+ * for Gather nodes; when one is found (and it has no projections), the dest
+ * receiver information is set on it.
+ *
+ * In any case, this function returns true if at least one Gather node can
+ * allow parallel insertions by the workers, and false otherwise.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+ parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS i.e. making each parallel
+ * worker insert the tuples, we must send information such as into
+ * clause (for each worker to build separate dest receiver), object
+ * id (for each worker to open the created table).
+ */
+ ((DR_intorel *) dest)->is_parallel = true;
+ gstate->dest = dest;
+
+ /*
+ * Since no rows are transferred from the workers to the Gather
+ * node, set plan_rows to 0 so that the estimated row count shown
+ * in explain plans reflects that.
+ */
+ ps->plan->plan_rows = 0;
+ }
+ }
+
+ return parallel;
+}
+
/*
* ChooseParallelInsertsInCTAS --- determine whether or not parallel
* insertion is possible, if yes set the parallel insert state i.e. push down
@@ -664,6 +736,7 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
uint8 *tuple_cost_flags)
{
bool allow = false;
+ bool gather_exists = false;
if (!IS_CTAS(into))
return;
@@ -678,30 +751,12 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
return;
- if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
- !queryDesc->planstate->ps_ProjInfo)
- {
- GatherState *gstate = (GatherState *) queryDesc->planstate;
- DestReceiver *dest = queryDesc->dest;
-
- /*
- * For parallelizing inserts in CTAS i.e. making each parallel worker
- * insert the tuples, we must send information such as into clause (for
- * each worker to build separate dest receiver), object id (for each
- * worker to open the created table).
- */
- ((DR_intorel *) dest)->is_parallel = true;
- gstate->dest = dest;
-
- /*
- * Since there are no rows that are transferred from workers to Gather
- * node, so we set it to 0 to be visible in estimated row count of
- * explain plans.
- */
- queryDesc->planstate->plan->plan_rows = 0;
+ if (!queryDesc)
+ return;
- allow = true;
- }
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
/*
* It should not happen that in cost_gather we have ignored the parallel
@@ -712,8 +767,7 @@ void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
* which the parallel plan may have been chosen, but we do not allow the
* parallel inserts now.
*/
- if (!allow && tuple_cost_flags && queryDesc &&
- IsA(queryDesc->planstate, GatherState))
+ if (!allow && tuple_cost_flags && gather_exists)
{
/*
* If we have correctly ignored parallel tuple cost in planner while
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 627d08b78a..00819eea3f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,44 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do parallel insert
+ * if the top node of the subquery is Gather, so we set the flag to make
+ * the Gather path ignore the parallel tuple cost in cost_gather if the
+ * SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * We set the flag in two cases, provided no parent path (such as
+ * limit, sort, distinct ...) will be created:
+ * i) query_level is 1;
+ * ii) query_level > 1, in which case the flag is set in the parent_root.
+ * Case ii) handles append under append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f1134711b0..7555cde61a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,14 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 0ae2f49e0c..4103ac65f0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,9 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
/* Set to this after the cost is ignored. */
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Set to this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 357fbbbe8d..e0296e88a3 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -649,6 +649,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 46b958014a..4e03d0ab6b 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -251,6 +251,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
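To summarize the push-down walk that the 0004 patch adds, here is a rough
standalone sketch, assuming toy stand-in node types (PlanNode,
push_down_parallel_insert and the NODE_* tags are hypothetical names used
only for illustration; the real code is PushDownCTASParallelInsertState
operating on AppendState/GatherState nodes):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { NODE_GATHER, NODE_APPEND, NODE_OTHER } NodeTag;

typedef struct PlanNode
{
	NodeTag		tag;
	bool		has_projection;		/* meaningful only for NODE_GATHER */
	bool		parallel_insert;	/* set when the dest receiver is "pushed down" */
	int			nchildren;
	struct PlanNode *children[4];
} PlanNode;

/*
 * Recurse through Append nodes; for every projection-free Gather at the top
 * or directly under an Append, mark it as a parallel-insert target. Returns
 * true if at least one Gather was marked; *gather_exists records whether any
 * Gather was seen at all.
 */
static bool
push_down_parallel_insert(PlanNode *node, bool *gather_exists)
{
	bool		parallel = false;

	if (node == NULL)
		return false;

	if (node->tag == NODE_APPEND)
	{
		for (int i = 0; i < node->nchildren; i++)
			parallel |= push_down_parallel_insert(node->children[i],
												  gather_exists);
	}
	else if (node->tag == NODE_GATHER)
	{
		*gather_exists = true;

		if (!node->has_projection)
		{
			node->parallel_insert = true;
			parallel = true;
		}
	}

	return parallel;
}

int
main(void)
{
	/* An Append with a projection-free Gather, a Gather with a projection,
	 * and a non-Gather child. */
	PlanNode	g1 = {NODE_GATHER, false, false, 0, {NULL}};
	PlanNode	g2 = {NODE_GATHER, true, false, 0, {NULL}};
	PlanNode	scan = {NODE_OTHER, false, false, 0, {NULL}};
	PlanNode	append = {NODE_APPEND, false, false, 3, {&g1, &g2, &scan}};
	bool		gather_exists = false;
	bool		allow = push_down_parallel_insert(&append, &gather_exists);

	printf("parallel inserts allowed: %d (gather exists: %d)\n",
		   allow, gather_exists);
	printf("g1 marked: %d, g2 marked: %d\n",
		   g1.parallel_insert, g2.parallel_insert);
	return 0;
}

In this toy plan, only the projection-free Gather is marked, which mirrors
the behaviour the test cases above check for via the "Create <<tbl_name>>"
lines in the explain output.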
On Thu, Dec 24, 2020 at 11:29 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
Attaching v14 patch set that has above changes. Please consider this
for further review.
Few comments:
In the below case, should create be above Gather?
postgres=# explain create table t7 as select * from t6;
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Create t7
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
Can we change it to something like:
-------------------------------------------------------------------
Create t7
-> Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
I think it is better to have it in a way as in the current patch
because that reflects that we are performing insert/create below
Gather which is the purpose of this patch. I think this is similar to
what the Parallel Insert patch [1] has for a similar plan.
Also another thing that I felt was that the Gather nodes will actually
do the insert operation; the Create table will be done earlier itself.
Should we change Create table to Insert table, something like below:
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Insert table2 (instead of Create table2)
-> Parallel Seq Scan on table1 (cost=0.00..9.17 rows=417 width=4)
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Dec 25, 2020 at 7:12 AM vignesh C <vignesh21@gmail.com> wrote:
Also another thing that I felt was that the Gather nodes will actually
do the insert operation; the Create table will be done earlier itself.
Should we change Create table to Insert table, something like below:
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Insert table2 (instead of Create table2)
-> Parallel Seq Scan on table1 (cost=0.00..9.17 rows=417 width=4)
IMO, showing Insert under Gather makes sense if the query is INSERT
INTO SELECT as it's in the other patch [1]. Since here it is a CTAS
query, having Create under Gather looks fine to me. This way we can
also distinguish the EXPLAINs of parallel inserts in INSERT INTO
SELECT and CTAS.
And also, some might wonder whether Create under Gather means that each
parallel worker is creating the table; actually it is not the creation
of the table that's parallelized but its insertion. If required, we
can clarify it in CTAS docs with a sample EXPLAIN. I have not yet
added docs related to allowing parallel inserts in CTAS. Shall I add a
para saying when parallel inserts can be picked and how the sample
EXPLAIN looks? Thoughts?
[1]: https://commitfest.postgresql.org/31/2844/
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Dec 25, 2020 at 9:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
IMO, showing Insert under Gather makes sense if the query is INSERT
INTO SELECT as it's in the other patch [1]. Since here it is a CTAS
query, having Create under Gather looks fine to me. This way we can
also distinguish the EXPLAINs of parallel inserts in INSERT INTO
SELECT and CTAS.
I don't think that is a problem, because even now if we EXPLAIN a CTAS
it appears as if we are executing just the select query, since the
select part is all we are planning for. So if we are now including the
INSERT in the planning and pushing the insert under the Gather, it will
make more sense to show INSERT instead of CREATE. Let's see what others
think.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Dec 25, 2020 at 9:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
IMO, showing Insert under Gather makes sense if the query is INSERT
INTO SELECT as it's in the other patch [1]. Since here it is a CTAS
query, having Create under Gather looks fine to me. This way we can
also distinguish the EXPLAINs of parallel inserts in INSERT INTO
SELECT and CTAS.
Right, IIRC, we have done it the way it is in the patch for convenience
and to move forward with it and come back to it later once all other
parts of the patch are good.
And also, some might wonder whether Create under Gather means that each
parallel worker is creating the table; actually it is not the creation
of the table that's parallelized but its insertion. If required, we
can clarify it in CTAS docs with a sample EXPLAIN. I have not yet
added docs related to allowing parallel inserts in CTAS. Shall I add a
para saying when parallel inserts can be picked and how the sample
EXPLAIN looks? Thoughts?
Yeah, I don't see any problem with it, and maybe we can move Explain
related code to a separate patch. The reason is we don't display DDL
part without parallelism and this might need a separate discussion.
--
With Regards,
Amit Kapila.
On Fri, Dec 25, 2020 at 10:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 25, 2020 at 9:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Fri, Dec 25, 2020 at 7:12 AM vignesh C <vignesh21@gmail.com> wrote:
On Thu, Dec 24, 2020 at 11:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
Attaching v14 patch set that has above changes. Please consider this
for further review.Few comments:
In the below case, should create be above Gather?
postgres=# explain create table t7 as select * from t6;
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Create t7
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)Can we change it to something like:
-------------------------------------------------------------------
Create t7
-> Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Parallel Seq Scan on t6 (cost=0.00..9.17 rows=417 width=4)
(4 rows)
I think it is better to have it the way it is in the current patch
because that reflects that we are performing the insert/create below
Gather, which is the purpose of this patch. I think this is similar to
what the Parallel Insert patch [1] has for a similar plan.
Another thing I felt is that it is really the Gather node that will do
the insert operation; the Create table will already have been done
earlier. Should we change Create table to Insert table, something like
below:
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=0.00..9.17 rows=0 width=4)
Workers Planned: 2
-> Insert table2 (instead of Create table2)
-> Parallel Seq Scan on table1 (cost=0.00..9.17 rows=417 width=4)
IMO, showing Insert under Gather makes sense if the query is INSERT
INTO SELECT, as it is in the other patch [1]. Since here it is a CTAS
query, having Create under Gather looks fine to me. This way we can
also distinguish the EXPLAINs of parallel inserts in INSERT INTO
SELECT and CTAS.
Right, IIRC, we have done it the way it is in the patch for convenience
and to move forward with it and come back to it later once all other
parts of the patch are good.
And also, some might wonder whether Create under Gather means that each
parallel worker is creating the table; it's actually not the creation
of the table that's parallelized but the insertion. If required, we
can clarify it in CTAS docs with a sample EXPLAIN. I have not yet
added docs related to allowing parallel inserts in CTAS. Shall I add a
para saying when parallel inserts can be picked and how the sample
EXPLAIN looks? Thoughts?
Yeah, I don't see any problem with it, and maybe we can move the
Explain-related code to a separate patch. The reason is that we don't
display the DDL part without parallelism, and this might need a
separate discussion.
This makes sense to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
You could change intoclause_len = strlen(intoclausestr) to
strlen(intoclausestr) + 1 and use intoclause_len in the remaining
places. We can avoid the +1 in the other places.
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
Done.
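To make that concrete, here is a minimal sketch of how the estimation
and the later allocation line up once the +1 is folded in (same
variable names as in the patch):

/* In ExecInitParallelPlan(), estimation phase: include the terminating NUL once. */
if (IS_CTAS(intoclause) && OidIsValid(objectid))
{
    intoclausestr = nodeToString(intoclause);
    intoclause_len = strlen(intoclausestr) + 1;
    shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
    shm_toc_estimate_keys(&pcxt->estimator, 1);
}

/* Later, when storing the into clause in the DSM, the same length is reused. */
if (intoclausestr)
{
    char *intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);

    memcpy(intoclause_space, intoclausestr, intoclause_len);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
}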
Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally, that way the setting and resetting of
node->need_to_scan_locally can be removed. Unless need_to_scan_locally
is needed in any of the functions that get called.
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
need_to_scan_locally is being set in ExecGather(); even if
nworkers_launched > 0, it can still be true, so I think we cannot
remove need_to_scan_locally in ExecParallelInsertInCTAS.
Attaching v15 patch set for further review. Note that the change is
only in 0001 patch, other patches remain unchanged from v14.
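(For reference on the need_to_scan_locally point: the relevant existing
assignment in ExecGather() is roughly the one below, so the flag can be
true even when workers were launched, e.g. with
parallel_leader_participation enabled.)

/* In ExecGather(): run the plan locally if there are no readers, or if
 * the leader is allowed to participate. */
node->need_to_scan_locally = (node->nreaders == 0)
    || (!gather->single_copy && parallel_leader_participation);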
I have reviewed part of the v15-0001 patch. I have a few comments, and I will
continue to review it.
1.
@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
/* this is global to a transaction, not subtransaction-local */
if (used)
{
- /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
- Assert(!IsParallelWorker());
+ /*
+ * This is a temporary hack for all common parallel insert cases i.e.
+ * insert into, ctas, copy from. To be changed later. In a parallel
+ * worker, set currentCommandIdUsed to true only if it was not set to
+ * true at the start of the parallel operation (by way of
+ * SetCurrentCommandIdUsedForWorker()). We have to do this because
+ * GetCurrentCommandId(true) may be called from anywhere, especially
+ * for parallel inserts, within parallel worker.
+ */
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
Why is this a temporary hack? And what is the plan for removing this hack?
2.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;
When will this hit? The function name suggests that it is for CTAS,
but now you have a check that returns if it is not for CTAS. Can you
add a comment about when you expect this case?
Also, the function name should start on a new line,
i.e.
void
ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
3.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
Push down to the Gather nodes? I think the right statement would be
push down below the Gather node.
4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+ }
+ else
{
non-parallel worker code
}
}
I think instead of moving all the code related to the non-parallel
worker into the else branch, we can do better. This
will avoid unnecessary code movement.
4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
-- Comment -> in a parallel worker we don't need to create the dest recv blah blah
+ if (myState->is_parallel_worker)
{
--parallel worker handling--
return;
}
--non-parallel worker code stays right there, instead of moving to else
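Roughly this shape, as a sketch using the worker-only fields from the
patch (the comment wording here is illustrative):

static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
    DR_intorel *myState = (DR_intorel *) self;

    /*
     * In a parallel worker the leader has already created the table and
     * done the sanity checks, so just open the table, set up bulk-insert
     * state and the command id, and return early.
     */
    if (myState->is_parallel_worker)
    {
        myState->rel = table_open(myState->object_id, AccessExclusiveLock);
        myState->reladdr = InvalidObjectAddress;
        myState->ti_options = 0;
        myState->bistate = GetBulkInsertState();
        SetCurrentCommandIdUsedForWorker();
        myState->output_cid = GetCurrentCommandId(false);
        return;
    }

    /* ... the existing non-parallel-worker (leader) code stays here ... */
}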
5.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
From the function name and comments it appeared that this function
would return a boolean saying whether
parallel insert should be selected or not. I think the name/comment
should be improved to reflect this.
6.
/*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as into clause (for
+ * each worker to build separate dest receiver), object id (for each
+ * worker to open the created table).
The comment says we need to pass the object id, but the code under this
comment is not doing so.
7.
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in estimated row count of
+ * explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;
This seems a bit hackish. Why is it done after the planning? I mean,
the plan must know that it is returning 0 rows?
8.
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len);
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
One blank line between the variable declaration and the next code
segment; take care of this at other places as well.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
You could change intoclause_len = strlen(intoclausestr) to
strlen(intoclausestr) + 1 and use intoclause_len in the remaining
places. We can avoid the +1 in the other places.
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr);
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
Done.
Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally, that way the setting and resetting of
node->need_to_scan_locally can be removed. Unless need_to_scan_locally
is needed in any of the functions that get called.
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, for parallel workers (if launched any), would have started their
+ * work i.e. insertion to target table. In case the leader is chosen to
+ * participate for parallel inserts in CTAS, then finish its share before
+ * going to wait for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
need_to_scan_locally is being set in ExecGather(); even if
nworkers_launched > 0, it can still be true, so I think we cannot
remove need_to_scan_locally in ExecParallelInsertInCTAS.
Attaching v15 patch set for further review. Note that the change is
only in 0001 patch, other patches remain unchanged from v14.
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
We can change the comment "parallel inserts must occur" to something like
"parallel insert must be selected for CTAS on a normal table".
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
We can change the comment "parallel inserts must occur" to something like
"parallel insert must be selected for CTAS on an unlogged table".
Similar comments need to be handled in other places also.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
The above function is the same as the function present in partition_prune.sql:
create function explain_parallel_append(text) returns setof text
language plpgsql as
$$
declare
ln text;
begin
for ln in
execute format('explain (analyze, costs off, summary off, timing off) %s',
$1)
loop
ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
return next ln;
end loop;
end;
$$;
If possible, try to make a common function for both and use it.
+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
Here the OidIsValid(objectid) check is not required; intoclausestr will
be set only if objectid is valid.
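That is, something as simple as:

/* intoclausestr was built only when objectid is valid, so this suffices. */
if (intoclausestr)
    fpes->objectid = objectid;
else
    fpes->objectid = InvalidOid;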
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have reviewed part of the v15-0001 patch. I have a few comments, and I will
continue to review it.
Thanks a lot.
1.
Why is this a temporary hack? And what is the plan for removing this hack?
The changes in xact.c, xact.h and heapam.c are common to all the
parallel insert patches - COPY, INSERT INTO SELECT. That was the
initial comment, I forgot to keep it in sync with the other patches.
Now, I used the comment from the INSERT INTO SELECT patch. IIRC, the plan
was to have this code in all the parallel insert patches; whichever
gets reviewed and committed first, the others will update their patches
accordingly.
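For reference, the piece shared across those patches is roughly the
relaxed assertion plus a worker-side setter, sketched below (the exact
code is in the attached 0001 patch):

/*
 * In GetCurrentCommandId(true), the worker-side restriction is relaxed to:
 *
 *     Assert(!(IsParallelWorker() && !currentCommandIdUsed));
 *     currentCommandIdUsed = true;
 *
 * and each worker marks the command id as used up front via:
 */
void
SetCurrentCommandIdUsedForWorker(void)
{
    Assert(IsParallelWorker() && !currentCommandIdUsed &&
           currentCommandId != InvalidCommandId);

    currentCommandIdUsed = true;
}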
2.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;
When will this hit? The function name suggests that it is for CTAS,
but now you have a check that returns if it is not for CTAS. Can you
add a comment about when you expect this case?
Yes, it will hit for explain cases, but I chose to remove this and
check outside in the explain code, something like:
if (into)
ChooseParallelInsertsInCTAS()
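i.e., roughly this shape at the ExplainOnePlan call site (with the
function renamed to TryParallelizingInsertsInCTAS later in this mail):

/* In ExplainOnePlan(), right after ExecutorStart(): only EXPLAIN
 * CREATE TABLE AS ... reaches here with a non-NULL into clause. */
if (into)
    TryParallelizingInsertsInCTAS(into, queryDesc);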
Also, the function name should start on a new line,
i.e.
void
ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
Ah, missed that. Modified now.
3.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
Push down to the Gather nodes? I think the right statement would be
push down below the Gather node.
Modified.
4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;
-- Comment -> in a parallel worker we don't need to create the dest recv blah blah
+ if (myState->is_parallel_worker)
{
--parallel worker handling--
return;
}
--non-parallel worker code stays right there, instead of moving to else
Done.
5.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
From the function name and comments it appeared that this function
would return a boolean saying whether
parallel insert should be selected or not. I think the name/comment
should be improved to reflect this.
Yeah, that function can still return void because there's no point in
returning a bool there; the intention is to see if parallel inserts can
be performed and, if yes, set the state, otherwise exit. I changed the
function name to TryParallelizingInsertsInCTAS(). Let me know your
suggestions if that doesn't work out.
6.
/*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as into clause (for
+ * each worker to build separate dest receiver), object id (for each
+ * worker to open the created table).
The comment says we need to pass the object id, but the code under this
comment is not doing so.
Improved the comment.
7.
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in estimated row count of
+ * explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;
This seems a bit hackish. Why is it done after the planning? I mean,
the plan must know that it is returning 0 rows?
This exists to show the estimated row count (in case of EXPLAIN CTAS
without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS the actual tuples
are shown correctly as 0 because Gather doesn't receive any tuples.
if (es->costs)
{
if (es->format == EXPLAIN_FORMAT_TEXT)
{
appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)",
plan->startup_cost, plan->total_cost,
plan->plan_rows, plan->plan_width);
Since it's an estimated row count (which may not always be correct), we
will let the EXPLAIN plan show that, and I think we can remove that
part. Thoughts?
I removed it in the v16 patch set.
8.
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len);
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
One blank line between the variable declaration and the next code
segment; take care of this at other places as well.
Done.
I'm attaching the v16 patch set. Please note that I added the
documentation saying that parallel insertions can happen and a sample
output of the explain to 0003 patch as discussed in [1]. But I didn't
move the explain output related code to a separate patch because it's
a small snippet in explain.c. I hope that's okay.
[1]: /messages/by-id/CAA4eK1JqwXGYoGa1+3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v16-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From 82dceab0abeed669142b90d75c9c120d56e5367e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sun, 27 Dec 2020 13:24:57 +0530
Subject: [PATCH v16 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. Leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker writes atomically its number
of inserted tuples into a shared memory variable; the leader combines
this with its own number of inserted tuples and shares it with the client.
---
src/backend/access/heap/heapam.c | 11 ---
src/backend/access/transam/xact.c | 28 ++++--
src/backend/commands/createas.c | 113 ++++++++++++++++++++++---
src/backend/commands/explain.c | 32 +++++++
src/backend/executor/execParallel.c | 70 +++++++++++++--
src/backend/executor/nodeGather.c | 112 ++++++++++++++++++++++--
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 25 ++++++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 361 insertions(+), 44 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..4e5366fe78 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..62b83425fe 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each
+ * parallel worker insert the tuples that result from its execution
+ * into the target table. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ TryParallelizingInsertsInCTAS(into, queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -438,6 +437,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -540,6 +568,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set a few extra fields. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't skip contacting the FSM while inserting tuples in parallel
+ * mode. While extending the relation, a worker, instead of blocking on
+ * a page while another worker is inserting, can check the FSM for
+ * another page that can accommodate the tuples. This results in a
+ * major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
@@ -606,3 +655,43 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * TryParallelizingInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible. If yes set the parallel insert state such as a flag
+ * in the dest receiver and store the dest receiver reference in the Gather
+ * node so that the required information will be sent to workers.
+ */
+void
+TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ /*
+ * Do not allow parallel inserts if the table is temporary. As the
+ * temporary tables are backend local, workers can not know about them.
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return;
+
+ if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
+ !queryDesc->planstate->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) queryDesc->planstate;
+ DestReceiver *dest = queryDesc->dest;
+
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such as
+ * into clause (to build separate dest receiver), object id (to open
+ * the created table) to each worker. Since this information is
+ * available in the CTAS dest receiver, store a reference to it in the
+ * Gather state so that it will be used in ExecInitParallelPlan to pick
+ * the required information.
+ */
+ gstate->dest = dest;
+ }
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..912e2d5b89 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If SELECT part of the CTAS is parallelizable, then make each parallel
+ * worker insert the tuples that result from its execution into the
+ * target table. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (into)
+ TryParallelizingInsertsInCTAS(into, queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1784,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..270f8c13d7 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr)
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..eeb51a9f43 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /* Enable leader to insert in case no parallel workers were launched. */
+ if (node->nworkers_launched == 0)
+ node->need_to_scan_locally = true;
+
+ /*
+ * By now, the parallel workers (if any were launched) would have started
+ * their work, i.e., insertion into the target table. In case the leader is
+ * chosen to participate in parallel inserts for CTAS, it finishes its share
+ * before waiting for the parallel workers to finish.
+ */
+ if (node->need_to_scan_locally)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +225,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +234,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +253,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +274,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +295,16 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 && !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..fc4a64989f 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created in the leader. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern void TryParallelizingInsertsInCTAS(IntoClause *into,
+ QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..a5b281f783 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts is allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From e580926ce0756b425163d165100f4358e8abbac5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 26 Dec 2020 15:19:45 +0530
Subject: [PATCH v16 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
---
src/backend/commands/createas.c | 34 ++++++++++++++++-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 ++++++++++-
src/backend/optimizer/plan/planner.c | 53 +++++++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++++
9 files changed, 145 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 27584c6be0..aaed0d8e2b 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -344,7 +344,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* into the target table. We need plan state to be initialized by the
* executor to decide whether to allow parallel inserts or not.
*/
- TryParallelizingInsertsInCTAS(into, queryDesc);
+ TryParallelizingInsertsInCTAS(into, queryDesc,
+ &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -663,8 +664,11 @@ intorel_destroy(DestReceiver *self)
* node so that the required information will be sent to workers.
*/
void
-TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags)
{
+ bool allow = false;
+
/*
* Do not allow parallel inserts if the table is temporary. As the
* temporary tables are backend local, workers can not know about them.
@@ -693,5 +697,31 @@ TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
* the required information.
*/
gstate->dest = dest;
+ allow = true;
+ }
+
+ /*
+ * It should not happen that in cost_gather we have ignored the parallel
+ * tuple cost and now we are not allowing the parallel inserts. And also we
+ * might need assertion only if the top node is Gather. The main intention
+ * of assertion is to check if we enforced planner to ignore the parallel
+ * tuple cost (with the intention of choosing parallel inserts) due to
+ * which the parallel plan may have been chosen, but we do not allow the
+ * parallel inserts now.
+ */
+ if (!allow && tuple_cost_flags && queryDesc &&
+ IsA(queryDesc->planstate, GatherState))
+ {
+ /*
+ * If we have correctly ignored parallel tuple cost in planner while
+ * creating Gather path, then this assertion failure should not occur.
+ * If it occurs, that means the planner may have chosen this parallel
+ * plan because of our enforcement to ignore the parallel tuple cost.
+ */
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
}
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
+
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 912e2d5b89..90c614eabc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -402,7 +402,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +497,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* decide whether to allow parallel inserts or not.
*/
if (into)
- TryParallelizingInsertsInCTAS(into, queryDesc);
+ TryParallelizingInsertsInCTAS(into, queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..800f25903d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider tuple cost in case of we intend to perform parallel
+ * inserts by workers. We would have set ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..f1134711b0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,37 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * Gather node will not receive any tuples from the workers in case each worker
+ * inserts them in parallel. So, we set a flag to ignore parallel tuple cost by
+ * the Gather path in cost_gather if the SELECT is for CTAS and we are
+ * generating an upper level Gather path.
+*/
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated for the upper Gather path(in
+ * grouping_planner), in which case we can not let parallel inserts
+ * happen. So we do not set ignore tuple cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7589,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Set a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we set it but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ */
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index fc4a64989f..919148b10a 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* set to this before planning */
+ /*
+ * Set to this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Set to this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -53,5 +71,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern void TryParallelizingInsertsInCTAS(IntoClause *into,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
v16-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From e0b2344a31ac002cf8cf53d3f9ba2c977a82e39c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 26 Dec 2020 15:56:01 +0530
Subject: [PATCH v16 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 577 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 216 +++++++
3 files changed, 819 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is
+ <literal>Gather</literal> and it has no projections to perform, the
+ created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan where the created table can be
+ filled by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..357fbbbe8d 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,581 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select now(), four from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there are aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..46b958014a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,220 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select now(), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there are aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
Attachment: v16-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/octet-stream)
From 1f9cad8986141a9a5efb81e448ed088d3bf4efc7 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 26 Dec 2020 15:30:15 +0530
Subject: [PATCH v16 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even when a
Gather node appears under a top-level Append node. It also adds code
that makes the planner treat the parallel tuple cost as zero for such
plans, plus assertions that catch the case where the cost was ignored
but parallel insertion later turns out not to be possible. Test cases
are also included in this patch.
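As an illustration of the targeted plan shape (not part of the patch;
foo, bar and ctas_union are hypothetical names, and the exact plan
depends on costs and parallel settings):

EXPLAIN (COSTS OFF)
CREATE TABLE ctas_union AS
    SELECT * FROM foo WHERE a = 1
    UNION ALL
    SELECT * FROM bar WHERE b = 2;
-- With the pushdown, each Gather under the top Append is expected to
-- show a "Create ctas_union" step, roughly:
-- Append
--   ->  Gather
--         ->  Create ctas_union
--         ->  Parallel Seq Scan on foo
--   ->  Gather
--         ->  Create ctas_union
--         ->  Parallel Seq Scan on bar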
---
src/backend/commands/createas.c | 99 ++-
src/backend/optimizer/path/allpaths.c | 39 +
src/backend/optimizer/plan/planner.c | 10 +-
src/include/commands/createas.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1071 insertions(+), 25 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index aaed0d8e2b..25da4e6035 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -657,6 +657,76 @@ intorel_destroy(DestReceiver *self)
pfree(self);
}
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver to a Gather node when it is either at the top of the
+ * plan or directly under the top Append node, provided it has no projections
+ * to perform. Required information from the pushed dest receiver is sent to
+ * the workers so that they can perform parallel insertions into the target
+ * table.
+ *
+ * If the top node is Append, then this function recursively checks its sub
+ * plans for Gather nodes; when one is found (and it has no projections), the
+ * dest receiver information is set on it.
+ *
+ * In any case, this function returns true if at least one Gather node allows
+ * parallel insertions by the workers, and false otherwise.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as the into clause (to build a separate dest receiver) and the object
+ * id (to open the created table) to each worker. Since this information
+ * is available in the CTAS dest receiver, store a reference to it
+ * in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ }
+
+ return parallel;
+}
+
/*
* TryParallelizingInsertsInCTAS --- determine whether or not parallel
* insertion is possible. If yes set the parallel insert state such as a flag
@@ -668,6 +738,7 @@ TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
uint8 *tuple_cost_flags)
{
bool allow = false;
+ bool gather_exists = false;
/*
* Do not allow parallel inserts if the table is temporary. As the
@@ -679,26 +750,12 @@ TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
if (!into->rel || into->rel->relpersistence == RELPERSISTENCE_TEMP)
return;
- if (queryDesc && IsA(queryDesc->planstate, GatherState) &&
- !queryDesc->planstate->ps_ProjInfo)
- {
- GatherState *gstate = (GatherState *) queryDesc->planstate;
- DestReceiver *dest = queryDesc->dest;
+ if (!queryDesc)
+ return;
- /* Okay to parallelize inserts, so mark it. */
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts in CTAS we must send information such as
- * into clause (to build separate dest receiver), object id (to open
- * the created table) to each workers. Since this information is
- * available in the CTAS dest receiver, store a reference to it in the
- * Gather state so that it will be used in ExecInitParallelPlan to pick
- * the required information.
- */
- gstate->dest = dest;
- allow = true;
- }
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
/*
* It should not happen that in cost_gather we have ignored the parallel
@@ -709,8 +766,7 @@ TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
* which the parallel plan may have been chosen, but we do not allow the
* parallel inserts now.
*/
- if (!allow && tuple_cost_flags && queryDesc &&
- IsA(queryDesc->planstate, GatherState))
+ if (!allow && tuple_cost_flags && gather_exists)
{
/*
* If we have correctly ignored parallel tuple cost in planner while
@@ -723,5 +779,4 @@ TryParallelizingInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc,
if (tuple_cost_flags)
*tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
-
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 627d08b78a..00819eea3f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,44 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do parallel insert if
+ * the top node of the subquery is Gather, so we set the flag to ignore
+ * the parallel tuple cost for the Gather path in cost_gather if the
+ * SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * We set the flag in two cases, provided no parent path (such as
+ * limit, sort, distinct, ...) will be created:
+ * i) query_level is 1
+ * ii) query_level > 1 and the flag is already set in the parent_root.
+ * Case ii) is to handle Append under Append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f1134711b0..7555cde61a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,14 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 919148b10a..8dbe537693 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,9 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
/* Set to this after the cost is ignored. */
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Set to this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 357fbbbe8d..e0296e88a3 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -649,6 +649,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 46b958014a..4e03d0ab6b 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -251,6 +251,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Sat, Dec 26, 2020 at 9:20 PM vignesh C <vignesh21@gmail.com> wrote:
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change the comment "parallel inserts must occur" to something like
"parallel insert must be selected for CTAS on normal table".

+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change the comment "parallel inserts must occur" to something like
"parallel insert must be selected for CTAS on unlogged table".

Similar comments need to be handled in other places also.
I think the existing comments look fine. The info like the table type and
whether the query is CTAS or CMV is visible by looking at the test case.
What I wanted the comments to convey is whether we support parallel
inserts or not, and if not, why, so that the tests are easy to read. I
tried to keep them as succinct as possible.
If possible try to make a common function for both and use it.

Yes, you are right. The function explain_pictas is the same as
explain_parallel_append from partition_prune.sql. It's a test
function, and I also see that we have serial_schedule and
parallel_schedule, which means that these sql files can run in any
order. I'm not quite sure whether we can have it in a common test sql
file and use it across the other test sql files. AFAICS, I didn't find
any function being used in such a manner. Thoughts?
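For reference, the helper only masks the unstable parts of the EXPLAIN
ANALYZE output, along the lines of explain_parallel_append in
partition_prune.sql. A minimal sketch (assuming the same regexp-based
filtering; the filters in the actual patch may differ) would be:

create or replace function explain_pictas(query text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s',
                       query)
    loop
        -- Replace run-to-run varying numbers with N so the output is stable.
        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
        return next ln;
    end loop;
end;
$$;

That is why the expected output above shows "Workers Launched: N" and
"actual rows=N loops=N" instead of real numbers.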
+ if (intoclausestr && OidIsValid(objectid))
+     fpes->objectid = objectid;
+ else
+     fpes->objectid = InvalidOid;

Here the OidIsValid(objectid) check is not required; intoclausestr will be
set only if the OID is valid.

Removed the OidIsValid check in the latest v16 patch set posted
upthread. Please have a look.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:
+ if (ignore &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the
above if, since when ignore_parallel_tuple_cost() returns true,
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is already set.
+ * In this function we only care Append and Gather nodes.
'care' -> 'care about'
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
It seems the loop termination condition can include parallel since we can
come out of the loop once parallel is true.
+ if (!allow && tuple_cost_flags && gather_exists)
As the above code shows, gather_exists is only checked when allow is false.
+ * We set the flag for two cases when there is no parent path will
+ * be created(such as : limit,sort,distinct...):

Please correct the grammar: there are two verbs following 'when'.
For set_append_rel_size:
+ {
+     root->parse->CTASParallelInsInfo |=
+         CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
+ if (root->parse->CTASParallelInsInfo &
+     CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+     root->parse->CTASParallelInsInfo &=
+         ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
In the if block for childrel->rtekind == RTE_SUBQUERY,
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared
immediately after?
+ /* Set to this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn
on' or similar term in the comment. Because 'set to' normally means
assignment.
Cheers
On Sun, Dec 27, 2020 at 12:50 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have reviewed part of the v15-0001 patch. I have a few comments, and I
will continue to review this.

Thanks a lot.

1. Why is this a temporary hack? And what is the plan for removing this hack?

The changes in xact.c, xact.h and heapam.c are common to all the
parallel insert patches - COPY, INSERT INTO SELECT. That was the
initial comment; I forgot to keep it in sync with the other patches.
Now, I used the comment from the INSERT INTO SELECT patch. IIRC, the plan
was to have this code in all the parallel insert patches; whichever
gets reviewed and committed first, the others will update their patches
accordingly.

2.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;

When will this hit? The function name suggests that it is from CTAS,
but now you have a check that if it is not for CTAS then return; can you
add a comment on when you expect this case?

Yes, it will hit for explain cases, but I chose to remove this and
check outside in the explain, something like:

if (into)
    ChooseParallelInsertsInCTAS()

Also, the function name should start on a new line, i.e.

void
ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)

Ah, missed that. Modified now.

3.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */

Push down to the Gather nodes? I think the right statement will be
push down below the Gather node.

Modified.

4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
    DR_intorel *myState = (DR_intorel *) self;

    -- Comment -> in parallel worker we don't need to create dest recv blah blah
    + if (myState->is_parallel_worker)
    {
        --parallel worker handling--
        return;
    }

    --non-parallel worker code stays right there, instead of moving to else

Done.

5.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{

From the function name and comments it appeared that this function will
return a boolean saying whether parallel insert should be selected or not.
I think the name/comment should be better for this.

Yeah, that function can still return void because there is no point in
returning bool there, since the intention is to see if parallel inserts
can be performed; if yes, set the state, otherwise exit. I changed the
function name to TryParallelizingInsertsInCTAS(). Let me know your
suggestions if that doesn't work out.

6.
/*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as into clause (for
+ * each worker to build separate dest receiver), object id (for each
+ * worker to open the created table).

The comment is saying we need to pass the object id, but the code under
this comment is not doing so.

Improved the comment.

7.
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in estimated row count of
+ * explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;

This seems a bit hackish. Why is it done after the planning? I mean, the
plan must know that it is returning 0 rows.

This exists to show up the estimated row count (in the case of EXPLAIN CTAS
without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS, actual tuples
are shown correctly as 0 because Gather doesn't receive any tuples.

if (es->costs)
{
    if (es->format == EXPLAIN_FORMAT_TEXT)
    {
        appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)",
                         plan->startup_cost, plan->total_cost,
                         plan->plan_rows, plan->plan_width);

Since it's an estimated row count (which may not always be correct), we
will let the EXPLAIN plan show that, and I think we can remove that
part. Thoughts?

I removed it in the v6 patch set.

8.
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+                                           intoclause_len);
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);

One blank line between the variable declaration and the next code segment;
take care at other places as well.

Done.

I'm attaching the v16 patch set. Please note that I added the
documentation saying that parallel insertions can happen and a sample
output of the explain to the 0003 patch as discussed in [1]. But I didn't
move the explain output related code to a separate patch because it's
a small snippet in explain.c. I hope that's okay.

[1] - /messages/by-id/CAA4eK1JqwXGYoGa1+3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA@mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks for working on this, I will have a look at the updated patches soon.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 27, 2020 at 2:28 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
I saw a few inconsistencies in the patch:
*+-- parallel inserts must occur*
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas

*+-- parallel inserts must not occur as the table is temporary*
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas

*+-- parallel inserts must occur, as there is init plan that gets executed by*
*+-- each parallel worker*
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas

*+-- the top node is Gather under which merge join happens, so parallel inserts*
*+-- must occur*
+set enable_nestloop to off;
+set enable_mergejoin to on;

*+-- parallel hash join happens under Gather node, so parallel inserts must occur*
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
Test comments are detailed in a few cases, while in others they are not,
even for similar kinds of tests where parallelism is selected. I felt we
could make the test comments consistent across the file.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 28, 2020 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Thanks for working on this, I will have a look at the updated patches soon.
Attaching v17 patch set after addressing comments raised in other
threads. Please consider this patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v17-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
From d36c81fb6e13a8ec3c36fd6cd3bb096d783715e7 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 14:30:45 +0530
Subject: [PATCH v17 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. The leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its number
of inserted tuples into a shared memory variable; the leader combines
this with its own number of inserted tuples and reports the total to
the client.
---
src/backend/access/heap/heapam.c | 11 --
src/backend/access/transam/xact.c | 28 ++++-
src/backend/commands/createas.c | 137 ++++++++++++++++++++++---
src/backend/commands/explain.c | 31 ++++++
src/backend/executor/execParallel.c | 70 ++++++++++++-
src/backend/executor/nodeGather.c | 111 ++++++++++++++++++--
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 26 +++++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 384 insertions(+), 44 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..4e5366fe78 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..2beccf60df 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -438,6 +437,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -540,6 +568,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, instead of blocking on a
+ * page that another worker is inserting into, workers can check the FSM
+ * for another page that can accommodate the tuples. This results in a
+ * major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
@@ -606,3 +655,67 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertionAllowedInCTAS --- check if parallel insertion is allowed
+ *
+ * Do not allow parallel inserts if the table is temporary. As the temporary
+ * tables are backend local, workers can not know about them. Currently, CTAS
+ * supports creation of normal(logged), temporary and unlogged tables. It does
+ * not support foreign or partition table creation. Hence the check for
+ * temporary table is enough here.
+ *
+ * Return false either if the into clause is NULL or if the table is
+ * temporary, otherwise true.
+ */
+bool
+IsParallelInsertionAllowedInCTAS(IntoClause *into)
+{
+ /* Below check may hit in case this function is called from explain.c. */
+ if (!IS_CTAS(into))
+ return false;
+
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+}
+
+/*
+ * SetCTASParallelInsertState --- set the parallel insert state
+ *
+ * See if the upper node is Gather and it doesn't have any projections, then
+ * set the parallel insert state such as a flag in the dest receiver and also
+ * store the dest receiver reference in the Gather node so that the required
+ * information will be sent to workers.
+ */
+void
+SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc);
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible either if the upper node is not
+ * Gather or it's a Gather but it have some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such as into
+ * clause (to build separate dest receiver), object id (to open the created
+ * table) to each workers. Since this information is available in the CTAS
+ * dest receiver, store a reference to it in the Gather state so that it
+ * will be used in ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..9a412c3e6b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the parallel
+ * insert state. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1783,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..270f8c13d7 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr)
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..96d745229d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,71 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /*
+ * By now, the parallel workers, if any were launched, would have started
+ * their work, i.e. inserting into the target table. In case the leader is
+ * also chosen to participate, it finishes its share before waiting for
+ * the parallel workers to finish.
+ *
+ * In case no workers were launched, allow the leader to insert the
+ * entire set of tuples.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +224,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +233,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +252,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +273,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +294,16 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 && !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab74c69b9f 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created in the leader. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,7 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertionAllowedInCTAS(IntoClause *into);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..a5b281f783 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts is allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v17-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch
From 9ce3e49f0f11d7144566704f5153d1e579a3814c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 15:06:01 +0530
Subject: [PATCH v17 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
---
src/backend/commands/createas.c | 61 +++++++++++++++++++++------
src/backend/commands/explain.c | 16 +++++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 +++++++++-
src/backend/optimizer/plan/planner.c | 53 +++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++++
9 files changed, 169 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 2beccf60df..5046e02aac 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -316,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to indicate the planner so that it can ignore the
+ * parallel tuple cost while generating Gather path.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ else
+ query->CTASParallelInsInfo = CTAS_PARALLEL_INS_UNDEF;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -344,7 +353,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* executor to decide whether to allow parallel inserts or not.
*/
if (IsParallelInsertionAllowedInCTAS(into))
- SetCTASParallelInsertState(queryDesc);
+ SetCTASParallelInsertState(queryDesc, &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -690,7 +699,7 @@ IsParallelInsertionAllowedInCTAS(IntoClause *into)
* information will be sent to workers.
*/
void
-SetCTASParallelInsertState(QueryDesc *queryDesc)
+SetCTASParallelInsertState(QueryDesc *queryDesc, uint8 *tuple_cost_flags)
{
GatherState *gstate;
DestReceiver *dest;
@@ -701,21 +710,45 @@ SetCTASParallelInsertState(QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible either if the upper node is not
- * Gather or it's a Gather but it have some projections to perform.
+ * Parallel insertions are not possible if the upper node is not Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- ((DR_intorel *) dest)->is_parallel = true;
-
/*
- * For parallelizing inserts in CTAS we must send information such as into
- * clause (to build separate dest receiver), object id (to open the created
- * table) to each workers. Since this information is available in the CTAS
- * dest receiver, store a reference to it in the Gather state so that it
- * will be used in ExecInitParallelPlan to pick the required information.
+ * If the upper Gather node has some projections to perform, then we
+ * cannot allow parallel insertions. But before returning, ensure that we
+ * did not wrongly force the planner to ignore the parallel tuple cost.
+ *
+ * The main reason for this assertion is to check whether we forced the
+ * planner to ignore the parallel tuple cost (with the intention of
+ * choosing parallel inserts), because of which the parallel plan may have
+ * been chosen even though we cannot allow parallel inserts now.
+ *
+ * If the parallel tuple cost was ignored correctly while creating the
+ * Gather path, this assertion should not fail. If it does fail, the
+ * planner may have chosen this parallel plan because of our wrong
+ * enforcement, so let's catch that here.
*/
- gstate->dest = dest;
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such as
+ * into clause (to build separate dest receiver), object id (to open
+ * the created table) to each workers. Since this information is
+ * available in the CTAS dest receiver, store a reference to it in the
+ * Gather state so that it will be used in ExecInitParallelPlan to pick
+ * the required information.
+ */
+ gstate->dest = dest;
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9a412c3e6b..50a1fc2f36 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -387,6 +387,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ else
+ query->CTASParallelInsInfo = CTAS_PARALLEL_INS_UNDEF;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -402,7 +411,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +506,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +572,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* decide whether to allow parallel inserts or not.
*/
if (IsParallelInsertionAllowedInCTAS(into))
- SetCTASParallelInsertState(queryDesc);
+ SetCTASParallelInsertState(queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..e706219102 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost if we intend to perform parallel
+ * inserts by the workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper-level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..20a18628a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,37 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each
+ * worker inserts them in parallel. So, we turn on a flag so that cost_gather
+ * ignores the parallel tuple cost for the Gather path if the SELECT is for
+ * CTAS and we are generating an upper-level Gather path.
+ */
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated on top of the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not turn on the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7589,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag so that cost_gather ignores the parallel tuple cost for
+ * the Gather path if the SELECT is for CTAS and we are generating an
+ * upper-level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, the flag would have already been reset there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ab74c69b9f..140226ad5a 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent from CTAS to the planner to adjust the cost
+ * calculations in cost_gather. We need this because the Gather node will not
+ * receive any tuples if the workers insert them in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn this on while planning the upper Gather path so that the parallel
+ * tuple cost is ignored in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -54,5 +72,6 @@ extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern bool IsParallelInsertionAllowedInCTAS(IntoClause *into);
-extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
Attachment: v17-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From 61641393e4c1d90a47f2a070d6e9e020e6f014e4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 16:49:43 +0530
Subject: [PATCH v17 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 559 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 211 +++++++
3 files changed, 796 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. If the node
+ at the top of the <command>SELECT</command> plan is a
+ <literal>Gather</literal> and it has no projections to perform,
+ the created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan where the created table can be filled
+ by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..38a18c5a9b 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,563 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the EXPLAIN output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the parallel-unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..40aadafc2a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,215 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the EXPLAIN output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the parallel-unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
Attachment: v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/x-patch)
From 4f71a8b1e0cf50d488af8c925151aed335bb2e8c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 17:00:38 +0530
Subject: [PATCH v17 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if there
are Gather nodes under a top Append node. It also adds code to
influence the planner to consider the parallel tuple cost as zero,
and asserts against wrong enforcement if parallel insertion later
turns out not to be possible. Test cases are also included in this patch.
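As a rough illustration, the kind of statement this patch targets looks like
the following (the tables here are hypothetical and not part of the patch):

    -- a UNION ALL CTAS can produce an Append node with Gather sub plans;
    -- with this patch the CTAS dest receiver can be pushed down to each
    -- such Gather node so that its workers insert their tuples directly
    CREATE TABLE union_dst AS
        SELECT * FROM part_a WHERE id % 10 = 0
        UNION ALL
        SELECT * FROM part_b WHERE id % 10 = 0;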
---
src/backend/commands/createas.c | 118 ++-
src/backend/optimizer/path/allpaths.c | 38 +
src/backend/optimizer/plan/planner.c | 10 +-
src/include/commands/createas.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1076 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 5046e02aac..148e7c7ea2 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -690,35 +690,99 @@ IsParallelInsertionAllowedInCTAS(IntoClause *into)
return true;
}
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver to a Gather node when it is either at the top of
+ * the plan or directly under a top Append node, provided it has no
+ * projections to perform. The required information from the pushed dest
+ * receiver is sent to the workers so that they can insert in parallel.
+ *
+ * If the top node is Append, this function recursively checks its sub plans
+ * for Gather nodes; when one is found (and it has no projections), the dest
+ * receiver information is set in it.
+ *
+ * Returns true if at least one Gather node can allow parallel insertions by
+ * the workers, and false otherwise.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ /*
+ * Set to true if there is at least one Gather node either at the
+ * top of the plan or as a direct sub node of an Append node.
+ */
+ *gather_exists |= true;
+
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as into clause (to build separate dest receiver), object id (to
+ * open the created table) to each workers. Since this information
+ * is available in the CTAS dest receiver, store a reference to it
+ * in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ }
+
+ return parallel;
+}
+
/*
* SetCTASParallelInsertState --- set the parallel insert state
*
- * See if the upper node is Gather and it doesn't have any projections, then
- * set the parallel insert state such as a flag in the dest receiver and also
- * store the dest receiver reference in the Gather node so that the required
- * information will be sent to workers.
+ * If the upper node is either a Gather, or an Append with Gather nodes
+ * under it, set the parallel insert state in the respective Gather nodes,
+ * provided they do not have any projections. The parallel insert state
+ * includes a flag in the dest receiver and also a dest receiver reference in
+ * the Gather node so that the required information will be sent to workers.
*/
void
SetCTASParallelInsertState(QueryDesc *queryDesc, uint8 *tuple_cost_flags)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc);
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
-
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
/*
- * Parallel insertions are not possible if the upper node is not Gather.
- */
- if (!IsA(gstate, GatherState))
- return;
-
- /*
- * If the upper Gather node has some projections to perform, then we can
- * not allow parallel insertions. But before returning, ensure that we have
- * not done wrong parallel tuple cost enforcement in the planner.
+ * Ensure that we did not wrongly force the planner to ignore the parallel
+ * tuple cost.
*
* The main reason for this assertion is to check if we enforced planner to
* ignore the parallel tuple cost (with the intention of choosing parallel
@@ -730,25 +794,9 @@ SetCTASParallelInsertState(QueryDesc *queryDesc, uint8 *tuple_cost_flags)
* case it occurs, that means the planner may have chosen this parallel
* plan because of our wrong enforcement. So let's catch that here.
*/
- if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ if (!allow && tuple_cost_flags && gather_exists)
Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
- if (!gstate->ps.ps_ProjInfo)
- {
- /* Okay to parallelize inserts, so mark it. */
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts in CTAS we must send information such as
- * into clause (to build separate dest receiver), object id (to open
- * the created table) to each workers. Since this information is
- * available in the CTAS dest receiver, store a reference to it in the
- * Gather state so that it will be used in ExecInitParallelPlan to pick
- * the required information.
- */
- gstate->dest = dest;
- }
-
if (tuple_cost_flags)
*tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 627d08b78a..b0835c32bd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,11 +1104,48 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, parallel insertion is possible if the
+ * top node of the subquery is a Gather, so we turn on a flag to ignore the
+ * parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct...), we can turn on the flag in two cases:
+ * i) query_level is 1
+ * ii) query_level > 1 and the flag is already set in the parent_root.
+ * Case ii) handles Append under Append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
set_rel_size(root, childrel, childRTindex, childRTE);
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* It is possible that constraint exclusion detected a contradiction
* within a child subquery, even though we didn't prove one above. If
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 20a18628a7..5e607f598a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,14 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 140226ad5a..bf52b05ea0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,9 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
/* Turn on this after the cost is ignored. */
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Turn this on when the tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 38a18c5a9b..356a2d0002 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -631,6 +631,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 40aadafc2a..32e6ad8636 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -246,6 +246,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Mon, Dec 28, 2020 at 1:16 AM Zhihong Yu <zyu@yugabyte.com> wrote:
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:
+    if (ignore &&
+        (root->parse->CTASParallelInsInfo &
+         CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))

I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the above if since when ignore_parallel_tuple_cost returns true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already.
Sometimes we may set the flag CTAS_PARALLEL_INS_TUP_COST_CAN_IGN before
generate_useful_gather_paths, but generate_useful_gather_paths can return
without reaching cost_gather, where the flag is normally reset, for
instance in the following case:
    if (rel->partial_pathlist == NIL)
        return;
So, for such cases, I'm resetting it here.
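A minimal sketch of that flow (names taken from the patch; the surrounding planner code is abbreviated and only illustrative, not the exact hunk):

    /* in the path-generation code, simplified */
    if (ignore_parallel_tuple_cost(root))
        root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

    generate_useful_gather_paths(root, rel, false);

    /*
     * generate_useful_gather_paths() can return before reaching
     * cost_gather() (e.g. when rel->partial_pathlist == NIL), so the
     * flag may still be set here; clear it unconditionally.
     */
    root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;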
+ * In this function we only care Append and Gather nodes.
'care' -> 'care about'
Done.
+    for (int i = 0; i < aps->as_nplans; i++)
+    {
+        parallel |= PushDownCTASParallelInsertState(dest,
+                                                    aps->appendplans[i],
+                                                    gather_exists);

It seems the loop termination condition can include parallel since we can come out of the loop once parallel is true.
No, we can not come out of the for loop if parallel is true, because
our intention there is to look for all the child/sub plans under
Append, and push the inserts to the Gather nodes wherever possible.
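For readers following along, the relevant loop roughly looks like this (simplified from the quoted hunk; the other node types the function handles are omitted):

    /* PushDownCTASParallelInsertState(), Append case only, simplified */
    if (IsA(ps, AppendState))
    {
        AppendState *aps = (AppendState *) ps;

        /*
         * Visit every subplan, even after parallel becomes true: other
         * children may contain Gather nodes that can also receive the
         * CTAS dest receiver and perform parallel inserts.
         */
        for (int i = 0; i < aps->as_nplans; i++)
            parallel |= PushDownCTASParallelInsertState(dest,
                                                        aps->appendplans[i],
                                                        gather_exists);
    }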
+ if (!allow && tuple_cost_flags && gather_exists)
As the above code shows, gather_exists is only checked when allow is false.
Yes. If at least one Gather node exists under the Append for which the
planner ignored the tuple cost, and we now decide not to allow parallel
inserts, we should assert that the parallel plan was not picked because of
that wrong parallel tuple cost enforcement.
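Schematically, the check being described is something like the following (a sketch assuming the flag names from the 0002 patch, not the exact hunk):

    /* in SetCTASParallelInsertState(), simplified */
    if (!allow && tuple_cost_flags && gather_exists)
    {
        /*
         * A Gather under the Append was costed with the parallel tuple
         * cost ignored, but parallel inserts are not being allowed after
         * all; the parallel plan must not have been chosen only because
         * of that ignored cost.
         */
        Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
    }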
+ * We set the flag for two cases when there is no parent path will
+ * be created(such as : limit,sort,distinct...):

Please correct the grammar : there are two verbs following 'when'
Done.
For set_append_rel_size:
+        {
+            root->parse->CTASParallelInsInfo |=
+                CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+        }
+    }
+
+    if (root->parse->CTASParallelInsInfo &
+        CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+    {
+        root->parse->CTASParallelInsInfo &=
+            ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

In the if block for childrel->rtekind == RTE_SUBQUERY, CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared immediately after?
Thanks for pointing that out. It was a miss; the intention is to reset it
after set_rel_size(). Corrected in the v17 patch.
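In other words, the corrected placement in v17 is intended to look roughly like this (a sketch; the exact guarding condition lives in the patch and is only hinted at here):

    /* set_append_rel_size(), simplified sketch of the v17 ordering */
    if (childrel->rtekind == RTE_SUBQUERY /* ... and the CTAS conditions hold */)
        root->parse->CTASParallelInsInfo |=
            CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

    set_rel_size(root, childrel, childRTindex, childRTE);

    /* Reset only after the child rel's size has been estimated. */
    root->parse->CTASParallelInsInfo &=
        ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;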
+    /* Set to this in case tuple cost needs to be ignored for Append cases. */
+    CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3

Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn on' or a similar term in the comment, because 'set to' normally means assignment.
Done.
All the above comments are addressed in the v17 patch set posted
upthread. Please have a look.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 28, 2020 at 11:24 AM vignesh C <vignesh21@gmail.com> wrote:
Test comments are detailed in a few cases, but in a few others they are not as detailed for similar kinds of parallelism-selected tests. I felt we could make the test comments consistent across the file.
Modified the test case description in the v17 patch set posted
upthread. Please have a look.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
+ * Push the dest receiver to Gather node when it is either at the top of
the
+ * plan or under top Append node unless it does not have any projections
to do.
I think the 'unless' should be 'if'. As can be seen from the body of the
method:
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ parallel = true;
Cheers
On Mon, Dec 28, 2020 at 4:12 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 28, 2020 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Thanks for working on this, I will have a look at the updated patches soon.
Attaching v17 patch set after addressing comments raised in other
threads. Please consider this patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
+ * Push the dest receiver to Gather node when it is either at the top of the
+ * plan or under top Append node unless it does not have any projections to do.

I think the 'unless' should be 'if'. As can be seen from the body of the method:

+    if (!ps->ps_ProjInfo)
+    {
+        GatherState *gstate = (GatherState *) ps;
+
+        parallel = true;
Thanks. Modified it in the 0004 patch. Attaching the v18 patch set. Note
that there are no changes in the 0001 to 0003 patches from v17.
Please consider v18 patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v18-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From d36c81fb6e13a8ec3c36fd6cd3bb096d783715e7 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 14:30:45 +0530
Subject: [PATCH v18 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information is shared with the workers
so that they can perform parallel insertions. The leader also
participates in the insertions. After planning, check in createas.c
whether the upper plan node is a Gather, mark a parallelism flag in
the CTAS dest receiver, and push it down to the Gather node. Each
worker creates its own CTAS dest receiver with the information passed
from the leader. The leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its
number of inserted tuples into a shared memory variable; the leader
combines this with its own count and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 --
src/backend/access/transam/xact.c | 28 ++++-
src/backend/commands/createas.c | 137 ++++++++++++++++++++++---
src/backend/commands/explain.c | 31 ++++++
src/backend/executor/execParallel.c | 70 ++++++++++++-
src/backend/executor/nodeGather.c | 111 ++++++++++++++++++--
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 26 +++++
src/include/executor/execParallel.h | 6 +-
src/include/nodes/execnodes.h | 3 +
11 files changed, 384 insertions(+), 44 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..86347ba273 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..4e5366fe78 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..2beccf60df 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -51,18 +51,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -350,6 +338,14 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by the
+ * executor to decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ SetCTASParallelInsertState(queryDesc);
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -418,6 +414,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -438,6 +437,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -540,6 +568,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set a few extra fields. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't skip contacting the FSM while inserting tuples in parallel
+ * mode: while extending the relation, a worker, instead of blocking on
+ * a page that another worker is inserting into, can check the FSM for
+ * another page that can accommodate the tuples. This is a major benefit
+ * for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
@@ -606,3 +655,67 @@ intorel_destroy(DestReceiver *self)
{
pfree(self);
}
+
+/*
+ * IsParallelInsertionAllowedInCTAS --- check if parallel insertion is allowed
+ *
+ * Do not allow parallel inserts if the table is temporary. As temporary
+ * tables are backend-local, workers cannot know about them. Currently,
+ * CTAS supports creation of normal (logged), temporary and unlogged
+ * tables. It does not support foreign or partitioned table creation.
+ * Hence the check for a temporary table is enough here.
+ *
+ * Return false either if the into clause is NULL or if the table is
+ * temporary, otherwise true.
+ */
+bool
+IsParallelInsertionAllowedInCTAS(IntoClause *into)
+{
+ /* Below check may hit in case this function is called from explain.c. */
+ if (!IS_CTAS(into))
+ return false;
+
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+}
+
+/*
+ * SetCTASParallelInsertState --- set the parallel insert state
+ *
+ * See if the upper node is Gather and it doesn't have any projections, then
+ * set the parallel insert state such as a flag in the dest receiver and also
+ * store the dest receiver reference in the Gather node so that the required
+ * information will be sent to workers.
+ */
+void
+SetCTASParallelInsertState(QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc);
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible if the upper node is not Gather,
+ * or if it is a Gather but it has some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such as into
+ * clause (to build separate dest receiver), object id (to open the created
+ * table) to each worker. Since this information is available in the CTAS
+ * dest receiver, store a reference to it in the Gather state so that it
+ * will be used in ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..9a412c3e6b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -556,6 +556,14 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the parallel
+ * insert state. We need plan state to be initialized by the executor to
+ * decide whether to allow parallel inserts or not.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ SetCTASParallelInsertState(queryDesc);
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1775,6 +1783,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index befde52691..270f8c13d7 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,9 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -578,7 +583,8 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, IntoClause *intoclause,
+ Oid objectid)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -600,6 +606,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
Size dsa_minsize = dsa_minimum_size();
char *query_string;
int query_len;
+ char *intoclausestr = NULL;
+ int intoclause_len = 0;
/*
* Force any initplan outputs that we're going to pass to workers to be
@@ -712,6 +720,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for into clause for CTAS. */
+ if (IS_CTAS(intoclause) && OidIsValid(objectid))
+ {
+ intoclausestr = nodeToString(intoclause);
+ intoclause_len = strlen(intoclausestr) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +746,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+ pg_atomic_init_u64(&fpes->processed, 0);
+ pei->processed = &fpes->processed;
+
+ if (intoclausestr)
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +783,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr)
+ {
+ char *intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1387,12 +1422,30 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
void *area_space;
dsa_area *area;
ParallelWorkerContext pwcxt;
+ char *intoclausestr = NULL;
+ IntoClause *intoclause = NULL;
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ intoclausestr = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+ if (intoclausestr)
+ {
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+ else
+ {
+ /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
+ receiver = ExecParallelGetReceiver(seg, toc);
+ }
+
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1524,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (intoclausestr)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a01b46af14..96d745229d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -32,6 +32,7 @@
#include "access/relscan.h"
#include "access/xact.h"
+#include "commands/createas.h"
#include "executor/execdebug.h"
#include "executor/execParallel.h"
#include "executor/nodeGather.h"
@@ -48,6 +49,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsertInCTAS(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +133,71 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsertInCTAS(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for Create Table AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsertInCTAS(GatherState *node)
+{
+ /*
+ * By now, the parallel workers, if any were launched, would have started
+ * their work, i.e. inserting into the target table. If the leader is
+ * also chosen to participate, it finishes its share before waiting for
+ * the parallel workers to finish.
+ *
+ * If no workers were launched, the leader inserts all of the tuples by
+ * itself.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total tuples inserted by all workers to the tuples inserted
+ * by the leader (if any). This total is reported to the client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +224,7 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ bool isctas = IS_PARALLEL_CTAS_DEST(node->dest);
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +233,18 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ IntoClause *intoclause = NULL;
+ Oid objectid = InvalidOid;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +252,10 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ /* CTAS info */
+ intoclause,
+ objectid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +273,16 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ if (!isctas)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -208,9 +294,16 @@ ExecGather(PlanState *pstate)
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
+ node->need_to_scan_locally = (node->nreaders == 0 && !isctas)
|| (!gather->single_copy && parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for CTAS. */
+ if (isctas)
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 47129344f3..ee45272c17 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ NULL,
+ InvalidOid);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..5beae6c617 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 7629230254..ab74c69b9f 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,35 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
+#include "executor/execdesc.h"
#include "nodes/params.h"
+#include "nodes/plannodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created in the leader. */
+ Oid object_id;
+} DR_intorel;
+
+#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
+#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
+ IS_CTAS(((DR_intorel *) dest)->into) && \
+ ((DR_intorel *) dest)->is_parallel)
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
@@ -29,4 +52,7 @@ extern int GetIntoRelEFlags(IntoClause *intoClause);
extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
+extern bool IsParallelInsertionAllowedInCTAS(IntoClause *into);
+
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
#endif /* CREATEAS_H */
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5a39a5b29c..9f959f741b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -35,11 +35,15 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ IntoClause *intoclause,
+ Oid objectid);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..a5b281f783 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts are allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v18-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 9ce3e49f0f11d7144566704f5153d1e579a3814c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 15:06:01 +0530
Subject: [PATCH v18 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
---
src/backend/commands/createas.c | 61 +++++++++++++++++++++------
src/backend/commands/explain.c | 16 +++++--
src/backend/commands/prepare.c | 3 +-
src/backend/optimizer/path/costsize.c | 22 +++++++++-
src/backend/optimizer/plan/planner.c | 53 +++++++++++++++++++++++
src/include/commands/createas.h | 21 ++++++++-
src/include/commands/explain.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/optimizer/planner.h | 10 +++++
9 files changed, 169 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 2beccf60df..5046e02aac 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -316,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ else
+ query->CTASParallelInsInfo = CTAS_PARALLEL_INS_UNDEF;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -344,7 +353,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* executor to decide whether to allow parallel inserts or not.
*/
if (IsParallelInsertionAllowedInCTAS(into))
- SetCTASParallelInsertState(queryDesc);
+ SetCTASParallelInsertState(queryDesc, &query->CTASParallelInsInfo);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -690,7 +699,7 @@ IsParallelInsertionAllowedInCTAS(IntoClause *into)
* information will be sent to workers.
*/
void
-SetCTASParallelInsertState(QueryDesc *queryDesc)
+SetCTASParallelInsertState(QueryDesc *queryDesc, uint8 *tuple_cost_flags)
{
GatherState *gstate;
DestReceiver *dest;
@@ -701,21 +710,45 @@ SetCTASParallelInsertState(QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible if the upper node is not Gather,
- * or if it is a Gather but it has some projections to perform.
+ * Parallel insertions are not possible if the upper node is not Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- ((DR_intorel *) dest)->is_parallel = true;
-
/*
- * For parallelizing inserts in CTAS we must send information such as into
- * clause (to build separate dest receiver), object id (to open the created
- * table) to each worker. Since this information is available in the CTAS
- * dest receiver, store a reference to it in the Gather state so that it
- * will be used in ExecInitParallelPlan to pick the required information.
+ * If the upper Gather node has some projections to perform, then we
+ * cannot allow parallel insertions. But before returning, ensure that we
+ * have not wrongly enforced the parallel tuple cost adjustment in the
+ * planner.
+ *
+ * The main reason for this assertion is to check whether we forced the
+ * planner to ignore the parallel tuple cost (with the intention of
+ * choosing parallel inserts), due to which the parallel plan may have
+ * been chosen, even though we do not allow the parallel inserts now.
+ *
+ * If we correctly ignored the parallel tuple cost in the planner while
+ * creating the Gather path, then this assertion failure should not occur.
+ * If it does occur, the planner may have chosen this parallel plan
+ * because of our wrong enforcement, so let's catch that here.
*/
- gstate->dest = dest;
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such as
+ * into clause (to build separate dest receiver), object id (to open
+ * the created table) to each worker. Since this information is
+ * available in the CTAS dest receiver, store a reference to it in the
+ * Gather state so that it will be used in ExecInitParallelPlan to pick
+ * the required information.
+ */
+ gstate->dest = dest;
+ }
+
+ if (tuple_cost_flags)
+ *tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9a412c3e6b..50a1fc2f36 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -387,6 +387,15 @@ ExplainOneQuery(Query *query, int cursorOptions,
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowedInCTAS(into))
+ query->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ else
+ query->CTASParallelInsInfo = CTAS_PARALLEL_INS_UNDEF;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -402,7 +411,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->CTASParallelInsInfo);
}
}
@@ -496,7 +506,7 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage, uint8 *ctas_tuple_cost_flags)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -562,7 +572,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* decide whether to allow parallel inserts or not.
*/
if (IsParallelInsertionAllowedInCTAS(into))
- SetCTASParallelInsertState(queryDesc);
+ SetCTASParallelInsertState(queryDesc, ctas_tuple_cost_flags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 89087a7be3..07166479e7 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 22d6935824..e706219102 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -378,6 +379,7 @@ cost_gather(GatherPath *path, PlannerInfo *root,
{
Cost startup_cost = 0;
Cost run_cost = 0;
+ bool ignore_tuple_cost = false;
/* Mark the path with the correct row estimate */
if (rows)
@@ -393,7 +395,25 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost when we intend the workers to perform
+ * parallel inserts. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper level SELECT part of the CTAS.
+ */
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1a94b58f8b..20a18628a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "commands/createas.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,37 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each
+ * worker inserts them in parallel. So we turn on a flag that makes
+ * cost_gather ignore the parallel tuple cost for the Gather path if the
+ * SELECT is for CTAS and we are generating an upper level Gather path.
+ */
+static bool
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated on top of the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not turn on the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return false;
+
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,8 +7589,29 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, the flag would have been reset there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
+ }
+
/*
* Reassess which paths are the cheapest, now that we've potentially added
* new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ab74c69b9f..140226ad5a 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -39,6 +39,24 @@ typedef struct
Oid object_id;
} DR_intorel;
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
+
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
#define IS_PARALLEL_CTAS_DEST(dest) (dest && dest->mydest == DestIntoRel && \
IS_CTAS(((DR_intorel *) dest)->into) && \
@@ -54,5 +72,6 @@ extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause);
extern bool IsParallelInsertionAllowedInCTAS(IntoClause *into);
-extern void SetCTASParallelInsertState(QueryDesc *queryDesc);
+extern void SetCTASParallelInsertState(QueryDesc *queryDesc,
+ uint8 *tuple_cost_flags);
#endif /* CREATEAS_H */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ba661d32a6..1a1806dbf1 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *ctas_tuple_cost_flags);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 48a79a7657..81b148c383 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,7 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ uint8 CTASParallelInsInfo; /* parallel insert in CTAS info */
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index beb7dbbcbe..74b2563828 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
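(Not part of the patches — a minimal sketch for quick verification. The table foo and column i are hypothetical, matching the documentation example added in the next patch. With the tuple cost adjustment in place and enough rows, the planner can choose a Gather plan for the CTAS even though no tuples flow back to the leader.)

CREATE TABLE foo AS SELECT generate_series(1, 1000000) AS i;
ANALYZE foo;
-- With the patch set applied, the plan is expected to take the shape below,
-- workers permitting; without the adjustment, the parallel tuple cost makes
-- a serial plan more likely for the same query.
EXPLAIN (COSTS OFF) CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
--  Gather
--    Workers Planned: 2
--    ->  Create bar
--          ->  Parallel Seq Scan on foo
--                Filter: (i > 5)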
Attachment: v18-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 61641393e4c1d90a47f2a070d6e9e020e6f014e4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 16:49:43 +0530
Subject: [PATCH v18 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 559 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 211 +++++++
3 files changed, 796 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is
+ <literal>Gather</literal> and there are no projections to be performed
+ by it, then the created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan when the created table can be filled
+ by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..38a18c5a9b 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,563 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..40aadafc2a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,215 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
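(Again not part of the patches — a minimal sketch of the plan shape that the next patch, v18-0004, targets: an Append on top of Gather nodes, as produced by UNION ALL. The table names t_a and t_b are hypothetical; the regression tests in that patch exercise the same pattern with temp1/temp2.)

CREATE TABLE t_a AS SELECT generate_series(1, 1000000) AS col1;
CREATE TABLE t_b AS SELECT generate_series(1, 1000000) AS col2;
ANALYZE t_a;
ANALYZE t_b;
-- With v18-0004 applied, each Gather under the top Append can be handed the
-- CTAS dest receiver, so its workers insert their share of tuples directly
-- (workers permitting). Without it, only a Gather at the very top of the
-- plan qualifies for parallel inserts.
EXPLAIN (COSTS OFF)
CREATE TABLE parallel_write AS
  SELECT * FROM t_a WHERE col1 = 5
  UNION ALL
  SELECT * FROM t_b WHERE col2 = 5;
-- One possible shape, as in the patch's regression tests:
--  Append
--    ->  Gather
--          Workers Planned: 2
--          ->  Create parallel_write
--                ->  Parallel Seq Scan on t_a
--                      Filter: (col1 = 5)
--    ->  Gather
--          Workers Planned: 2
--          ->  Create parallel_write
--                ->  Parallel Seq Scan on t_b
--                      Filter: (col2 = 5)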
Attachment: v18-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/octet-stream)
From 4f71a8b1e0cf50d488af8c925151aed335bb2e8c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 17:00:38 +0530
Subject: [PATCH v18 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if there
exists a Gather node under the top Append node. It also adds the code
that influences the planner to consider the parallel tuple cost as zero,
and asserts against wrong enforcement if parallel insertion later turns
out not to be possible. Test cases are also included in this patch.
---
src/backend/commands/createas.c | 118 ++-
src/backend/optimizer/path/allpaths.c | 38 +
src/backend/optimizer/plan/planner.c | 10 +-
src/include/commands/createas.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1076 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 5046e02aac..148e7c7ea2 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -690,35 +690,99 @@ IsParallelInsertionAllowedInCTAS(IntoClause *into)
return true;
}
+/*
+ * PushDownCTASParallelInsertState --- push the dest receiver down to the
+ * Gather nodes.
+ *
+ * In this function we only care about Append and Gather nodes.
+ *
+ * Push the dest receiver to a Gather node when it is either at the top of
+ * the plan or under the top Append node, and it does not have any
+ * projections to do. Required information from the pushed dest receiver is
+ * sent to the workers so that they can perform parallel insertions into
+ * the target table.
+ *
+ * If the top node is Append, then this function recursively checks the sub
+ * plans for Gather nodes; when it finds one (and it does not have
+ * projections), it sets the dest receiver information there.
+ *
+ * In any case, this function returns true if at least one Gather node
+ * allows parallel insertions by the workers; otherwise it returns false.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps,
+ bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownCTASParallelInsertState(dest,
+ aps->appendplans[i],
+ gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ /*
+ * Set to true if there exists at least one Gather node, either at the
+ * top of the plan or as a direct sub node under the Append node.
+ */
+ *gather_exists |= true;
+
+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as into clause (to build separate dest receiver), object id (to
+ * open the created table) to each worker. Since this information
+ * is available in the CTAS dest receiver, store a reference to it
+ * in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ }
+
+ return parallel;
+}
+
/*
* SetCTASParallelInsertState --- set the parallel insert state
*
- * See if the upper node is Gather and it doesn't have any projections, then
- * set the parallel insert state such as a flag in the dest receiver and also
- * store the dest receiver reference in the Gather node so that the required
- * information will be sent to workers.
+ * See if the upper node is Gather, or Append with Gather nodes under it;
+ * if so, set the parallel insert state in the respective Gather nodes,
+ * provided they do not have any projections. The parallel insert state
+ * includes a flag in the dest receiver and also a dest receiver reference
+ * in the Gather node so that the required information will be sent to the
+ * workers.
*/
void
SetCTASParallelInsertState(QueryDesc *queryDesc, uint8 *tuple_cost_flags)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc);
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
-
+ allow = PushDownCTASParallelInsertState(queryDesc->dest,
+ queryDesc->planstate,
+ &gather_exists);
/*
- * Parallel insertions are not possible if the upper node is not Gather.
- */
- if (!IsA(gstate, GatherState))
- return;
-
- /*
- * If the upper Gather node has some projections to perform, then we can
- * not allow parallel insertions. But before returning, ensure that we have
- * not done wrong parallel tuple cost enforcement in the planner.
+ * Ensure that we have not wrongly enforced the parallel tuple cost
+ * adjustment in the planner.
*
* The main reason for this assertion is to check if we enforced planner to
* ignore the parallel tuple cost (with the intention of choosing parallel
@@ -730,25 +794,9 @@ SetCTASParallelInsertState(QueryDesc *queryDesc, uint8 *tuple_cost_flags)
* case it occurs, that means the planner may have chosen this parallel
* plan because of our wrong enforcement. So let's catch that here.
*/
- if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ if (!allow && tuple_cost_flags && gather_exists)
Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
- if (!gstate->ps.ps_ProjInfo)
- {
- /* Okay to parallelize inserts, so mark it. */
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts in CTAS we must send information such as
- * into clause (to build separate dest receiver), object id (to open
- * the created table) to each workers. Since this information is
- * available in the CTAS dest receiver, store a reference to it in the
- * Gather state so that it will be used in ExecInitParallelPlan to pick
- * the required information.
- */
- gstate->dest = dest;
- }
-
if (tuple_cost_flags)
*tuple_cost_flags = CTAS_PARALLEL_INS_UNDEF;
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 627d08b78a..b0835c32bd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "commands/createas.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,11 +1104,48 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, parallel insert is possible if the top
+ * node of the subquery is Gather, so we turn on a flag to ignore the
+ * parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct ...), we can turn on the flag in two cases:
+ * i) query_level is 1;
+ * ii) query_level > 1 and the flag is already turned on in the
+ * parent_root. Case ii) is to check append under append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_SELECT &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->CTASParallelInsInfo |=
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
set_rel_size(root, childrel, childRTindex, childRTE);
+ if (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+ }
+
/*
* It is possible that constraint exclusion detected a contradiction
* within a child subquery, even though we didn't prove one above. If
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 20a18628a7..5e607f598a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7350,8 +7350,14 @@ can_partial_agg(PlannerInfo *root)
static bool
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND))
+ {
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_SELECT;
+ }
+
+ if (root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 140226ad5a..bf52b05ea0 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -54,7 +54,9 @@ typedef enum CTASParallelInsertOpt
*/
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
/* Turn on this after the cost is ignored. */
- CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2,
+ /* Turn on this in case tuple cost needs to be ignored for Append cases. */
+ CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
} CTASParallelInsertOpt;
#define IS_CTAS(intoclause) (intoclause && IsA(intoclause, IntoClause))
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 38a18c5a9b..356a2d0002 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -631,6 +631,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 40aadafc2a..32e6ad8636 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -246,6 +246,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Mon, Dec 28, 2020 at 10:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have reviewed part of the v15-0001 patch; I have a few comments and will
continue to review it.

Thanks a lot.
1.
Why is this temporary hack? And what is the plan for removing this hack?

The changes in xact.c, xact.h and heapam.c are common to all the
parallel insert patches - COPY, INSERT INTO SELECT. That was the
initial comment; I forgot to keep it in sync with the other patches.
Now I have used the comment from the INSERT INTO SELECT patch. IIRC, the
plan was to have this code in all the parallel insert patches; whichever
gets reviewed and committed first, the others will update their patches
accordingly.

2.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+ if (!IS_CTAS(into))
+ return;

When will this hit? The function name suggests that it is from CTAS,
but now you have a check that returns if it is not for CTAS; can you add a
comment saying when you expect this case?

Yes, it will hit for EXPLAIN cases, but I chose to remove this and
check outside in the explain code, something like:

if (into)
    ChooseParallelInsertsInCTAS()

Also the function name should start on a new line, i.e.

void
ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)

Ah, missed that. Modified now.
3.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */

Push down to the Gather nodes? I think the right statement would be
"push down below the Gather node".

Modified.
4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
DR_intorel *myState = (DR_intorel *) self;

-- Comment -> in a parallel worker we don't need to create the dest receiver, blah blah

+ if (myState->is_parallel_worker)
+ {
+ --parallel worker handling--
+ return;
+ }

-- non-parallel worker code stays right there, instead of moving to an else branch
Done.
5.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{

From the function name and comments it appeared that this function would
return a boolean saying whether
parallel insert should be selected or not. I think the name/comment
should be better for this.

Yeah, that function can still return void because there is no point in returning
bool there, since the intention is to see if parallel inserts can be
performed and, if yes, set the state, otherwise exit. I changed the
function name to TryParallelizingInsertsInCTAS(). Let me know your
suggestions if that doesn't work out.

6.
/*
+ * For parallelizing inserts in CTAS i.e. making each parallel worker
+ * insert the tuples, we must send information such as into clause (for
+ * each worker to build separate dest receiver), object id (for each
+ * worker to open the created table).

The comment says we need to pass the object id, but the code under this
comment is not doing so.

Improved the comment.
7.
+ /*
+ * Since there are no rows that are transferred from workers to Gather
+ * node, so we set it to 0 to be visible in estimated row count of
+ * explain plans.
+ */
+ queryDesc->planstate->plan->plan_rows = 0;

This seems a bit hackish. Why is it done after the planning? I mean, the
plan must know that it is returning 0 rows.

This exists to show the estimated row count (in case of EXPLAIN CTAS
without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS, actual tuples
are shown correctly as 0 because Gather doesn't receive any tuples.

if (es->costs)
{
    if (es->format == EXPLAIN_FORMAT_TEXT)
    {
        appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)",
                         plan->startup_cost, plan->total_cost,
                         plan->plan_rows, plan->plan_width);

Since it's an estimated row count (which may not always be correct), we
will let the EXPLAIN plan show that, and I think we can remove that
part. Thoughts?

I removed it in the v6 patch set.
8.
+ char *intoclause_space = shm_toc_allocate(pcxt->toc,
+ intoclause_len);
+ memcpy(intoclause_space, intoclausestr, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);

One blank line between the variable declaration and the next code segment;
take care at other places as well.

Done.
I'm attaching the v16 patch set. Please note that I added documentation
saying that parallel insertions can happen, and a sample explain output,
to the 0003 patch as discussed in [1]. But I didn't move the explain
output related code to a separate patch because it's a small snippet in
explain.c. I hope that's okay.

[1] - /messages/by-id/CAA4eK1JqwXGYoGa1+3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA@mail.gmail.com
Thanks for working on this, I will have a look at the updated patches soon.
I have completed reviewing 0001, I don't have more comments, just one
question. Soon I will review the remaining patches.
+ /* If parallel inserts are to be allowed, set a few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
Is there any performance data for this or just theoretical analysis?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have completed reviewing 0001, I don't have more comments, just one
question. Soon I will review the remaining patches.
Thanks.
+ /* If parallel inserts are to be allowed, set a few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;

Is there any performance data for this or just theoretical analysis?
I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com>
wrote:
I have completed reviewing 0001, I don't have more comments, just one
question. Soon I will review the remaining patches.

Thanks.

+ /* If parallel inserts are to be allowed, set a few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;

Is there any performance data for this or just theoretical analysis?
I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.
Yeah that’s fine
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have completed reviewing 0001, I don't have more comments, just one
question. Soon I will review the remaining patches.

Thanks.

+ /* If parallel inserts are to be allowed, set a few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;

Is there any performance data for this or just theoretical analysis?
I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.

Yeah that's fine
Some comments in 0002
1.
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;
I don't like the naming of these flags. Especially, there is no need to define
CTAS_PARALLEL_INS_UNDEF; we can directly use 0
for that purpose instead of giving it some weird name. So I suggest,
first, just get rid of CTAS_PARALLEL_INS_UNDEF.
2.
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);
+ /*
+ * Reset the ignore flag, in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, we would have been reset it there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }
I think the way we are using these cost-ignoring flags doesn't look clean.
I mean, first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
CTAS, and then ignore_parallel_tuple_cost will
set CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies a certain
condition, which is fine. Now, internally, cost_gather
will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN, and if
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we remove
it outside. Why do we need to remove the
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?
3.
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
Instead of adding the Assert inside an if statement, you can convert the whole
statement into an assert. Let's not add an unnecessary
if in release mode.
4.
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;
Change this to (if, else) as shown below, because if it goes to the
if part then ignore_tuple_cost will always be true,
so there is no need for an extra if check.
if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
(root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
{
ignore_tuple_cost = true;
root->parse->CTASParallelInsInfo &=
~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}
else
run_cost += parallel_tuple_cost * path->path.rows;
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have completed reviewing 0001, I don't have more comments, just one
question. Soon I will review the remaining patches.

Thanks.

+ /* If parallel inserts are to be allowed, set a few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;

Is there any performance data for this or just theoretical analysis?
I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.
When you run the performance tests, you can try to capture and publish the
relation size & the number of pages that are getting created for the base
table and the CTAS table; you can use something like SELECT relpages
FROM pg_class WHERE relname = 'tablename' & SELECT
pg_total_relation_size('tablename'), just to make sure that there is
no significant difference between the base table and the CTAS table.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 9:25 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
+ * Push the dest receiver to Gather node when it is either at the top of the
+ * plan or under top Append node unless it does not have any projections to do.

I think the 'unless' should be 'if'. As can be seen from the body of the method:

+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ parallel = true;

Thanks. Modified it in the 0004 patch. Attaching the v18 patch set. Note
that there is no change in the 0001 to 0003 patches from v17.

Please consider the v18 patch set for further review.
Few comments:
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a
parallel worker")));
Is it possible to add a check here for whether it is a CTAS insert, as we do
not support inserts in parallel workers from others as of now?
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
We can just mention relation instead of relation/table.
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
Can we include selection of cmin, xmin in one of the tests to verify
that it uses the same transaction id in the parallel workers,
something like:
select distinct(cmin,xmin) from parallel_write;
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Thanks for the comments.
How about naming them like below, more generically, and placing them in
parallel.h so that they can also be used for REFRESH MATERIALIZED VIEW?
+typedef enum ParallelInsertTupleCostOpt
+{
+ PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PINS_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PINS_TUP_COST_IGNORED = 1 << 2
My plan was to first get the main design idea of pushing the dest receiver
down to Gather reviewed and, once agreed, then make a few
functions common and place them in parallel.h and parallel.c so that
they can be used for parallel inserts in REFRESH MATERIALIZED VIEW,
because the same design idea can be applied there as well.
For instance my thoughts are: add the below structures, functions and
other macros to parallel.h and parallel.c:
typedef enum ParallelInsertKind
{
PINS_UNDEF = 0,
PINS_CREATE_TABLE_AS,
PINS_REFRESH_MAT_VIEW
} ParallelInsertKind;
typedef struct ParallelInsertCTASInfo
{
IntoClause *intoclause;
Oid objectid;
} ParallelInsertCTASInfo;
typedef struct ParallelInsertRMVInfo
{
Oid objectid;
} ParallelInsertRMVInfo;
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, ParallelInsertKind pinskind,
+ void *pinsinfo)
Change ExecParallelInsertInCTAS to
+static void
+ExecParallelInsert(GatherState *node)
+{
Change SetCTASParallelInsertState to
+void
+SetParallelInsertState(QueryDesc *queryDesc)
Change IsParallelInsertionAllowedInCTAS to
+bool
+IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
+{
Thoughts?
If okay, I can work on these points and add a new patch into the patch
set that will have changes for parallel inserts in REFRESH
MATERIALIZED VIEW.
On Wed, Dec 30, 2020 at 3:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Some comments in 0002
1.
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;

I don't like the naming of these flags. Especially, there is no need to define
CTAS_PARALLEL_INS_UNDEF; we can directly use 0
for that purpose instead of giving it some weird name. So I suggest,
first, just get rid of CTAS_PARALLEL_INS_UNDEF.
+1. I will change it in the next version of the patch.
2.
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);

+ /*
+ * Reset the ignore flag, in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, we would have been reset it there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }

I think the way we are using these cost-ignoring flags doesn't look clean.
I mean first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
CTAS and then ignore_parallel_tuple_cost will
set the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies certain
condition which is fine. Now, internally cost
gather will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN and if
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we will remove
it outside. Why do we need to remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?
Yes we don't need to remove the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN
flag. I will change it in the next version.
3.
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));

Instead of adding the Assert inside an if statement, you can convert the whole
statement into an assert. Let's not add an unnecessary
if in release mode.

+1. I will change it in the next version.
4.
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;

Change this to (if, else) as shown below, because if it goes to the
if part then ignore_tuple_cost will always be true,
so there is no need for an extra if check.

if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
(root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
{
ignore_tuple_cost = true;
root->parse->CTASParallelInsInfo &=
~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}
else
run_cost += parallel_tuple_cost * path->path.rows;
+1. I will change it in the next version.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 7:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks for the comments.
How about naming like below more generically and placing them in
parallel.h so that it will also be used for refresh materialized view?

+typedef enum ParallelInsertTupleCostOpt
+{
+ PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PINS_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PINS_TUP_COST_IGNORED = 1 << 2

My plan was to get the main design idea of pushing the dest receiver
to gather reviewed and once agreed, then I thought of making few
functions common and place them in parallel.h and parallel.c so that
they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
because the same design idea can be applied there as well.
I think instead of PINS_* we can name them PARALLEL_INSERT_*; other than
that, I am fine with the name.
For instance my thoughts are: add the below structures, functions and
other macros to parallel.h and parallel.c:
typedef enum ParallelInsertKind
{
PINS_UNDEF = 0,
PINS_CREATE_TABLE_AS,
PINS_REFRESH_MAT_VIEW
} ParallelInsertKind;

typedef struct ParallelInsertCTASInfo
{
IntoClause *intoclause;
Oid objectid;
} ParallelInsertCTASInfo;

typedef struct ParallelInsertRMVInfo
{
Oid objectid;
} ParallelInsertRMVInfo;

ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, ParallelInsertKind pinskind,
+ void *pinsinfo)

Change ExecParallelInsertInCTAS to

+static void
+ExecParallelInsert(GatherState *node)
+{

Change SetCTASParallelInsertState to

+void
+SetParallelInsertState(QueryDesc *queryDesc)

Change IsParallelInsertionAllowedInCTAS to

+bool
+IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
+{

Thoughts?
I haven’t thought about these structures yet but yeah making them
generic will be good.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 5:26 PM vignesh C <vignesh21@gmail.com> wrote:
On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have completed reviewing 0001, I don't have more comments, just one
question. Soon I will review the remaining patches.

Thanks.

+ /* If parallel inserts are to be allowed, set a few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;

Is there any performance data for this or just theoretical analysis?
I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.

When you run the performance tests, you can try to capture and publish
relation size & the number of pages that are getting created for base
table and the CTAS table, you can use something like SELECT relpages
FROM pg_class WHERE relname = 'tablename' & SELECT
pg_total_relation_size('tablename'). Just to make sure that there is
no significant difference between the base table and CTAS table.
I can do that. I'm sure the number of pages will be equal or a little
more, since I observed this for parallel copy.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
Few comments:
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a
parallel worker")));

Is it possible to add a check here for whether it is a CTAS insert, as we do
not support inserts in parallel workers from others as of now?
Currently, there's no global variable with which we can selectively skip
this error in the case of parallel insertion in CTAS. How about having a
variable in one of the worker global contexts, setting it when parallel
insertion is chosen for CTAS, and using it in heap_prepare_insert() to
skip the above error? Eventually, we can remove this restriction
entirely in case we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.
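For illustration, a rough sketch of that idea (the flag name and its exact
placement are hypothetical, not something in the posted patches) could look
like:

/* Hypothetical worker-global flag, set by the CTAS parallel insert path. */
bool    am_parallel_insert_worker = false;

/* In heap_prepare_insert(): keep the restriction for everything else. */
if (IsParallelWorker() && !am_parallel_insert_worker)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));

The CTAS worker startup code would set the flag before performing any
inserts, so the error stays in place for all other parallel-worker paths.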
Thoughts?
+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;

We can just mention relation instead of relation/table.
I will modify it in the next patch set.
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;

Can we include selection of cmin, xmin in one of the tests to verify
that it uses the same transaction id in the parallel workers,
something like:
select distinct(cmin,xmin) from parallel_write;
This is not possible since cmin and xmin are dynamic; we cannot use
them in test cases. I think it's not necessary to check whether the
leader and workers are in the same txn or not, since we are not
creating a new txn. All the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.
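For reference, this is roughly how the existing parallel infrastructure
already shares the transaction state (condensed from parallel.c; nothing new
is added here by this patch set):

/* Leader, in InitializeParallelDSM(): serialize the current txn state. */
tstatespace = shm_toc_allocate(pcxt->toc, tstatelen);
SerializeTransactionState(tstatelen, tstatespace);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_TRANSACTION_STATE, tstatespace);

/* Worker, in ParallelWorkerMain(): restore it before executing the plan. */
tstatespace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_STATE, false);
StartParallelWorkerTransaction(tstatespace);

So the workers run inside the leader's transaction rather than starting
their own.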
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 30-12-2020 04:55, Bharath Rupireddy wrote:
On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
+ * Push the dest receiver to Gather node when it is either at the top of the
+ * plan or under top Append node unless it does not have any projections to do.

I think the 'unless' should be 'if'. As can be seen from the body of the method:

+ if (!ps->ps_ProjInfo)
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ parallel = true;

Thanks. Modified it in the 0004 patch. Attaching the v18 patch set. Note
that there is no change in the 0001 to 0003 patches from v17.

Please consider the v18 patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
Sorry it took so long to get back to reviewing this.
wrt v18-0001....patch:
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
I would move this into a function called e.g.
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
createas.c.
I would then also split up intorel_startup into intorel_leader_startup
and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
self->pub.rStartup to intorel_worker_startup.
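Something along those lines could look roughly like this (just a sketch of
the suggestion, not tested; the two new function names are the ones proposed
above):

DestReceiver *
GetCTASParallelWorkerReceiver(IntoClause *into, Oid objectid)
{
    DR_intorel *receiver = (DR_intorel *) CreateIntoRelDestReceiver(into);

    receiver->is_parallel_worker = true;
    receiver->object_id = objectid;
    /* workers get their own, much smaller startup callback */
    receiver->pub.rStartup = intorel_worker_startup;

    return (DestReceiver *) receiver;
}

That way execParallel.c only deals with a DestReceiver and all the
CTAS-specific details stay in createas.c.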
+ volatile pg_atomic_uint64 *processed;
why is it volatile?
+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
Given that you extract them each once and then pass them directly into
the parallel-worker, can't you instead pass in the destreceiver and
leave that logic to ExecInitParallelPlan?
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
Why would rel and relname not be there? If no rows have been inserted?
Because it seems from the intorel_startup function that they would be
set as soon as startup was done, which I assume (wrongly?) is always done.
+ * In case if no workers were launched, allow the leader to insert entire
+ * tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?
================
wrt v18-0002....patch:
It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
What I'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather, when to me it seems more
logical to put it in create_gather_path? I'm probably missing something
there.
================
wrt v18-0003....patch:
Not sure if it is needed, but I was wondering if we would want more
tests with multiple Gather nodes existing, caused e.g. by using CTEs or
valid subqueries (like the one test you have, but without the group
by/having)?
Kind regards,
Luc
Hi
================
wrt v18-0002....patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

What I'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather, when to me it seems more
logical to put it in create_gather_path? I'm probably missing something
there.
IMO, the reason is that we want to make sure we only ignore the cost when Gather is the top node.
And it seems that generate_useful_gather_paths, as called in apply_scanjoin_target_to_paths, is the right place, since it is the only one that can create the top-node Gather.
So we change the flag in apply_scanjoin_target_to_paths, around generate_useful_gather_paths, to identify the top node.
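In other words, the flag is toggled only around that one call site, roughly
like this (condensed; see the 0002 patch for the real code):

/* In apply_scanjoin_target_to_paths(), which generates the top-level Gather: */
bool    ignore = ignore_parallel_tuple_cost(root);

generate_useful_gather_paths(root, rel, false);

/* If cost_gather() was never reached, clear the "can ignore" bit again. */
if (ignore && (root->parse->CTASParallelInsInfo &
               CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
    root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;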
Best regards,
houzj
On 04-01-2021 12:16, Hou, Zhijie wrote:
Hi
================
wrt v18-0002....patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?

IMO, The reason is we want to make sure we only ignore the cost when Gather is the top node.
And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create top node Gather.
So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.

Best regards,
houzj
Hi,
I was wondering actually if we need the state machine. The reason is that,
AFAICS, the code could be placed in create_gather_path, where you can
also check if it is a top Gather node, whether the dest receiver is the
right type, etc. To me that seems like a nicer solution, as it puts all the
logic that decides whether or not a parallel CTAS is valid in a single
place instead of distributing it over various places.
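To make that concrete, the idea would be something like the following inside
create_gather_path() (purely illustrative; the helper that detects the
top-level Gather is hypothetical):

/* Before costing the new Gather path: */
if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
    is_top_gather_for_parallel_ctas(root, rel))    /* hypothetical check */
    root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);

That would keep the decision and the costing next to each other rather than
spread across apply_scanjoin_target_to_paths and cost_gather.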
Kind regards,
Luc
On Thu, Dec 31, 2020 at 10:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
How about naming like below more generically and placing them in
parallel.h so that it will also be used for refresh materialized view?

+typedef enum ParallelInsertTupleCostOpt
+{
+ PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PINS_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PINS_TUP_COST_IGNORED = 1 << 2

My plan was to get the main design idea of pushing the dest receiver
to gather reviewed and once agreed, then I thought of making few
functions common and place them in parallel.h and parallel.c so that
they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
because the same design idea can be applied there as well.I think instead of PINS_* we can name PARALLEL_INSERT_* other than
that I am fine with the name.
Done.
For instance my thoughts are: add the below structures, functions and
other macros to parallel.h and parallel.c:
typedef enum ParallelInsertKind
{
PINS_UNDEF = 0,
PINS_CREATE_TABLE_AS,
PINS_REFRESH_MAT_VIEW
} ParallelInsertKind;typedef struct ParallelInsertCTASInfo
{
IntoClause *intoclause;
Oid objectid;
} ParallelInsertCTASInfo;typedef struct ParallelInsertRMVInfo
{
Oid objectid;
} ParallelInsertRMVInfo;ExecInitParallelPlan(PlanState *planstate, EState *estate, Bitmapset *sendParams, int nworkers, - int64 tuples_needed) + int64 tuples_needed, ParallelInsertKind pinskind, + void *pinsinfo)Change ExecParallelInsertInCTAS to
+static void +ExecParallelInsert(GatherState *node) +{Change SetCTASParallelInsertState to +void +SetParallelInsertState(QueryDesc *queryDesc)Change IsParallelInsertionAllowedInCTAS to
+bool +IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into) +{Thoughts?
I haven’t thought about these structures yet but yeah making them
generic will be good.
Attaching the v19 patch set. It has the following changes: 1) generic code
which can easily be extended to parallel inserts in Refresh
Materialized View and to parallelizing the Copy To command; 2) addressing
the review comments received so far.
Once these patches are reviewed and get to the commit stage, I can
post a separate patch (probably in a separate thread) for parallel
inserts in Refresh Materialized View based on this patch set.
Please review the v19 patch set further.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v19-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
From db57d18f21296feba3f284773bd6b4d0de62c0eb Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 4 Jan 2021 12:09:35 +0530
Subject: [PATCH v19 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. Leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its number
of inserted tuples into a shared memory variable, the leader combines
this with its own number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 28 ++-
src/backend/commands/createas.c | 84 ++++++-
src/backend/commands/explain.c | 47 ++++
src/backend/executor/execParallel.c | 322 ++++++++++++++++++++++++-
src/backend/executor/nodeGather.c | 130 +++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 16 ++
src/include/executor/execParallel.h | 42 +++-
src/include/nodes/execnodes.h | 3 +
11 files changed, 640 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..3741d824bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..750d15a572 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..a8050a2767 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -38,6 +38,7 @@
#include "commands/prepare.h"
#include "commands/tablecmds.h"
#include "commands/view.h"
+#include "executor/execParallel.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -51,18 +52,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -294,6 +283,11 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
}
else
{
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
/*
* Parse analysis was done already, but we still have to run the rule
* rewriter. We do not do AcquireRewriteLocks: we assume the query
@@ -338,6 +332,19 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+								   &parallel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -441,6 +448,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -461,6 +471,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -563,6 +602,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..0ae5d8c65f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execParallel.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -572,6 +573,27 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ if (into)
+ {
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+									   &parallel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+ }
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1791,6 +1813,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+ ((DR_intorel *) gstate->dest)->into &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..7ed3e9e3b6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,10 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ ParallelInsertCmdKind ins_cmd_type; /* parallel insertion command type */
+ Oid objectid; /* used by workers to open relation */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -135,10 +141,23 @@ static bool ExecParallelReInitializeDSM(PlanState *planstate,
ParallelContext *pcxt);
static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
SharedExecutorInstrumentation *instrumentation);
-
-/* Helper function that runs in the parallel worker. */
+static void ParallelInsCmdEstimate(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+
+/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static DestReceiver *ExecParallelGetInsReceiver(shm_toc *toc,
+ FixedParallelExecutorState *fpes);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -578,7 +597,9 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -712,6 +733,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for parallel insertions. */
+ if (parallel_ins_info)
+ ParallelInsCmdEstimate(pcxt, parallel_ins_cmd, parallel_ins_info);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +754,20 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion fixed info into DSA. */
+ SaveParallelInsCmdFixedInfo(pei, fpes, parallel_ins_cmd,
+ parallel_ins_info);
+ }
+ else
+ {
+ pei->processed = NULL;
+ fpes->ins_cmd_type = PARALLEL_INSERT_CMD_UNDEF;
+ fpes->objectid = InvalidOid;
+ }
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +797,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion info into DSA. */
+ SaveParallelInsCmdInfo(pcxt, parallel_ins_cmd, parallel_ins_info);
+
+ /*
+ * Tuple queues are not required in case of parallel insertions by the
+ * workers, because Gather node will not receive any tuples.
+ */
+ pei->tqueue = NULL;
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1391,8 +1444,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ /* Set up DestReceiver. */
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ receiver = ExecParallelGetInsReceiver(toc, fpes);
+ else
+ receiver = ExecParallelGetReceiver(seg, toc);
+
+ /* Set up SharedExecutorInstrumentation, and QueryDesc. */
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1529,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
@@ -1479,3 +1544,246 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
FreeQueryDesc(queryDesc);
receiver->rDestroy(receiver);
}
+
+/*
+ * Estimate space required for sending parallel insert information to workers
+ * in commands such as CTAS.
+ */
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len = 0;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * Save fixed state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pei && fpes && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ pg_atomic_init_u64(&fpes->processed, 0);
+ fpes->ins_cmd_type = ins_cmd;
+ pei->processed = &fpes->processed;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ fpes->objectid = info->objectid;
+ }
+}
+
+/*
+ * Save variable state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len;
+ char *intoclause_space = NULL;
+
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclause_str, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+}
+
+/*
+ * Create a DestReceiver to write produced tuples to target relation in case of
+ * parallel insertions.
+ */
+static DestReceiver *
+ExecParallelGetInsReceiver(shm_toc *toc, FixedParallelExecutorState *fpes)
+{
+ ParallelInsertCmdKind ins_cmd;
+ DestReceiver *receiver;
+
+ Assert(fpes && toc &&
+ (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ ins_cmd = fpes->ins_cmd_type;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ char *intoclause_str = NULL;
+ IntoClause *intoclause = NULL;
+
+ intoclause_str = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclause_str);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+
+ return receiver;
+}
+
+/*
+ * Given a DestReceiver, return the command type if parallelism is allowed.
+ */
+ParallelInsertCmdKind
+GetParallelInsertCmdType(DestReceiver *dest)
+{
+ if (!dest)
+ return PARALLEL_INSERT_CMD_UNDEF;
+
+ if (dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel)
+ return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;
+
+ return PARALLEL_INSERT_CMD_UNDEF;
+}
+
+/*
+ * Given a DestReceiver, allocate and fill parallel insert info structure
+ * corresponding to command type.
+ *
+ * Note that the memory allocated here for the info structure has to be freed
+ * up in caller.
+ */
+void *
+GetParallelInsertCmdInfo(DestReceiver *dest, ParallelInsertCmdKind ins_cmd)
+{
+ void *parallel_ins_info = NULL;
+
+ Assert(dest && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *)
+ palloc0(sizeof(ParallelInsertCTASInfo));
+ ctas_info->intoclause = ((DR_intorel *) dest)->into;
+ ctas_info->objectid = ((DR_intorel *) dest)->object_id;
+ parallel_ins_info = ctas_info;
+ }
+
+ return parallel_ins_info;
+}
+
+/*
+ * Check if parallel insertion is allowed in commands such as CTAS.
+ *
+ * Return true if allowed, otherwise false.
+ */
+bool
+IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
+{
+ Assert(ins_info && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ /*
+ * For CTAS, do not allow parallel inserts if target table is temporary. As
+ * the temporary tables are backend local, workers can not know about them.
+ *
+ * Return false either if the into clause is NULL or if the table is
+ * temporary, otherwise true.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+ IntoClause *into = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *) ins_info;
+ into = ctas_info->intoclause;
+
+ /* Below check may hit in case this function is called from explain.c. */
+ if (!(into && IsA(into, IntoClause)))
+ return false;
+
+ /*
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Set the parallel insert state, if the upper node is Gather and it doesn't
+ * have any projections. The parallel insert state includes information such as
+ * a flag in the dest receiver and also a dest receiver reference in the Gather
+ * node so that the required information will be picked and sent to workers.
+ */
+void
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible either if the upper node is not
+ * Gather or it's a Gather but it have some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is into
+ * clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver, store
+ * a reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..1ab3e0f600 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -48,6 +48,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsert(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +132,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsert(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for commands such as CREATE TABLE AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsert(GatherState *node)
+{
+ /*
+ * By now, parallel workers if launched any, would have started their work
+ * i.e. insertion to target relation. In case the leader is also chosen to
+ * participate, then finish its share before going to wait for the parallel
+ * workers to finish.
+ *
+ * In case if no workers were launched, allow the leader to insert all
+ * tuples.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed +=
+ pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +224,17 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ ParallelInsertCmdKind parallel_ins_cmd;
+ bool perform_parallel_ins = false;
+
+ /*
+ * Get the parallel insert command type from the dest receiver which
+ * would have been set in SetParallelInsertState().
+ */
+ parallel_ins_cmd = GetParallelInsertCmdType(node->dest);
+
+ if (parallel_ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ perform_parallel_ins = true;
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +243,15 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ void *parallel_ins_info = NULL;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in commands such as CTAS.
+ */
+ if (perform_parallel_ins)
+ parallel_ins_info = GetParallelInsertCmdInfo(node->dest,
+ parallel_ins_cmd);
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +259,9 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ parallel_ins_cmd,
+ parallel_ins_info);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +279,22 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ /*
+ * Do not create tuple queue readers for commands with parallel
+ * insertion. Because the gather node will not receive any
+ * tuples, the workers will insert the tuples into the target
+ * relation.
+ */
+ if (!perform_parallel_ins)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -205,12 +303,24 @@ ExecGather(PlanState *pstate)
node->reader = NULL;
}
node->nextreader = 0;
+
+ /* Free up the parallel insert info, if allocated. */
+ if (parallel_ins_info)
+ pfree(parallel_ins_info);
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
- || (!gather->single_copy && parallel_leader_participation);
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !perform_parallel_ins) || (!gather->single_copy &&
+ parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for commands such as CTAS. */
+ if (perform_parallel_ins)
+ {
+ ExecParallelInsert(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..ea72473c8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ 0,
+ NULL);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index f49a57b35e..4cd6f972ed 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ad5054d116..74022aab41 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,28 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created in the leader. */
+ Oid object_id;
+} DR_intorel;
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..689f577c08 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -14,6 +14,7 @@
#define EXECPARALLEL_H
#include "access/parallel.h"
+#include "executor/execdesc.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
@@ -35,11 +36,42 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;
+
+/*
+ * For each of the command added to ParallelInsertCmdKind, add a corresponding
+ * structure encompassing the information that's required to be shared across
+ * different functions. The way it works is as follows: in the caller, fill in
+ * the information into one of below structures based on the command kind, pass
+ * the command kind and a pointer to the filled in structure as a void pointer
+ * to required functions, say ExecInitParallelPlan. The called functions will
+ * use command kind to dereference the void pointer to corresponding structure.
+ *
+ * This way, the functions that are needed for parallel insertions can be
+ * generic, clean and extensible.
+ */
+typedef struct ParallelInsertCTASInfo
+{
+ IntoClause *intoclause;
+ Oid objectid;
+} ParallelInsertCTASInfo;
+
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
@@ -47,5 +79,11 @@ extern void ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei, Bitmapset *sendParam);
extern void ParallelQueryMain(dsm_segment *seg, shm_toc *toc);
-
+extern ParallelInsertCmdKind GetParallelInsertCmdType(DestReceiver *dest);
+extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
+ ParallelInsertCmdKind ins_cmd);
+extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
+ QueryDesc *queryDesc);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..297b3ff728 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts is allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v19-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch
From 889de831a66c8f4a97d7822e0aac34e470dc7b60 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 4 Jan 2021 12:13:23 +0530
Subject: [PATCH v19 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
---
src/backend/commands/createas.c | 13 ++++-
src/backend/commands/explain.c | 22 +++++++--
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execParallel.c | 70 +++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 20 +++++++-
src/backend/optimizer/plan/planner.c | 40 +++++++++++++++
src/include/commands/explain.h | 3 +-
src/include/executor/execParallel.h | 22 ++++++++-
src/include/nodes/parsenodes.h | 2 +
src/include/optimizer/planner.h | 10 ++++
10 files changed, 182 insertions(+), 23 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index a8050a2767..53ca3010c6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -310,6 +310,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to indicate planner so that it can ignore parallel
+ * tuple cost while generating Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+								   &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -342,7 +352,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc,
+ &query->parallelInsCmdTupleCostOpt);
}
/* run the plan to completion */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0ae5d8c65f..8e01faba7e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -383,11 +383,25 @@ ExplainOneQuery(Query *query, int cursorOptions,
planduration;
BufferUsage bufusage_start,
bufusage;
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
if (es->buffers)
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to indicate planner so that it can ignore parallel
+ * tuple cost while generating Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+									   &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -403,7 +417,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->parallelInsCmdTupleCostOpt);
}
}
@@ -513,7 +528,8 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -590,7 +606,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc, parallel_ins_tuple_cost_opts);
}
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 653ef8e41a..696d3343d4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 7ed3e9e3b6..2846df66e6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1755,7 +1755,8 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
* node so that the required information will be picked and sent to workers.
*/
void
-SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts)
{
GatherState *gstate;
DestReceiver *dest;
@@ -1766,24 +1767,63 @@ SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible either if the upper node is not
- * Gather or it's a Gather but it have some projections to perform.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
+ if (tuple_cost_opts && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED));
/*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is into
- * clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver, store
- * a reference to it in the Gather state so that it will be used in
- * ExecInitParallelPlan to pick the information.
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+
+ /*
+ * Before returning, ensure that we have not done wrong parallel tuple
+ * cost enforcement in the planner. Main reason for this assertion is
+ * to check if we enforced the planner to ignore the parallel tuple
+ * cost (with the intention of choosing parallel inserts) due to which
+ * the parallel plan may have been chosen, but we do not allow the
+ * parallel inserts now.
+ *
+ * If we have correctly ignored parallel tuple cost in the planner
+ * while creating Gather path, then this assertion failure should not
+ * occur. In case it occurs, that means the planner may have chosen
+ * this parallel plan because of our wrong enforcement. So let's try to
+ * catch that here.
+ */
+ Assert(tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_opts)
+ *tuple_cost_opts = 0;
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..d79842dbf3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -393,7 +394,24 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider tuple cost in case of we intend to perform parallel
+ * inserts by workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
+ {
+ /* We are ignoring the parallel tuple cost, so mark it. */
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_TUP_COST_IGNORED;
+ }
+ else
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4e6497ff32..d1b7347de2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,36 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * Gather node will not receive any tuples from the workers in case each worker
+ * inserts them in parallel. So, we turn on a flag to ignore parallel tuple
+ * cost by the Gather path in cost_gather if the SELECT is for commands in
+ * which parallel insertion is possible and we are generating an upper level
+ * Gather path.
+ */
+static void
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated for the upper Gather path(in
+ * grouping_planner), in which case we can not let parallel inserts
+ * happen. So we do not turn on ignore tuple cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return;
+
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST;
+ }
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,7 +7588,16 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for commands in which parallel
+ * insertion is possible and we are generating an upper level Gather
+ * path.
+ */
+ ignore_parallel_tuple_cost(root);
generate_useful_gather_paths(root, rel, false);
+ }
/*
* Reassess which paths are the cheapest, now that we've potentially added
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index e94d9e49cf..1a75c3ced3 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 689f577c08..f76b5c2ffd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -49,6 +49,25 @@ typedef enum ParallelInsertCmdKind
PARALLEL_INSERT_CMD_CREATE_TABLE_AS
} ParallelInsertCmdKind;
+/*
+ * Information sent to planner to account for tuple cost calculations in
+ * cost_gather for parallel insertions in commands such as CTAS.
+ *
+ * We need to let the planner know that there will be no tuples received by
+ * Gather node if workers insert the tuples in parallel.
+ */
+typedef enum ParallelInsertCmdTupleCostOpt
+{
+ PARALLEL_INSERT_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+} ParallelInsertCmdTupleCostOpt;
+
/*
* For each of the command added to ParallelInsertCmdKind, add a corresponding
* structure encompassing the information that's required to be shared across
@@ -85,5 +104,6 @@ extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
void *ins_info);
extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a0f37e5268..a1d2cb9d4f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,8 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ /* Parallel insertion tuple cost options. */
+ uint8 parallelInsCmdTupleCostOpt;
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 9a15de5025..b71d21d334 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
v19-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch
From 61641393e4c1d90a47f2a070d6e9e020e6f014e4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 16:49:43 +0530
Subject: [PATCH v19 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 559 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 211 +++++++
3 files changed, 796 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is
+ <literal>Gather</literal> and there are no projections to be performed
+ by it, then the created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan for a case in which the created table
+ can be filled by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..38a18c5a9b 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,563 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, the only way is that
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using a prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed
+-- by the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..40aadafc2a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,215 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using a prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked
+-- for the select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed
+-- by the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
Attachment: v19-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/octet-stream)
From 83b8b55c979a83440b4135e6a755331343585870 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 4 Jan 2021 13:24:24 +0530
Subject: [PATCH v19 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if the
Gather node lies under a top Append node. It also adds code that makes
the planner consider the parallel tuple cost as zero, and asserts
against wrong enforcement if parallel insertion later turns out not to
be possible. Test cases are also included in this patch.
---
src/backend/executor/execParallel.c | 152 ++--
src/backend/optimizer/path/allpaths.c | 31 +
src/backend/optimizer/plan/planner.c | 12 +-
src/include/executor/execParallel.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1086 insertions(+), 57 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 2846df66e6..5f298c4328 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -151,6 +151,9 @@ static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
ParallelInsertCmdKind ins_cmd,
void *ins_info);
+static bool PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd,
+ bool *gather_exists);
/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
@@ -1748,6 +1751,84 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
return false;
}
+/*
+ * Push the dest receiver down to a Gather node when the Gather node is either
+ * at the top of the plan or directly under a top Append node, and it has no
+ * projections to perform. Required information from the pushed dest receiver
+ * is sent to the workers so that they can perform parallel insertions into
+ * the target table.
+ *
+ * If the top node is Append, this function recursively checks its sub plans
+ * for Gather nodes; when one is found (and it has no projections), the dest
+ * receiver information is set on it.
+ *
+ * In this function we only care about Append and Gather nodes. The function
+ * returns true if at least one Gather node can allow parallel insertions by
+ * the workers, otherwise false. It also sets gather_exists to true if at
+ * least one Gather node exists.
+ */
+static bool
+PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd, bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownParallelInsertState(dest, aps->appendplans[i],
+ ins_cmd, gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS, we must send information such
+ * as the into clause (to build a separate dest receiver) and the
+ * object id (to open the created table) to each worker. Since this
+ * information is available in the CTAS dest receiver, store a
+ * reference to it in the Gather state so that ExecInitParallelPlan
+ * can pick up the required information from there.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+ }
+ }
+
+ return parallel;
+}
+
/*
* Set the parallel insert state, if the upper node is Gather and it doesn't
* have any projections. The parallel insert state includes information such as
@@ -1758,67 +1839,32 @@ void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
uint8 *tuple_cost_opts)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
-
- /*
- * Parallel insertions are possible only if the upper node is Gather.
- */
- if (!IsA(gstate, GatherState))
- return;
-
- if (tuple_cost_opts && gstate->ps.ps_ProjInfo)
- Assert(!(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED));
+ allow = PushDownParallelInsertState(queryDesc->dest, queryDesc->planstate,
+ ins_cmd, &gather_exists);
/*
- * Parallelize inserts only when the upper Gather node has no projections.
+ * When parallel insertion is not allowed, ensure before returning that
+ * we have not done wrong parallel tuple cost enforcement in the planner.
+ * The main reason for this assertion is to check whether we forced the
+ * planner to ignore the parallel tuple cost (with the intention of
+ * choosing parallel inserts), due to which the parallel plan may have
+ * been chosen, even though we do not allow the parallel inserts now.
+ *
+ * If we have correctly ignored the parallel tuple cost in the planner
+ * while creating the Gather path, then this assertion failure should
+ * not occur. If it does occur, the planner may have chosen this parallel
+ * plan because of our wrong enforcement, so let's try to catch that here.
*/
- if (!gstate->ps.ps_ProjInfo)
+ if (!allow && gather_exists)
{
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is
- * into clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver,
- * store a reference to it in the Gather state so that it will be used
- * in ExecInitParallelPlan to pick the information.
- */
- gstate->dest = dest;
- }
- else
- {
- /*
- * Upper Gather node has projections, so parallel insertions are not
- * allowed.
- */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = false;
-
- gstate->dest = NULL;
-
/*
- * Before returning, ensure that we have not done wrong parallel tuple
- * cost enforcement in the planner. Main reason for this assertion is
- * to check if we enforced the planner to ignore the parallel tuple
- * cost (with the intention of choosing parallel inserts) due to which
- * the parallel plan may have been chosen, but we do not allow the
- * parallel inserts now.
- *
- * If we have correctly ignored parallel tuple cost in the planner
- * while creating Gather path, then this assertion failure should not
- * occur. In case it occurs, that means the planner may have chosen
- * this parallel plan because of our wrong enforcement. So let's try to
- * catch that here.
+ * Parallel insertion is not allowed, but a Gather node exists, so
+ * check whether we have done wrong tuple cost enforcement.
*/
Assert(tuple_cost_opts && !(*tuple_cost_opts &
PARALLEL_INSERT_TUP_COST_IGNORED));
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 026a4b0848..96b5ce81c9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "executor/execParallel.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do a parallel insert
+ * if the top node of the subquery is Gather, so we turn on a flag to
+ * ignore the parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct ...), we can turn on the flag in two cases:
+ * i) query_level is 1
+ * ii) query_level > 1, in which case the flag is turned on in the
+ * parent_root. Case ii) handles Append under Append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d1b7347de2..423619735b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7351,9 +7351,15 @@ can_partial_agg(PlannerInfo *root)
static void
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->parallelInsCmdTupleCostOpt &
- PARALLEL_INSERT_SELECT_QUERY))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_SELECT_QUERY;
+ }
+
+ if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f76b5c2ffd..41f116bbf5 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -65,7 +65,9 @@ typedef enum ParallelInsertCmdTupleCostOpt
*/
PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
/* Turn on this after the cost is ignored. */
- PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2,
+ /* Turn on this in case tuple cost needs to be ignored for Append cases. */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND = 1 << 3
} ParallelInsertCmdTupleCostOpt;
/*
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 38a18c5a9b..356a2d0002 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -631,6 +631,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 40aadafc2a..32e6ad8636 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -246,6 +246,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
Sorry it took so long to get back to reviewing this.
Thanks for the comments.
wrt v18-0001....patch:
+ /*
+  * If the worker is for parallel insert in CTAS, then use the proper
+  * dest receiver.
+  */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;

I would move this into a function called e.g. GetCTASParallelWorkerReceiver
so that the details wrt CTAS can be put in createas.c. I would then also
split up intorel_startup into intorel_leader_startup and
intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
self->pub.rStartup to intorel_worker_startup.
My intention was to avoid adding any new APIs to the dest receiver. I simply
made the changes in intorel_startup: for workers it does only the minimal
work and returns early, while in the leader most of the table creation and
sanity checks are kept untouched. Please have a look at the v19 patch posted
upthread [1]/messages/by-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk+zmg@mail.gmail.com.
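For reference, the worker path in intorel_startup() in the attached v20
patch boils down to roughly the following (a simplified excerpt; the leader
path and error handling are omitted):

    /* In intorel_startup(), executed by a parallel worker. */
    if (myState->is_parallel_worker)
    {
        /* The leader already created the table; just open it. */
        myState->rel = table_open(myState->object_id, AccessExclusiveLock);
        myState->reladdr = InvalidObjectAddress;
        myState->ti_options = 0;
        myState->bistate = GetBulkInsertState();

        /* Mark the command id serialized from the leader as used. */
        SetCurrentCommandIdUsedForWorker();
        myState->output_cid = GetCurrentCommandId(false);
        return;
    }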
+ volatile pg_atomic_uint64 *processed;
why is it volatile?
The intention is to always read from the actual memory location. I took it
from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their u32
counterparts declare the parameter as volatile pg_atomic_uint64 *ptr.
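Concretely, in the attached patch the shared counter is declared in
ParallelExecutorInfo, updated by each worker, and read by the leader as
follows (lines excerpted from the v20-0001 patch):

    /* In ParallelExecutorInfo: number of tuples inserted by all workers. */
    volatile pg_atomic_uint64 *processed;

    /* Each worker, at the end of ParallelQueryMain(): */
    pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);

    /* The leader, after waiting for the workers in ExecParallelInsert(): */
    node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);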
+ if (isctas)
+ {
+     intoclause = ((DR_intorel *) node->dest)->into;
+     objectid = ((DR_intorel *) node->dest)->object_id;
+ }
Given that you extract them each once and then pass them directly into
the parallel-worker, can't you instead pass in the destreceiver and
leave that logic to ExecInitParallelPlan?
That's changed entirely in the v19 patch set posted upthread [1]/messages/by-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk+zmg@mail.gmail.com. Please
have a look. To keep the API generic I didn't pass the dest receiver;
instead I pass the parallel insert command type and a void * pointer to a
command-specific info structure, because the information we pass to workers
depends on the insertion command (for instance, for CTAS the workers need
the into clause and the object id, and for Refresh Materialized View only
the object id).
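For illustration, the CTAS caller side of that generic API in the attached
v20 patch looks roughly like this (excerpted and slightly condensed):

    /* Command-specific info structure, declared in execParallel.h. */
    typedef struct ParallelInsertCTASInfo
    {
        IntoClause *intoclause;
        Oid         objectid;
    } ParallelInsertCTASInfo;

    /* In ExecGather(), before launching workers: */
    if (perform_parallel_ins)
        parallel_ins_info = GetParallelInsertCmdInfo(node->dest,
                                                     parallel_ins_cmd);

    /* The command kind and the void * info travel together. */
    node->pei = ExecInitParallelPlan(node->ps.lefttree, estate,
                                     gather->initParam, gather->num_workers,
                                     node->tuples_needed, parallel_ins_cmd,
                                     parallel_ins_info);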
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which i assume (wrongly?) is always done?
That into clause rel variable is always set in gram.y for CTAS, CREATE
MATERIALIZED VIEW and SELECT INTO (because the qualified_name non-terminal
is not optional). My bad, I just added it as a sanity check; it's not
actually required.
create_as_target:
*qualified_name* opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
* $$->rel = $1;*
create_mv_target:
*qualified_name* opt_column_list table_access_method_clause
opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
* $$->rel = $1;*
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
* $$->rel = $2;*
I will change the below code:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+ ((DR_intorel *) gstate->dest)->into &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
to:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
I will update this in the next version of the patch set.
+ * In case if no workers were launched, allow the leader to insert entire
+ * tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?
Yeah, noticed that while working on the v19 patch set. Please have a look
at the v19 patch posted upthread [1]/messages/by-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk+zmg@mail.gmail.com.
================
wrt v18-0003....patch:
not sure if it is needed, but i was wondering if we would want more
tests with multiple gather nodes existing? caused e.g. by using CTE's,
valid subquery's (like the one test you have, but without the group
by/having)?
I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH, WITH
RECURSIVE and I don't see that any of the WITH clause processing hits
createas.c functions. So, IMHO, we don't need to add them. Please let me
know if there are any specific use cases you have in mind.
For instance, I tried to cover Init/Sub Plan and Subquery cases with:
below case has multiple Gather, Init Plan:
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
below case has Gather, Sub Plan:
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
For multiple Gather node cases, I covered them with the Union All/Append
cases in the 0004 patch. Please have a look.
[1]: /messages/by-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk+zmg@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming <luc@swarm64.com> wrote:
On 04-01-2021 12:16, Hou, Zhijie wrote:
================
wrt v18-0002....patch:
It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?

IMO, the reason is we want to make sure we only ignore the cost when Gather
is the top node. And it seems the generate_useful_gather_paths called in
apply_scanjoin_target_to_paths is the right place which can only create the
top node Gather. So we change the flag in apply_scanjoin_target_to_paths
around generate_useful_gather_paths to identify the top node.
Right. We wanted to ignore parallel tuple cost for only the upper Gather path.
I was wondering actually if we need the state machine. Reason is that as
AFAICS the code could be placed in create_gather_path, where you can
also check if it is a top gather node, whether the dest receiver is the
right type, etc? To me that seems like a nicer solution as its makes
that all logic that decides whether or not a parallel CTAS is valid is
in a single place instead of distributed over various places.
IMO, we can't determine in create_gather_path whether we are going to
generate the top Gather path. To decide whether the top Gather path is
being generated, it's not enough to check root->query_level == 1; we also
need to rely on where generate_useful_gather_paths gets called from. For
instance, for query_level 1, generate_useful_gather_paths gets called from
2 places in apply_scanjoin_target_to_paths. Likewise, create_gather_path
also gets called from many places. IMO, the current way, i.e. setting the
flag in apply_scanjoin_target_to_paths and ignoring the tuple cost based on
that in cost_gather, seems safe.
I may be wrong. Thoughts?
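To make the current approach concrete, here is a rough sketch of the idea
(a simplified illustration only, not the exact 0002 hunks; the
PARALLEL_INSERT_CAN_IGN_TUP_COST bit name is made up here for readability,
while PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_TUP_COST_IGNORED are
flags the v20-0002 patch actually defines):

    /* planner.c, apply_scanjoin_target_to_paths(): the Gather paths built
     * right here can only become the top node, so allow ignoring the
     * parallel tuple cost for a parallel-insert SELECT query. */
    if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
        root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_CAN_IGN_TUP_COST;
    generate_useful_gather_paths(root, rel, false);
    root->parse->parallelInsCmdTupleCostOpt &= ~PARALLEL_INSERT_CAN_IGN_TUP_COST;

    /* costsize.c, cost_gather(): charge the per-tuple transfer cost only
     * when the tuples are really funneled back to the leader. */
    if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_CAN_IGN_TUP_COST)
        root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_TUP_COST_IGNORED;
    else
        run_cost += parallel_tuple_cost * path->path.rows;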
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)

why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be set
as soon as startup was done, which i assume (wrongly?) is always done?

Actually, that into clause rel variable is always being set in the gram.y
for CTAS, Create Materialized View and SELECT INTO (because qualified_name
non-terminal is not optional). My bad. I just added it as a sanity check.
Actually, it's not required.
create_as_target:
qualified_name opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
create_mv_target:
qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
$$->rel = $2;

I will change the below code:

+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+     ((DR_intorel *) gstate->dest)->into &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {

to:

+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {

I will update this in the next version of the patch set.
Attaching the v20 patch set, which has the above change in the 0001 patch;
note that the 0002 to 0004 patches have no changes from v19. Please
consider the v20 patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/x-patch)
From 876672ce9ba140fe0c86507d9d7ff655b200a8f1 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 5 Jan 2021 09:03:30 +0530
Subject: [PATCH v20 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. The leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its number
of inserted tuples into a shared memory variable; the leader combines
this with its own count and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 28 ++-
src/backend/commands/createas.c | 84 ++++++-
src/backend/commands/explain.c | 44 ++++
src/backend/executor/execParallel.c | 322 ++++++++++++++++++++++++-
src/backend/executor/nodeGather.c | 130 +++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 16 ++
src/include/executor/execParallel.h | 42 +++-
src/include/nodes/execnodes.h | 3 +
11 files changed, 637 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..3741d824bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..750d15a572 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..a8050a2767 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -38,6 +38,7 @@
#include "commands/prepare.h"
#include "commands/tablecmds.h"
#include "commands/view.h"
+#include "executor/execParallel.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -51,18 +52,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -294,6 +283,11 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
}
else
{
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
/*
* Parse analysis was done already, but we still have to run the rule
* rewriter. We do not do AcquireRewriteLocks: we assume the query
@@ -338,6 +332,19 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -441,6 +448,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -461,6 +471,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -563,6 +602,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set a few extra fields. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..e985ea6db3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execParallel.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -572,6 +573,27 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ if (into)
+ {
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+ }
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1791,6 +1813,28 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..7ed3e9e3b6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,10 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ ParallelInsertCmdKind ins_cmd_type; /* parallel insertion command type */
+ Oid objectid; /* used by workers to open relation */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -135,10 +141,23 @@ static bool ExecParallelReInitializeDSM(PlanState *planstate,
ParallelContext *pcxt);
static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
SharedExecutorInstrumentation *instrumentation);
-
-/* Helper function that runs in the parallel worker. */
+static void ParallelInsCmdEstimate(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+
+/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static DestReceiver *ExecParallelGetInsReceiver(shm_toc *toc,
+ FixedParallelExecutorState *fpes);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -578,7 +597,9 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -712,6 +733,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for parallel insertions. */
+ if (parallel_ins_info)
+ ParallelInsCmdEstimate(pcxt, parallel_ins_cmd, parallel_ins_info);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +754,20 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion fixed info into DSA. */
+ SaveParallelInsCmdFixedInfo(pei, fpes, parallel_ins_cmd,
+ parallel_ins_info);
+ }
+ else
+ {
+ pei->processed = NULL;
+ fpes->ins_cmd_type = PARALLEL_INSERT_CMD_UNDEF;
+ fpes->objectid = InvalidOid;
+ }
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +797,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion info into DSA. */
+ SaveParallelInsCmdInfo(pcxt, parallel_ins_cmd, parallel_ins_info);
+
+ /*
+ * Tuple queues are not required in case of parallel insertions by the
+ * workers, because Gather node will not receive any tuples.
+ */
+ pei->tqueue = NULL;
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1391,8 +1444,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ /* Set up DestReceiver. */
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ receiver = ExecParallelGetInsReceiver(toc, fpes);
+ else
+ receiver = ExecParallelGetReceiver(seg, toc);
+
+ /* Set up SharedExecutorInstrumentation, and QueryDesc. */
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1529,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
@@ -1479,3 +1544,246 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
FreeQueryDesc(queryDesc);
receiver->rDestroy(receiver);
}
+
+/*
+ * Estimate space required for sending parallel insert information to workers
+ * in commands such as CTAS.
+ */
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len = 0;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * Save fixed state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pei && fpes && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ pg_atomic_init_u64(&fpes->processed, 0);
+ fpes->ins_cmd_type = ins_cmd;
+ pei->processed = &fpes->processed;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ fpes->objectid = info->objectid;
+ }
+}
+
+/*
+ * Save variable state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len;
+ char *intoclause_space = NULL;
+
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclause_str, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+}
+
+/*
+ * Create a DestReceiver to write produced tuples to target relation in case of
+ * parallel insertions.
+ */
+static DestReceiver *
+ExecParallelGetInsReceiver(shm_toc *toc, FixedParallelExecutorState *fpes)
+{
+ ParallelInsertCmdKind ins_cmd;
+ DestReceiver *receiver;
+
+ Assert(fpes && toc &&
+ (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ ins_cmd = fpes->ins_cmd_type;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ char *intoclause_str = NULL;
+ IntoClause *intoclause = NULL;
+
+ intoclause_str = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclause_str);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+
+ return receiver;
+}
+
+/*
+ * Given a DestReceiver, return the command type if parallelism is allowed.
+ */
+ParallelInsertCmdKind
+GetParallelInsertCmdType(DestReceiver *dest)
+{
+ if (!dest)
+ return PARALLEL_INSERT_CMD_UNDEF;
+
+ if (dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel)
+ return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;
+
+ return PARALLEL_INSERT_CMD_UNDEF;
+}
+
+/*
+ * Given a DestReceiver, allocate and fill parallel insert info structure
+ * corresponding to command type.
+ *
+ * Note that the memory allocated here for the info structure has to be freed
+ * up in caller.
+ */
+void *
+GetParallelInsertCmdInfo(DestReceiver *dest, ParallelInsertCmdKind ins_cmd)
+{
+ void *parallel_ins_info = NULL;
+
+ Assert(dest && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *)
+ palloc0(sizeof(ParallelInsertCTASInfo));
+ ctas_info->intoclause = ((DR_intorel *) dest)->into;
+ ctas_info->objectid = ((DR_intorel *) dest)->object_id;
+ parallel_ins_info = ctas_info;
+ }
+
+ return parallel_ins_info;
+}
+
+/*
+ * Check if parallel insertion is allowed in commands such as CTAS.
+ *
+ * Return true if allowed, otherwise false.
+ */
+bool
+IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
+{
+ Assert(ins_info && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ /*
+ * For CTAS, do not allow parallel inserts if target table is temporary. As
+ * the temporary tables are backend local, workers can not know about them.
+ *
+ * Return false either if the into clause is NULL or if the table is
+ * temporary, otherwise true.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+ IntoClause *into = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *) ins_info;
+ into = ctas_info->intoclause;
+
+ /* Below check may hit in case this function is called from explain.c. */
+ if (!(into && IsA(into, IntoClause)))
+ return false;
+
+ /*
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Set the parallel insert state, if the upper node is Gather and it doesn't
+ * have any projections. The parallel insert state includes information such as
+ * a flag in the dest receiver and also a dest receiver reference in the Gather
+ * node so that the required information will be picked and sent to workers.
+ */
+void
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible if the upper node is not a Gather,
+ * or it is a Gather but it has some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is into
+ * clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver, store
+ * a reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..1ab3e0f600 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -48,6 +48,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsert(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +132,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsert(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for commands such as CREATE TABLE AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsert(GatherState *node)
+{
+ /*
+ * By now, the parallel workers, if any were launched, would have started
+ * their work, i.e. inserting into the target relation. If the leader is
+ * also chosen to participate, it finishes its share before waiting for
+ * the parallel workers to finish.
+ *
+ * If no workers were launched, the leader inserts all the tuples.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed +=
+ pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +224,17 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ ParallelInsertCmdKind parallel_ins_cmd;
+ bool perform_parallel_ins = false;
+
+ /*
+ * Get the parallel insert command type from the dest receiver which
+ * would have been set in SetParallelInsertState().
+ */
+ parallel_ins_cmd = GetParallelInsertCmdType(node->dest);
+
+ if (parallel_ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ perform_parallel_ins = true;
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +243,15 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ void *parallel_ins_info = NULL;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in commands such as CTAS.
+ */
+ if (perform_parallel_ins)
+ parallel_ins_info = GetParallelInsertCmdInfo(node->dest,
+ parallel_ins_cmd);
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +259,9 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ parallel_ins_cmd,
+ parallel_ins_info);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +279,22 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ /*
+ * Do not create tuple queue readers for commands with parallel
+ * insertion: the Gather node will not receive any tuples, because
+ * the workers insert the tuples into the target relation directly.
+ */
+ if (!perform_parallel_ins)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -205,12 +303,24 @@ ExecGather(PlanState *pstate)
node->reader = NULL;
}
node->nextreader = 0;
+
+ /* Free up the parallel insert info, if allocated. */
+ if (parallel_ins_info)
+ pfree(parallel_ins_info);
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
- || (!gather->single_copy && parallel_leader_participation);
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !perform_parallel_ins) || (!gather->single_copy &&
+ parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for commands such as CTAS. */
+ if (perform_parallel_ins)
+ {
+ ExecParallelInsert(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..ea72473c8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ 0,
+ NULL);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index f49a57b35e..4cd6f972ed 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ad5054d116..74022aab41 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,28 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created in the leader. */
+ Oid object_id;
+} DR_intorel;
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..689f577c08 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -14,6 +14,7 @@
#define EXECPARALLEL_H
#include "access/parallel.h"
+#include "executor/execdesc.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
@@ -35,11 +36,42 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;
+
+/*
+ * For each of the command added to ParallelInsertCmdKind, add a corresponding
+ * structure encompassing the information that's required to be shared across
+ * different functions. The way it works is as follows: in the caller, fill in
+ * the information into one of below structures based on the command kind, pass
+ * the command kind and a pointer to the filled in structure as a void pointer
+ * to required functions, say ExecInitParallelPlan. The called functions will
+ * use command kind to dereference the void pointer to corresponding structure.
+ *
+ * This way, the functions that are needed for parallel insertions can be
+ * generic, clean and extensible.
+ */
+typedef struct ParallelInsertCTASInfo
+{
+ IntoClause *intoclause;
+ Oid objectid;
+} ParallelInsertCTASInfo;
+
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
@@ -47,5 +79,11 @@ extern void ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei, Bitmapset *sendParam);
extern void ParallelQueryMain(dsm_segment *seg, shm_toc *toc);
-
+extern ParallelInsertCmdKind GetParallelInsertCmdType(DestReceiver *dest);
+extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
+ ParallelInsertCmdKind ins_cmd);
+extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
+ QueryDesc *queryDesc);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..297b3ff728 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts is allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From 889de831a66c8f4a97d7822e0aac34e470dc7b60 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 4 Jan 2021 12:13:23 +0530
Subject: [PATCH v20 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can set the number of tuples transferred from the
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
---
src/backend/commands/createas.c | 13 ++++-
src/backend/commands/explain.c | 22 +++++++--
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execParallel.c | 70 +++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 20 +++++++-
src/backend/optimizer/plan/planner.c | 40 +++++++++++++++
src/include/commands/explain.h | 3 +-
src/include/executor/execParallel.h | 22 ++++++++-
src/include/nodes/parsenodes.h | 2 +
src/include/optimizer/planner.h | 10 ++++
10 files changed, 182 insertions(+), 23 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index a8050a2767..53ca3010c6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -310,6 +310,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to indicate to the planner that it can ignore the
+ * parallel tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -342,7 +352,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc,
+ &query->parallelInsCmdTupleCostOpt);
}
/* run the plan to completion */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0ae5d8c65f..8e01faba7e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -383,11 +383,25 @@ ExplainOneQuery(Query *query, int cursorOptions,
planduration;
BufferUsage bufusage_start,
bufusage;
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
if (es->buffers)
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to indicate to the planner that it can ignore the
+ * parallel tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -403,7 +417,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->parallelInsCmdTupleCostOpt);
}
}
@@ -513,7 +528,8 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -590,7 +606,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc, parallel_ins_tuple_cost_opts);
}
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 653ef8e41a..696d3343d4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 7ed3e9e3b6..2846df66e6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1755,7 +1755,8 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
* node so that the required information will be picked and sent to workers.
*/
void
-SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts)
{
GatherState *gstate;
DestReceiver *dest;
@@ -1766,24 +1767,63 @@ SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible either if the upper node is not
- * Gather or it's a Gather but it have some projections to perform.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
+ if (tuple_cost_opts && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED));
/*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is into
- * clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver, store
- * a reference to it in the Gather state so that it will be used in
- * ExecInitParallelPlan to pick the information.
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+
+ /*
+ * Before returning, ensure that we have not done wrong parallel tuple
+ * cost enforcement in the planner. Main reason for this assertion is
+ * to check if we enforced the planner to ignore the parallel tuple
+ * cost (with the intention of choosing parallel inserts) due to which
+ * the parallel plan may have been chosen, but we do not allow the
+ * parallel inserts now.
+ *
+ * If we have correctly ignored parallel tuple cost in the planner
+ * while creating Gather path, then this assertion failure should not
+ * occur. In case it occurs, that means the planner may have chosen
+ * this parallel plan because of our wrong enforcement. So let's try to
+ * catch that here.
+ */
+ Assert(tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
+ }
+
+ if (tuple_cost_opts)
+ *tuple_cost_opts = 0;
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..d79842dbf3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -393,7 +394,24 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost when we intend to perform parallel
+ * inserts by workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper-level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
+ {
+ /* We are ignoring the parallel tuple cost, so mark it. */
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_TUP_COST_IGNORED;
+ }
+ else
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4e6497ff32..d1b7347de2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,36 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each worker
+ * inserts them in parallel. So, turn on a flag to make cost_gather ignore the
+ * parallel tuple cost for the Gather path if the SELECT is for a command in
+ * which parallel insertion is possible and we are generating an upper-level
+ * Gather path.
+ */
+static void
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated for the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not turn on the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return;
+
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST;
+ }
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,7 +7588,16 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag to make cost_gather ignore the parallel tuple cost
+ * for the Gather path if the SELECT is for a command in which parallel
+ * insertion is possible and we are generating an upper-level Gather
+ * path.
+ */
+ ignore_parallel_tuple_cost(root);
generate_useful_gather_paths(root, rel, false);
+ }
/*
* Reassess which paths are the cheapest, now that we've potentially added
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index e94d9e49cf..1a75c3ced3 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 689f577c08..f76b5c2ffd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -49,6 +49,25 @@ typedef enum ParallelInsertCmdKind
PARALLEL_INSERT_CMD_CREATE_TABLE_AS
} ParallelInsertCmdKind;
+/*
+ * Information sent to the planner to account for tuple cost calculations in
+ * cost_gather for parallel insertions in commands such as CTAS.
+ *
+ * We need to let the planner know that no tuples will be received by the
+ * Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum ParallelInsertCmdTupleCostOpt
+{
+ PARALLEL_INSERT_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+} ParallelInsertCmdTupleCostOpt;
+
/*
* For each of the command added to ParallelInsertCmdKind, add a corresponding
* structure encompassing the information that's required to be shared across
@@ -85,5 +104,6 @@ extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
void *ins_info);
extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a0f37e5268..a1d2cb9d4f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,8 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ /* Parallel insertion tuple cost options. */
+ uint8 parallelInsCmdTupleCostOpt;
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 9a15de5025..b71d21d334 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
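For reviewers of the 0002 patch above, here is a minimal standalone sketch (plain C, not PostgreSQL code; the short names only mirror ParallelInsertCmdTupleCostOpt) of the order in which the tuple cost flags are meant to be set and consumed: the command layer sets the SELECT_QUERY flag before planning, ignore_parallel_tuple_cost() adds CAN_IGN_TUP_COST while the upper Gather path is generated, cost_gather() records TUP_COST_IGNORED when it actually skips the parallel tuple cost, and SetParallelInsertState() asserts against wrong enforcement and resets the flags.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors ParallelInsertCmdTupleCostOpt from the 0002 patch. */
#define SELECT_QUERY      (1 << 0)  /* set before planning the command's SELECT */
#define CAN_IGN_TUP_COST  (1 << 1)  /* set while generating the upper Gather path */
#define TUP_COST_IGNORED  (1 << 2)  /* set once cost_gather actually skipped the cost */

/* Sketch of the cost_gather change: skip the per-tuple transfer cost only
 * when both earlier flags are set, and record that we did so. */
static double
cost_gather_sketch(uint8_t *opts, double rows, double parallel_tuple_cost)
{
    if ((*opts & SELECT_QUERY) && (*opts & CAN_IGN_TUP_COST))
    {
        *opts |= TUP_COST_IGNORED;
        return 0.0;
    }
    return parallel_tuple_cost * rows;
}

int
main(void)
{
    uint8_t  opts = SELECT_QUERY;           /* ExecCreateTableAs: parallel insert allowed */
    bool     gather_has_projection = false; /* what SetParallelInsertState checks */
    double   run_cost;

    opts |= CAN_IGN_TUP_COST;               /* ignore_parallel_tuple_cost: upper Gather path */
    run_cost = cost_gather_sketch(&opts, 1000000.0, 0.1);

    /* SetParallelInsertState: if the Gather has projections, we must not have
     * ignored the tuple cost, otherwise the parallel plan was chosen unfairly. */
    if (gather_has_projection)
        assert(!(opts & TUP_COST_IGNORED));

    printf("run_cost=%.1f ignored=%d\n", run_cost, (opts & TUP_COST_IGNORED) != 0);
    opts = 0;                               /* flags are reset once the state is set */
    return 0;
}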
Attachment: v20-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From 61641393e4c1d90a47f2a070d6e9e020e6f014e4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 28 Dec 2020 16:49:43 +0530
Subject: [PATCH v20 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 559 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 211 +++++++
3 files changed, 796 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is
+ <literal>Gather</literal> and it has no projections to perform, the
+ created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan where the created table can be
+ filled by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..38a18c5a9b 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,563 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is
+-- whether Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: 40kB
+ Worker 0: Batches: 1 Memory Usage: 40kB
+ Worker 1: Batches: 1 Memory Usage: 40kB
+ Worker 2: Batches: 1 Memory Usage: 40kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: 25kB
+ Worker 0: Sort Method: quicksort Memory: 25kB
+ Worker 1: Sort Method: quicksort Memory: 25kB
+ Worker 2: Sort Method: quicksort Memory: 25kB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: 25kB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: 64kB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..40aadafc2a 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,215 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is
+-- whether Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
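Before the 0004 patch below, here is a quick standalone sketch (plain C, not PostgreSQL code; the names are illustrative only) of the push-down rule it implements: walk the top of the plan and hand the CTAS dest receiver to every Gather node that is either the top node or a direct child of a top Append and has no projection to perform; non-Gather sub plans keep sending their tuples up through Append to the leader's dest receiver.

#include <stdbool.h>
#include <stdio.h>

typedef enum { NODE_GATHER, NODE_APPEND, NODE_OTHER } NodeKind;

typedef struct PlanNode
{
    NodeKind         kind;
    bool             has_projection;  /* only meaningful for Gather */
    bool             got_dest;        /* set when the dest receiver is pushed here */
    struct PlanNode *children[8];
    int              nchildren;
} PlanNode;

/* Give the dest receiver to every projection-free Gather that is the top node
 * or a direct child of the top Append; return true if any Gather accepted it. */
static bool
push_down(PlanNode *node)
{
    bool parallel = false;

    if (node == NULL)
        return false;

    if (node->kind == NODE_APPEND)
    {
        for (int i = 0; i < node->nchildren; i++)
            parallel |= push_down(node->children[i]);
    }
    else if (node->kind == NODE_GATHER && !node->has_projection)
    {
        node->got_dest = true;  /* workers under this Gather insert directly */
        parallel = true;
    }

    return parallel;
}

int
main(void)
{
    PlanNode scan   = {NODE_OTHER,  false, false, {NULL},  0};
    PlanNode g1     = {NODE_GATHER, false, false, {&scan}, 1};
    PlanNode g2     = {NODE_GATHER, true,  false, {&scan}, 1}; /* has a projection */
    PlanNode seq    = {NODE_OTHER,  false, false, {NULL},  0}; /* non-Gather sub plan */
    PlanNode append = {NODE_APPEND, false, false, {&g1, &g2, &seq}, 3};

    printf("parallel insert possible: %d\n", push_down(&append));
    printf("g1 got dest: %d, g2 got dest: %d\n", g1.got_dest, g2.got_dest);
    return 0;
}

With this shape only g1 ends up doing parallel inserts, which matches the behaviour the write_parallel tests below expect for mixed Append plans.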
Attachment: v20-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/x-patch)
From 83b8b55c979a83440b4135e6a755331343585870 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 4 Jan 2021 13:24:24 +0530
Subject: [PATCH v20 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even when a
Gather node sits under a top Append node rather than at the top of the
plan. It also adds the code for influencing the planner to consider the
parallel tuple cost as zero, and asserts to catch wrong enforcement in
case parallel insertion later turns out not to be possible. Test cases
are also included in this patch.
---
src/backend/executor/execParallel.c | 152 ++--
src/backend/optimizer/path/allpaths.c | 31 +
src/backend/optimizer/plan/planner.c | 12 +-
src/include/executor/execParallel.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1086 insertions(+), 57 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 2846df66e6..5f298c4328 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -151,6 +151,9 @@ static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
ParallelInsertCmdKind ins_cmd,
void *ins_info);
+static bool PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd,
+ bool *gather_exists);
/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
@@ -1748,6 +1751,84 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
return false;
}
+/*
+ * Push the dest receiver to the Gather node when it is either at the top of
+ * the plan or under a top Append node, and it has no projections to perform.
+ * Required information from the pushed dest receiver is sent to the workers
+ * so that they can perform parallel insertions into the target table.
+ *
+ * If the top node is Append, this function recursively checks the sub plans
+ * for Gather nodes; when one is found (and it has no projections), it sets
+ * the dest receiver information there.
+ *
+ * In this function we only care about Append and Gather nodes. This function
+ * returns true if at least one Gather node can allow parallel insertions by
+ * the workers, otherwise false. It also sets gather_exists to true if at
+ * least one Gather node exists.
+ */
+static bool
+PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd, bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownParallelInsertState(dest, aps->appendplans[i],
+ ins_cmd, gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ /*
+ * Set to true if there is at least one Gather node either at the
+ * top of the plan or as a direct sub node under the Append node.
+ */
+ *gather_exists |= true;
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as the into clause (to build a separate dest receiver) and the
+ * object id (to open the created table) to each worker. Since this
+ * information is available in the CTAS dest receiver, store a
+ * reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+ }
+ }
+
+ return parallel;
+}
+
/*
* Set the parallel insert state, if the upper node is Gather and it doesn't
* have any projections. The parallel insert state includes information such as
@@ -1758,67 +1839,32 @@ void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
uint8 *tuple_cost_opts)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
-
- /*
- * Parallel insertions are possible only if the upper node is Gather.
- */
- if (!IsA(gstate, GatherState))
- return;
-
- if (tuple_cost_opts && gstate->ps.ps_ProjInfo)
- Assert(!(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED));
+ allow = PushDownParallelInsertState(queryDesc->dest, queryDesc->planstate,
+ ins_cmd, &gather_exists);
/*
- * Parallelize inserts only when the upper Gather node has no projections.
+ * When parallel insertion is not allowed, before returning ensure that
+ * we have not done wrong parallel tuple cost enforcement in the planner.
+ * The main reason for this assertion is to check whether we forced the
+ * planner to ignore the parallel tuple cost (with the intention of
+ * choosing parallel inserts), because of which the parallel plan may
+ * have been chosen even though we do not allow the parallel inserts now.
+ *
+ * If we have correctly ignored the parallel tuple cost in the planner
+ * while creating the Gather path, this assertion failure should not
+ * occur. If it does occur, the planner may have chosen this parallel
+ * plan because of our wrong enforcement, so let's try to catch that here.
*/
- if (!gstate->ps.ps_ProjInfo)
+ if (!allow && gather_exists)
{
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is
- * into clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver,
- * store a reference to it in the Gather state so that it will be used
- * in ExecInitParallelPlan to pick the information.
- */
- gstate->dest = dest;
- }
- else
- {
- /*
- * Upper Gather node has projections, so parallel insertions are not
- * allowed.
- */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = false;
-
- gstate->dest = NULL;
-
/*
- * Before returning, ensure that we have not done wrong parallel tuple
- * cost enforcement in the planner. Main reason for this assertion is
- * to check if we enforced the planner to ignore the parallel tuple
- * cost (with the intention of choosing parallel inserts) due to which
- * the parallel plan may have been chosen, but we do not allow the
- * parallel inserts now.
- *
- * If we have correctly ignored parallel tuple cost in the planner
- * while creating Gather path, then this assertion failure should not
- * occur. In case it occurs, that means the planner may have chosen
- * this parallel plan because of our wrong enforcement. So let's try to
- * catch that here.
+ * Parallel insertion is not allowed but a Gather node exists; check
+ * whether we have done wrong tuple cost enforcement.
*/
Assert(tuple_cost_opts && !(*tuple_cost_opts &
PARALLEL_INSERT_TUP_COST_IGNORED));
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 026a4b0848..96b5ce81c9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "executor/execParallel.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do parallel insert if
+ * the top node of the subquery is Gather, so we turn on a flag to ignore
+ * the parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct...), we can turn on the flag for two cases:
+ * i) query_level is 1
+ * ii) query_level > 1 and the flag was already set in the parent_root.
+ * Case ii) handles append under append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d1b7347de2..423619735b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7351,9 +7351,15 @@ can_partial_agg(PlannerInfo *root)
static void
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->parallelInsCmdTupleCostOpt &
- PARALLEL_INSERT_SELECT_QUERY))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_SELECT_QUERY;
+ }
+
+ if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f76b5c2ffd..41f116bbf5 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -65,7 +65,9 @@ typedef enum ParallelInsertCmdTupleCostOpt
*/
PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
/* Turn on this after the cost is ignored. */
- PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2,
+ /* Turn on this in case tuple cost needs to be ignored for Append cases. */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND = 1 << 3
} ParallelInsertCmdTupleCostOpt;
/*
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 38a18c5a9b..356a2d0002 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -631,6 +631,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: 217kB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 40aadafc2a..32e6ad8636 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -246,6 +246,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
Few comments:
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a
parallel worker")));Is it possible to add a check if it is a CTAS insert here as we do not
support insert in parallel workers from others as of now.Currently, there's no global variable in which we can selectively skip
this in case of parallel insertion in CTAS. How about having a
variable in any of the worker global contexts, set that when parallel
insertion is chosen for CTAS and use that in heap_prepare_insert() to
skip the above error? Eventually, we can remove this restriction
entirely in case we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.
Thoughts?
Yes, I felt that the leader can store the command as CTAS and the
leader/worker can use it to check and throw an error. The similar
change can be used for the parallel insert patches and once all the
patches are committed, we can remove it eventually.
+ Oid objectid;    /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
We can just mention relation instead of relation/table.
I will modify it in the next patch set.
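For reference, a rough sketch of how the shared fixed-size state carrying
these two fields could look (the struct name and any surrounding members are
purely illustrative, not the patch text):

typedef struct ParallelInsertCTASFixedState
{
    /* hypothetical container for the fields quoted above */
    Oid              objectid;   /* workers use this to open the relation */
    pg_atomic_uint64 processed;  /* tuples inserted by all the workers */
} ParallelInsertCTASFixedState;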
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
Can we include selection of cmin, xmin for one of the test to verify
that it uses the same transaction id in the parallel workers
something like:
select distinct(cmin,xmin) from parallel_write;
This is not possible since cmin and xmin are dynamic, we can not use
them in test cases. I think it's not necessary to check whether the
leader and workers are in the same txn or not, since we are not
creating a new txn. All the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.
I had seen in your patch that you serialize and use the same
transaction, but it will be good if you can have at least one test
case to validate that the leader and worker both use the same
transaction. To solve the problem that you are facing where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM t1) as dt;
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On 04-01-2021 14:32, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
Sorry it took so long to get back to reviewing this.
Thanks for the comments.
wrt v18-0001....patch:
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *) receiver)->is_parallel_worker = true;
+ ((DR_intorel *) receiver)->object_id = fpes->objectid;
I would move this into a function called e.g.
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
createas.c. I would then also split up intorel_startup into
intorel_leader_startup and intorel_worker_startup, and in
GetCTASParallelWorkerReceiver set self->pub.rStartup to
intorel_worker_startup.
My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which for workers it just does
the minimalistic work and exit from it. In the leader most of the table
creation and sanity check is kept untouched. Please have a look at the
v19 patch posted upthread [1].
Looks much better, really nicely abstracted away in the v20 patch.
+ volatile pg_atomic_uint64 *processed;
why is it volatile?
Intention is to always read from the actual memory location. I referred
it from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.
Okay I had not seen this syntax before for atomics with the volatile
keyword but its apparently how the atomics abstraction works in postgresql.
+ if (isctas)
+ {
+     intoclause = ((DR_intorel *) node->dest)->into;
+     objectid = ((DR_intorel *) node->dest)->object_id;
+ }
Given that you extract them each once and then pass them directly into
the parallel-worker, can't you instead pass in the destreceiver and
leave that logic to ExecInitParallelPlan?
That's changed entirely in the v19 patch set posted upthread [1]. Please
have a look. I didn't pass the dest receiver, to keep the API generic, I
passed parallel insert command type and a void * ptr which points to
insertion command because the information we pass to workers depends on
the insertion command (for instance, the information needed by workers
is for CTAS into clause and object id and for Refresh Mat View object id).
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which i assume (wrongly?) is always done?
Actually, that into clause rel variable is always being set in the
gram.y for CTAS, Create Materialized View and SELECT INTO (because
qualified_name non-terminal is not optional). My bad. I just added it as
a sanity check. Actually, it's not required.
create_as_target:
qualified_name opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
create_mv_target:
qualified_name opt_column_list table_access_method_clause
opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
$$->rel = $2;
I will change the below code:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+     ((DR_intorel *) gstate->dest)->into &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
to:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
I will update this in the next version of the patch set.
Thanks
+ * In case if no workers were launched, allow the leader to insert entire
+ * tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?Yeah, noticed that while working on the v19 patch set. Please have a
look at the v19 patch posted upthread [1].================
wrt v18-0003....patch:not sure if it is needed, but i was wondering if we would want more
tests with multiple gather nodes existing? caused e.g. by using CTE's,
valid subquery's (like the one test you have, but without the group
by/having)?
I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH, WITH
RECURSIVE and I don't see that any of the WITH clause processing hits
createas.c functions. So, IMHO, we don't need to add them. Please let me
know if there are any specific use cases you have in mind.
For instance, I tried to cover Init/Sub Plan and Subquery cases with:
below case has multiple Gather, Init Plan:
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+    (select two from (select * from tenk2) as tt limit 1) col2
+    from tenk1 where tenk1.four = 3;');
below case has Gather, Sub Plan:
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+    (select tenk1.two from generate_series(1,1)) col2
+    from tenk1 where tenk1.four = 3;');
For multiple Gather node cases, I covered them with the Union All/Append
cases in the 0004 patch. Please have a look.
Right, had not reviewed part 4 yet. My bad.
[1] -
/messages/by-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk+zmg@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Kind regards,
Luc
On 05-01-2021 04:59, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which i assume (wrongly?) is always done?
Actually, that into clause rel variable is always being set in the
gram.y for CTAS, Create Materialized View and SELECT INTO (because
qualified_name non-terminal is not optional). My bad. I just added it as
a sanity check. Actually, it's not required.
create_as_target:
qualified_name opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
create_mv_target:
qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
$$->rel = $2;
I will change the below code:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+     ((DR_intorel *) gstate->dest)->into &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
to:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
I will update this in the next version of the patch set.
Attaching v20 patch set that has above change in 0001 patch, note that
0002 to 0004 patches have no changes from v19. Please consider the v20
patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
Reviewing further v20-0001:
I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as its completely
independent code. Just a matter of style I guess.
Maybe I'm not completely following why but afaics we want parallel
inserts in various scenarios, not just CTAS? I'm asking because code like
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+     pg_atomic_add_fetch_u64(&fpes->processed,
+                             queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that
would be generalized soon after? Basically I would have expected the if
to compare against PARALLEL_INSERT_CMD_UNDEF.
Apart from these small things v20-0001 looks (very) good to me.
v20-0002:
will reply on the specific mail-thread about the state machine
v20-0003 and v20-0004:
looks good to me.
Kind regards,
Luc
On 04-01-2021 14:53, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming <luc@swarm64.com> wrote:
On 04-01-2021 12:16, Hou, Zhijie wrote:
================
wrt v18-0002....patch:
It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?
IMO, the reason is we want to make sure we only ignore the cost when Gather is the top node.
And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create top node Gather.
So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.
Right. We wanted to ignore parallel tuple cost for only the upper Gather path.
I was wondering actually if we need the state machine. Reason is that as
AFAICS the code could be placed in create_gather_path, where you can
also check if it is a top gather node, whether the dest receiver is the
right type, etc? To me that seems like a nicer solution as its makes
that all logic that decides whether or not a parallel CTAS is valid is
in a single place instead of distributed over various places.
IMO, we can't determine the fact that we are going to generate the top
Gather path in create_gather_path. To decide on whether or not the top
Gather path generation, I think it's not only required to check the
root->query_level == 1 but we also need to rely on from where
generate_useful_gather_paths gets called. For instance, for
query_level 1, generate_useful_gather_paths gets called from 2 places
in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
gets called from many places. IMO, the current way i.e. setting flag
it in apply_scanjoin_target_to_paths and ignoring based on that in
cost_gather seems safe.
I may be wrong. Thoughts?
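For concreteness, a rough sketch (illustration only, not the patch text) of
how cost_gather() could skip the per-tuple transfer cost once the state
machine described above has decided that the top Gather feeds the CTAS dest
receiver directly; the flag name is hypothetical:

    /* Parallel setup and communication cost. */
    startup_cost += parallel_setup_cost;

    /* skip charging tuple transfer cost if tuples never reach the leader */
    if (!ignore_parallel_tuple_cost_for_ctas)
        run_cost += parallel_tuple_cost * path->path.rows;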
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
So the way I understand it the requirements are:
- it needs to be the top-most gather
- it should not do anything with the rows after the gather node as this
would make the parallel inserts conceptually invalid.
Right now we're trying to judge what might be added on-top that could
change the rows by inspecting all parts of the root object that would
cause anything to be added, and add a little statemachine to track the
state of that knowledge. To me this has the downside that the list in
HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
make sure it stays up-to-date, which could result in regressions if not
tracked carefully.
Personally I would therefore go for a design which is safe in the sense
that regressions are not as easily introduced. IMHO that could be done
by inspecting the planned query afterwards, and then judging whether or
not the parallel inserts are actually the right thing to do.
Another way to create more safety against regressions would be to add an
assert upon execution of the query that if we do parallel inserts that
only a subset of allowed nodes exists above the gather node.
Some (not extremely fact checked) approaches as food for thought:
1. Plan the query as normal, and then afterwards look at the resulting
plan to see if there are only nodes that are ok between the gather node
and the top node, which afaics would only be things like append nodes.
Which would mean two things:
- at the end of subquery_planner before the final_rel is fetched, we add
another pass like the grouping_planner called e.g.
parallel_modify_planner or so, which traverses the query plan and checks
if the inserts would indeed be executed parallel, and if so sets the
cost of the gather to 0.
- we always keep around the best gathered partial path, or the partial
path itself.
2. Generate both gather paths: one with zero cost for the inserts and
one with costs. the one with zero costs would however be kept separately
and added as prime candidate for the final rel. then we can check in the
subquery_planner if the final candidate is different and then choose.
Kind regards,
Luc
On Tue, Jan 5, 2021 at 10:08 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
Few comments:
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a
parallel worker")));Is it possible to add a check if it is a CTAS insert here as we do not
support insert in parallel workers from others as of now.Currently, there's no global variable in which we can selectively skip
this in case of parallel insertion in CTAS. How about having a
variable in any of the worker global contexts, set that when parallel
insertion is chosen for CTAS and use that in heap_prepare_insert() to
skip the above error? Eventually, we can remove this restriction
entirely in case we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.
Thoughts?
Yes, I felt that the leader can store the command as CTAS and the
leader/worker can use it to check and throw an error. The similar
change can be used for the parallel insert patches and once all the
patches are committed, we can remove it eventually.
We can skip the error "cannot insert tuples in a parallel worker" in
heap_prepare_insert() selectively for each parallel insertion and
eventually we can remove that error after all the parallel insertion
related patches are committed. The main problem is that we should be
knowing in heap_prepare_insert() that we are coming from parallel
insertion for CTAS, or some other command at the same time we don't
want to alter the table_tuple_insert()/heap_prepare_insert() API
because this change will be removed eventually.
We can achieve this in below ways:
1) Add a backend global variable, set it before each
table_tuple_insert() in intorel_receive() and use that in
heap_prepare_insert() to skip the error.
2) Add a variable to MyBgworkerEntry structure, set it before each
table_tuple_insert() in intorel_receive() or in ParallelQueryMain() if
we are for CTAS parallel insertion and use that in
heap_prepare_insert() to skip the error.
3) Currently, we pass table insert options to
table_tuple_insert()/heap_prepare_insert(), which is a bitmap of below
values. We could also add something like #define
PARALLEL_INSERTION_CMD_CTAS 0x000F, set it before each
table_tuple_insert() in intorel_receive() and use that in
heap_prepare_insert() to skip the error, then unset it.
/* "options" flag bits for table_tuple_insert */
/* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
#define TABLE_INSERT_SKIP_FSM 0x0002
#define TABLE_INSERT_FROZEN 0x0004
#define TABLE_INSERT_NO_LOGICAL 0x0008
IMO either 2 or 3 would be fine. Thoughts?
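For illustration, a minimal sketch of option 1 above (the global variable
name and its placement are hypothetical, not part of the patch):

/* backend-global flag, e.g. declared alongside other globals */
bool        InParallelInsertCTAS = false;

/* intorel_receive(): set the flag only around the insert call */
InParallelInsertCTAS = true;
table_tuple_insert(myState->rel, slot, myState->output_cid,
                   myState->ti_options, myState->bistate);
InParallelInsertCTAS = false;

/* heap_prepare_insert(): skip the error for parallel CTAS inserts */
if (IsParallelWorker() && !InParallelInsertCTAS)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));

Options 2 and 3 would look similar, just with the flag carried in
MyBgworkerEntry or in the table_tuple_insert() options bitmap instead.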
+ Oid objectid;    /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
We can just mention relation instead of relation/table.
I will modify it in the next patch set.
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
Can we include selection of cmin, xmin for one of the test to verify
that it uses the same transaction id in the parallel workers
something like:
select distinct(cmin,xmin) from parallel_write;
This is not possible since cmin and xmin are dynamic, we can not use
them in test cases. I think it's not necessary to check whether the
leader and workers are in the same txn or not, since we are not
creating a new txn. All the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.
I had seen in your patch that you serialize and use the same
transaction, but it will be good if you can have at least one test
case to validate that the leader and worker both use the same
transaction. To solve the problem that you are facing where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM t1) as dt;
Thanks.
So, the expectation is that the above query should always return 1 if
both leader and workers shared the same txn. I will add this to one of
the test cases in the next version of the patch set.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 5, 2021 at 12:43 PM Luc Vlaming <luc@swarm64.com> wrote:
On 04-01-2021 14:32, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com
<mailto:luc@swarm64.com>> wrote:Sorry it took so long to get back to reviewing this.
Thanks for the comments.
wrt v18-0001....patch:
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *) receiver)->is_parallel_worker = true;
+ ((DR_intorel *) receiver)->object_id = fpes->objectid;
I would move this into a function called e.g.
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
createas.c. I would then also split up intorel_startup into
intorel_leader_startup and intorel_worker_startup, and in
GetCTASParallelWorkerReceiver set self->pub.rStartup to
intorel_worker_startup.
My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which for workers it just does
the minimalistic work and exit from it. In the leader most of the table
creation and sanity check is kept untouched. Please have a look at the
v19 patch posted upthread [1].
Looks much better, really nicely abstracted away in the v20 patch.
+ volatile pg_atomic_uint64 *processed;
why is it volatile?
Intention is to always read from the actual memory location. I referred
it from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.
But in your case, I do not understand the intention that where do you
think that the compiler can optimize it and read the old value?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 05-01-2021 11:32, Dilip Kumar wrote:
On Tue, Jan 5, 2021 at 12:43 PM Luc Vlaming <luc@swarm64.com> wrote:
On 04-01-2021 14:32, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
Sorry it took so long to get back to reviewing this.
Thanks for the comments.
wrt v18-0001....patch:
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *) receiver)->is_parallel_worker = true;
+ ((DR_intorel *) receiver)->object_id = fpes->objectid;
I would move this into a function called e.g.
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
createas.c. I would then also split up intorel_startup into
intorel_leader_startup and intorel_worker_startup, and in
GetCTASParallelWorkerReceiver set self->pub.rStartup to
intorel_worker_startup.
My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which for workers it just does
the minimalistic work and exit from it. In the leader most of the table
creation and sanity check is kept untouched. Please have a look at the
v19 patch posted upthread [1].
Looks much better, really nicely abstracted away in the v20 patch.
+ volatile pg_atomic_uint64 *processed;
why is it volatile?
Intention is to always read from the actual memory location. I referred
it from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.
But in your case, I do not understand the intention that where do you
think that the compiler can optimize it and read the old value?
It can not and should not. I had just only seen so far c++ atomic
variables and not a (postgres-specific?) c atomic variable which
apparently requires the volatile keyword. My stupidity ;)
Cheers,
Luc
On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <luc@swarm64.com> wrote:
Reviewing further v20-0001:
I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as its completely
independent code. Just a matter of style I guess.
If we were to have a intorel_startup_worker and assign it to
self->pub.rStartup, 1) we can do it in the CreateIntoRelDestReceiver,
we have to pass a parameter to CreateIntoRelDestReceiver as an
indication of parallel worker, which requires code changes in places
wherever CreateIntoRelDestReceiver is used. 2) we can also assign
intorel_startup_worker after CreateIntoRelDestReceiver in
ExecParallelGetInsReceiver, but that doesn't look good to me. 3) we
can duplicate CreateIntoRelDestReceiver and have a
CreateIntoRelParallelDestReceiver with the only change being that
self->pub.rStartup = intorel_startup_worker;
IMHO, the way it is currently, looks good. Anyways, I'm open to
changing that if we agree on any of the above 3 ways.
If we were to do any of the above, then we might have to do the same
thing for other commands Refresh Materialized View or Copy To where we
can parallelize.
Thoughts?
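To make option 3 above concrete, a sketch of what such a wrapper might look
like (function and startup-routine names are hypothetical, not from the
patch):

DestReceiver *
CreateIntoRelParallelDestReceiver(IntoClause *intoClause)
{
    DestReceiver *self = CreateIntoRelDestReceiver(intoClause);

    /* workers skip table creation and only open the already-created relation */
    self->rStartup = intorel_startup_worker;
    return self;
}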
Maybe I'm not completely following why but afaics we want parallel
inserts in various scenarios, not just CTAS? I'm asking because code like
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+     pg_atomic_add_fetch_u64(&fpes->processed,
+                             queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that
would be generalized soon after? Basically I would have expected the if
to compare against PARALLEL_INSERT_CMD_UNDEF.
After this patch is reviewed and goes for commit, then the next thing
I plan to do is to allow parallel inserts in Refresh Materialized View
and it can be used for that. I think the processed variable can also
be used for parallel inserts in INSERT INTO SELECT [1] as well.
Currently, I'm keeping it for CTAS, maybe later (after this is
committed) it can be generalized.
Thoughts?
[1]: /messages/by-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd=+KcaA8X=kStORhxpfpODOxg@mail.gmail.com
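To illustrate Luc's point, the generalization could be as simple as comparing
against PARALLEL_INSERT_CMD_UNDEF (sketch only, reusing the names from the
quoted snippet):

+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+     pg_atomic_add_fetch_u64(&fpes->processed,
+                             queryDesc->estate->es_processed);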
Apart from these small things v20-0001 looks (very) good to me.
v20-0003 and v20-0004:
looks good to me.
Thanks.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
ParallelInsCmdEstimate :
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
Since the if condition is covered by the assertion, I wonder why the if
check is still needed.
Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo
Cheers
On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which i assume (wrongly?) is always done?
Actually, that into clause rel variable is always being set in the
gram.y for CTAS, Create Materialized View and SELECT INTO (because
qualified_name non-terminal is not optional). My bad. I just added it as a
sanity check. Actually, it's not required.
create_as_target:
qualified_name opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
create_mv_target:
qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
$$->rel = $2;
I will change the below code:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+     ((DR_intorel *) gstate->dest)->into &&
+     ((DR_intorel *) gstate->dest)->into->rel &&
+     ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {
to:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+     PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
I will update this in the next version of the patch set.
Attaching v20 patch set that has above change in 0001 patch, note that
0002 to 0004 patches have no changes from v19. Please consider the v20
patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu <zyu@yugabyte.com> wrote:
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
ParallelInsCmdEstimate :
+ Assert(pcxt && ins_info &&
+        (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
Since the if condition is covered by the assertion, I wonder why the if
check is still needed.
Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo
Thanks.
The idea is to have assertion with all the expected ins_cmd types, and
then later to have selective handling for different ins_cmds. For
example, if we add (in future) parallel insertion in Refresh
Materialized View, then the code in those functions will be something
like:
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+ ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+
+ }
+ else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+ {
+
+ }
Similarly for other functions as well.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
The plan sounds good.
Before the second command type is added, can you leave out the 'if (ins_cmd
== PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' and keep the pair of curlies ?
You can add the if condition back when the second command type is added.
Cheers
On Tue, Jan 5, 2021 at 7:53 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu <zyu@yugabyte.com> wrote:
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
ParallelInsCmdEstimate :
+ Assert(pcxt && ins_info &&
+        (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
Since the if condition is covered by the assertion, I wonder why the if
check is still needed.
Similar comment for SaveParallelInsCmdFixedInfo and
SaveParallelInsCmdInfo
Thanks.
The idea is to have assertion with all the expected ins_cmd types, and
then later to have selective handling for different ins_cmds. For
example, if we add (in future) parallel insertion in Refresh
Materialized View, then the code in those functions will be something
like:
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
+{
+    Assert(pcxt && ins_info &&
+           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+    {
+
+    }
+    else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+    {
+
+    }
Similarly for other functions as well.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;
I see there is some code that is generic for CTAS and INSERT INTO
SELECT *, So is it
possible to take out that common code to a separate base patch? Later
both CTAS and INSERT INTO SELECT * can expand
that for their usage.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
ParallelInsCmdEstimate :
+ Assert(pcxt && ins_info &&
+        (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
Since the if condition is covered by the assertion, I wonder why the if
check is still needed.
Similar comment for SaveParallelInsCmdFixedInfo and
SaveParallelInsCmdInfo
Thanks.
The idea is to have assertion with all the expected ins_cmd types, and then
later to have selective handling for different ins_cmds. For example, if
we add (in future) parallel insertion in Refresh Materialized View, then
the code in those functions will be something
like:
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
+{
+    Assert(pcxt && ins_info &&
+           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+    {
+
+    }
+    else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+    {
+
+    }
Similarly for other functions as well.
I think it makes sense.
And if the check 'ins_cmd == xxx1 || ins_cmd == xxx2' may be used in some places,
how about defining a generic function with a comment to explain the purpose?
An example in INSERT INTO SELECT patch:
+/*
+ * IsModifySupportedInParallelMode
+ *
+ * Indicates whether execution of the specified table-modification command
+ * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
+ * parallel-safety conditions.
+ */
+static inline bool
+IsModifySupportedInParallelMode(CmdType commandType)
+{
+ /* Currently only INSERT is supported */
+ return (commandType == CMD_INSERT);
+}
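
As a sketch of how the same idea could look for the CTAS patch's command-kind enum (the helper name below is only illustrative, not something in the posted patches):

/*
 * IsParallelInsCmdSupported
 *      Hypothetical helper, mirroring IsModifySupportedInParallelMode,
 *      that centralizes the "which commands support parallel inserts"
 *      check instead of open-coding ins_cmd comparisons in each caller.
 */
static inline bool
IsParallelInsCmdSupported(ParallelInsertCmdKind ins_cmd)
{
    /* Currently only CTAS is supported. */
    return (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS);
}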
Best regards,
houzj
On Wed, Jan 6, 2021 at 10:05 AM Zhihong Yu <zyu@yugabyte.com> wrote:
The plan sounds good.
Before the second command type is added, can you leave out the 'if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' and keep the pair of curlies ?
You can add the if condition back when the second command type is added.
Thanks.
IMO, an empty pair of curlies is not a good idea. Having if (ins_cmd
== PARALLEL_INSERT_CMD_CREATE_TABLE_AS) doesn't harm anything.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 6, 2021 at 11:06 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
ParallelInsCmdEstimate :
+    Assert(pcxt && ins_info &&
+           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)

Since the if condition is covered by the assertion, I wonder why the if
check is still needed.
Similar comment for SaveParallelInsCmdFixedInfo and
SaveParallelInsCmdInfo

Thanks.
The idea is to have assertion with all the expected ins_cmd types, and then
later to have selective handling for different ins_cmds. For example, if
we add (in future) parallel insertion in Refresh Materialized View, then
the code in those functions will be something
like:

+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
+{
+    Assert(pcxt && ins_info &&
+           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+    {
+
+    }
+    else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+    {
+
+    }

Similarly for other functions as well.
I think it makes sense.
And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be used in some places,
How about defining a generic function with some comment to mention the purpose.

An example in the INSERT INTO SELECT patch:

+/*
+ * IsModifySupportedInParallelMode
+ *
+ * Indicates whether execution of the specified table-modification command
+ * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
+ * parallel-safety conditions.
+ */
+static inline bool
+IsModifySupportedInParallelMode(CmdType commandType)
+{
+    /* Currently only INSERT is supported */
+    return (commandType == CMD_INSERT);
+}
The intention of the assert is to verify that those functions are called
for appropriate commands such as CTAS, Refresh Mat View and so on with
correct parameters. I really don't think we can replace the assert with a
function like the above; in release builds the assertion does nothing
anyway. In a way, that assertion is only for debugging purposes. And I
also think that since we, as the callers, know when to call those new
functions, we could even remove the assertions if they are really a
problem here. Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 6, 2021 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+    PARALLEL_INSERT_CMD_UNDEF = 0,
+    PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;

I see there is some code that is generic for CTAS and INSERT INTO
SELECT *, So is it
possible to take out that common code to a separate base patch? Later
both CTAS and INSERT INTO SELECT * can expand
that for their usage.
I currently see the common code for parallel inserts i.e. insert into
selects, copy, ctas/create mat view/refresh mat view is the code in -
heapam.c, xact.c and xact.h. I can make a separate patch if required
for these changes alone. Thoughts?
IIRC parallel inserts in insert into select and copy don't use the
design idea of pushing the dest receiver down to Gather. Whereas
ctas/create mat view, refresh mat view, copy to can use the idea of
pushing the dest receiver to Gather and can easily extend on the
patches I made here.
Is there anything else you feel that we can have in common?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
I think it makes sense.
And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be
used in some places, how about defining a generic function with some
comment to mention the purpose.

An example in the INSERT INTO SELECT patch:

+/*
+ * IsModifySupportedInParallelMode
+ *
+ * Indicates whether execution of the specified table-modification command
+ * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
+ * parallel-safety conditions.
+ */
+static inline bool
+IsModifySupportedInParallelMode(CmdType commandType)
+{
+    /* Currently only INSERT is supported */
+    return (commandType == CMD_INSERT);
+}

The intention of the assert is to verify that those functions are called for
appropriate commands such as CTAS, Refresh Mat View and so on with correct
parameters. I really don't think we can replace the assert with a function
like the above; in release builds the assertion does nothing anyway. In a way,
that assertion is only for debugging purposes. And I also think that since
we, as the callers, know when to call those new functions, we could even
remove the assertions if they are really a problem here. Thoughts?
Hi
Thanks for the explanation.
If the check about command type is only used in assert, I think you are right.
I suggested a new function because I guess the check can be used in some other places.
Such as:
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
Or
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
If you think the above code will extend the ins_cmd type check in the future, the generic function may make sense.
Best regards,
houzj
On Wed, Jan 6, 2021 at 11:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 6, 2021 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+    PARALLEL_INSERT_CMD_UNDEF = 0,
+    PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;

I see there is some code that is generic for CTAS and INSERT INTO
SELECT *, So is it
possible to take out that common code to a separate base patch? Later
both CTAS and INSERT INTO SELECT * can expand
that for their usage.

I currently see the common code for parallel inserts i.e. insert into
selects, copy, ctas/create mat view/refresh mat view is the code in -
heapam.c, xact.c and xact.h. I can make a separate patch if required
for these changes alone. Thoughts?
I just saw this structure (ParallelInsertCmdKind) where it is defining
the ParallelInsertCmdKind and also usage is different based on the
command type. So I think the code which is defining the generic code
e.g. this structure and other similar code can go to the first patch
and we can build the remaining patch atop that patch. But if you
think this is just this structure and not much code is common then we
can let it be.
IIRC parallel inserts in insert into select and copy don't use the
design idea of pushing the dest receiver down to Gather. Whereas
ctas/create mat view, refresh mat view, copy to can use the idea of
pushing the dest receiver to Gather and can easily extend on the
patches I made here.

Is there anything else you feel that we can have in common?
Nothing specific.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 6, 2021 at 11:30 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I think it makes sense.
And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be
used in some places, how about defining a generic function with some
comment to mention the purpose.

An example in the INSERT INTO SELECT patch:

+/*
+ * IsModifySupportedInParallelMode
+ *
+ * Indicates whether execution of the specified table-modification command
+ * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
+ * parallel-safety conditions.
+ */
+static inline bool
+IsModifySupportedInParallelMode(CmdType commandType)
+{
+    /* Currently only INSERT is supported */
+    return (commandType == CMD_INSERT);
+}

The intention of the assert is to verify that those functions are called for
appropriate commands such as CTAS, Refresh Mat View and so on with correct
parameters. I really don't think we can replace the assert with a function
like the above; in release builds the assertion does nothing anyway. In a way,
that assertion is only for debugging purposes. And I also think that since
we, as the callers, know when to call those new functions, we could even
remove the assertions if they are really a problem here. Thoughts?

Hi

Thanks for the explanation.
If the check about command type is only used in assert, I think you are right.
I suggested a new function because I guess the check can be used in some other places.
Such as:

+    /* Okay to parallelize inserts, so mark it. */
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        ((DR_intorel *) dest)->is_parallel = true;

+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        ((DR_intorel *) dest)->is_parallel = false;
We need to know exactly what the command is in the above place, to
dereference and mark is_parallel to true, because is_parallel is being
added to the respective structures, not to the generic _DestReceiver
structure. So, in future the above code becomes something like below:
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+ else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+ ((DR_transientrel *) dest)->is_parallel = true;
+ else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO)
+ ((DR_copy *) dest)->is_parallel = true;
In the below place, instead of a new function, I think we can just have
something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
Or
+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);

If you think the above code will extend the ins_cmd type check in the future, the generic function may make sense.
We can also change below to fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF.
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ receiver = ExecParallelGetInsReceiver(toc, fpes);
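
That is, something like:

    if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
        receiver = ExecParallelGetInsReceiver(toc, fpes);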
If okay, I will modify it in the next version of the patch.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 05-01-2021 13:57, Bharath Rupireddy wrote:
On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <luc@swarm64.com> wrote:
Reviewing further v20-0001:
I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as it's completely
independent code. Just a matter of style I guess.

If we were to have an intorel_startup_worker and assign it to
self->pub.rStartup, 1) we can do it in the CreateIntoRelDestReceiver,
we have to pass a parameter to CreateIntoRelDestReceiver as an
indication of parallel worker, which requires code changes in places
wherever CreateIntoRelDestReceiver is used. 2) we can also assign
intorel_startup_worker after CreateIntoRelDestReceiver in
ExecParallelGetInsReceiver, but that doesn't look good to me. 3) we
can duplicate CreateIntoRelDestReceiver and have a
CreateIntoRelParallelDestReceiver with the only change being that
self->pub.rStartup = intorel_startup_worker;

IMHO, the way it is currently looks good. Anyways, I'm open to
changing that if we agree on any of the above 3 ways.
The current way is good enough, it was a suggestion as personally I find
it hard to read to have two completely separate code paths in the same
function. If any I would opt for something like 3) where there's a
CreateIntoRelParallelDestReceiver which calls CreateIntoRelDestReceiver
and then overrides rStartup to intorel_startup_worker. Then no callsites
have to change except the ones that are for parallel workers.
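
Just to illustrate, a rough sketch of that option (CreateIntoRelParallelDestReceiver and intorel_startup_worker are hypothetical names, not part of the posted patches; the fields assumed are the ones the patch adds to DR_intorel):

/*
 * Thin wrapper that reuses the existing receiver and only overrides the
 * startup callback for parallel workers, so existing call sites of
 * CreateIntoRelDestReceiver stay untouched.
 */
DestReceiver *
CreateIntoRelParallelDestReceiver(IntoClause *intoClause, Oid objectId)
{
    DR_intorel *self = (DR_intorel *) CreateIntoRelDestReceiver(intoClause);

    self->pub.rStartup = intorel_startup_worker;
    self->is_parallel_worker = true;
    self->object_id = objectId;

    return (DestReceiver *) self;
}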
If we were to do any of the above, then we might have to do the same
thing for other commands Refresh Materialized View or Copy To where we
can parallelize.

Thoughts?
Maybe I'm not completely following why, but afaics we want parallel inserts in various scenarios, not just CTAS? I'm asking because code like

+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);

seems very specific to CTAS. For now that seems fine but I suppose that would be generalized soon after? Basically I would have expected the if to compare against PARALLEL_INSERT_CMD_UNDEF.

After this patch is reviewed and goes for commit, then the next thing
I plan to do is to allow parallel inserts in Refresh Materialized View
and it can be used for that. I think the processed variable can also
be used for parallel inserts in INSERT INTO SELECT [1] as well.
Currently, I'm keeping it for CTAS, maybe later (after this is
committed) it can be generalized.

Thoughts?
Sounds good
[1] - /messages/by-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd=+KcaA8X=kStORhxpfpODOxg@mail.gmail.com
Apart from these small things v20-0001 looks (very) good to me.
v20-0003 and v20-0004:
look good to me.

Thanks.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Kind regards,
Luc
+    /* Okay to parallelize inserts, so mark it. */
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        ((DR_intorel *) dest)->is_parallel = true;

+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        ((DR_intorel *) dest)->is_parallel = false;

We need to know exactly what the command is in the above place, to dereference
and mark is_parallel to true, because is_parallel is being added to the
respective structures, not to the generic _DestReceiver structure. So, in
future the above code becomes something like below:

+    /* Okay to parallelize inserts, so mark it. */
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        ((DR_intorel *) dest)->is_parallel = true;
+    else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+        ((DR_transientrel *) dest)->is_parallel = true;
+    else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO)
+        ((DR_copy *) dest)->is_parallel = true;

In the below place, instead of a new function, I think we can just have
something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)

Or

+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        pg_atomic_add_fetch_u64(&fpes->processed,
+                                queryDesc->estate->es_processed);

If you think the above code will extend the ins_cmd type check in the
future, the generic function may make sense.

We can also change below to fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF.

+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        receiver = ExecParallelGetInsReceiver(toc, fpes);

If okay, I will modify it in the next version of the patch.
Yes, that looks good to me.
Best regards,
houzj
On Tue, Jan 5, 2021 at 1:25 PM Luc Vlaming <luc@swarm64.com> wrote:
wrt v18-0002....patch:
It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

What I'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? I'm probably missing something
there?

IMO, the reason is we want to make sure we only ignore the cost when Gather is the top node.
And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create a top-node Gather.
So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.

Right. We wanted to ignore the parallel tuple cost only for the upper Gather path.
I was wondering actually if we need the state machine. Reason is that as
AFAICS the code could be placed in create_gather_path, where you can
also check if it is a top gather node, whether the dest receiver is the
right type, etc? To me that seems like a nicer solution as its makes
that all logic that decides whether or not a parallel CTAS is valid is
in a single place instead of distributed over various places.

IMO, we can't determine the fact that we are going to generate the top
Gather path in create_gather_path. To decide on whether or not the top
Gather path generation, I think it's not only required to check the
root->query_level == 1 but we also need to rely on from where
generate_useful_gather_paths gets called. For instance, for
query_level 1, generate_useful_gather_paths gets called from 2 places
in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
gets called from many places. IMO, the current way i.e. setting flag
it in apply_scanjoin_target_to_paths and ignoring based on that in
cost_gather seems safe.

I may be wrong. Thoughts?
So the way I understand it the requirements are:
- it needs to be the top-most gather
- it should not do anything with the rows after the gather node as this
would make the parallel inserts conceptually invalid.
Right.
Right now we're trying to judge what might be added on-top that could
change the rows by inspecting all parts of the root object that would
cause anything to be added, and add a little statemachine to track the
state of that knowledge. To me this has the downside that the list in
HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
make sure it stays up-to-date, which could result in regressions if not
tracked carefully.
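
(For illustration only, I haven't checked the exact definition, but I imagine that macro is something along the lines of the sketch below, which is exactly the kind of list that has to be kept in sync:)

/*
 * Guess at HAS_PARENT_PATH_GENERATING_CLAUSE: clauses that make
 * grouping_planner add nodes above the scan/join Gather. Illustrative
 * only; the real macro in the 0002 patch is authoritative.
 */
#define HAS_PARENT_PATH_GENERATING_CLAUSE(parse) \
    ((parse)->groupClause != NIL || \
     (parse)->hasWindowFuncs || \
     (parse)->distinctClause != NIL || \
     (parse)->sortClause != NIL || \
     (parse)->limitCount != NULL || \
     (parse)->limitOffset != NULL)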
Right. Any new clause that will be added which generates an upper path
in grouping_planner after apply_scanjoin_target_to_paths also needs to
be added to HAS_PARENT_PATH_GENERATING_CLAUSE. Otherwise, we might
ignore the parallel tuple cost because of which the parallel plan may
be chosen and we go for parallel inserts only when the top node is
Gather. I don't think any new clause that will be added generates a
new upper Gather node in grouping_planner after
apply_scanjoin_target_to_paths.
Personally I would therefore go for a design which is safe in the sense
that regressions are not as easily introduced. IMHO that could be done
by inspecting the planned query afterwards, and then judging whether or
not the parallel inserts are actually the right thing to do.
The 0001 patch does that. It doesn't have any influence on the planner
for parallel tuple cost calculation, it just looks at the generated
plan and decides on parallel inserts. Having said that, we might miss
parallel plans even though we know that there will not be tuples
transferred from workers to Gather. So, 0002 patch adds the code for
influencing the planner for parallel tuple cost.
Another way to create more safety against regressions would be to add an
assert upon execution of the query that if we do parallel inserts that
only a subset of allowed nodes exists above the gather node.
Yes, we already do this. Please have a look at
SetParallelInsertState() in the 0002 patch. The idea is that if the
planner ignored the tuple cost but we later do not allow parallel
inserts (either because the upper node is not Gather, or it is a Gather
with projections), the assertion fails. So, in case any new parent path
generating clause is added (apart from the ones that are there in
HAS_PARENT_PATH_GENERATING_CLAUSE) and we ignore the tuple cost, then
this Assert will catch it. Currently, I couldn't find any assertion
failures in my debug build with make check and make check-world.
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+
+ /*
+ * Before returning, ensure that we have not done wrong parallel tuple
+ * cost enforcement in the planner. Main reason for this assertion is
+ * to check if we enforced the planner to ignore the parallel tuple
+ * cost (with the intention of choosing parallel inserts) due to which
+ * the parallel plan may have been chosen, but we do not allow the
+ * parallel inserts now.
+ *
+ * If we have correctly ignored parallel tuple cost in the planner
+ * while creating Gather path, then this assertion failure should not
+ * occur. In case it occurs, that means the planner may have chosen
+ * this parallel plan because of our wrong enforcement. So let's try to
+ * catch that here.
+ */
+ Assert(tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
+ }
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.
It would be nice if the scenarios where parallel plan is not chosen are
listed.
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
+ {
+ /* We are ignoring the parallel tuple cost, so mark it. */
+ root->parse->parallelInsCmdTupleCostOpt |=
+
PARALLEL_INSERT_TUP_COST_IGNORED;
If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY
and PARALLEL_INSERT_CAN_IGN_TUP_COST are
set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.
Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting
(1) of the first two bits should suffice.
Cheers
On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

+        if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+            ((DR_intorel *) gstate->dest)->into->rel &&
+            ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which I assume (wrongly?) is always done?
Actually, that into clause rel variable is always being set in the
gram.y for CTAS, Create Materialized View and SELECT INTO (because
qualified_name non-terminal is not optional). My bad. I just added it as a
sanity check. Actually, it's not required.

create_as_target:
qualified_name opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
create_mv_target:
qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
$$->rel = $2;

I will change the below code:

+        if (GetParallelInsertCmdType(gstate->dest) ==
+            PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+            ((DR_intorel *) gstate->dest)->into &&
+            ((DR_intorel *) gstate->dest)->into->rel &&
+            ((DR_intorel *) gstate->dest)->into->rel->relname)
+        {

to:

+        if (GetParallelInsertCmdType(gstate->dest) ==
+            PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        {

I will update this in the next version of the patch set.
Attaching v20 patch set that has above change in 0001 patch, note that
0002 to 0004 patches have no changes from v19. Please consider the v20
patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :

workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.

It would be nice if the scenarios where a parallel plan is not chosen are listed.
There are many reasons the planner may not choose a parallel plan for
the select part, for instance if there are temporary tables, parallel
unsafe functions, foreign tables, or the parallelism GUCs are not set
properly, and so on; see
https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
think we will add all the scenarios to the commit message.
Having said that, we have extensive comments in the code (especially in
the function SetParallelInsertState) about when parallel inserts are
chosen.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
+ if (!IsA(gstate, GatherState))
return;
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+    if ((root->parse->parallelInsCmdTupleCostOpt &
+         PARALLEL_INSERT_SELECT_QUERY) &&
+        (root->parse->parallelInsCmdTupleCostOpt &
+         PARALLEL_INSERT_CAN_IGN_TUP_COST))
+    {
+        /* We are ignoring the parallel tuple cost, so mark it. */
+        root->parse->parallelInsCmdTupleCostOpt |=
+            PARALLEL_INSERT_TUP_COST_IGNORED;

If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.
Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting (1) of the first two bits should suffice.
The way these flags work is as follows: before planning in CTAS, we
set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
actually ignored the tuple cost in cost_gather we set
PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
generate_useful_gather_paths, and the function
generate_useful_gather_paths can return before reaching cost_gather,
see below snippets. So, we need the 3 flags.
void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
override_rows)
{
ListCell *lc;
double rows;
double *rowsp = NULL;
List *useful_pathkeys_list = NIL;
Path *cheapest_partial_path = NULL;
/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;
/* Should we override the rel's rowcount estimate? */
if (override_rows)
rowsp = &rows;
/* generate the regular gather (merge) paths */
generate_gather_paths(root, rel, override_rows);
void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
Path *cheapest_partial_path;
Path *simple_gather_path;
ListCell *lc;
double rows;
double *rowsp = NULL;
/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;
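
To put the above in one place, here is a condensed illustration of the three-step flag handshake described above. It is not code from the patch; it only assumes the parallelInsCmdTupleCostOpt field and flag bits that the 0002 patch adds to Query:

static void
illustrate_tuple_cost_flags(Query *parse, bool reached_cost_gather)
{
    /* Set in createas.c, before planning the SELECT part of the CTAS. */
    parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;

    /*
     * Set in apply_scanjoin_target_to_paths, just before calling
     * generate_useful_gather_paths for the top-level rel.
     */
    parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_CAN_IGN_TUP_COST;

    /*
     * cost_gather may never be reached (e.g. when there are no partial
     * paths), which is why a third bit records that the tuple cost was
     * actually ignored.
     */
    if (reached_cost_gather)
        parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_TUP_COST_IGNORED;
}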
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks for the clarification.
w.r.t. the commit message, maybe I was momentarily sidetracked by the
phrase: With this change.
It seems the scenarios you listed are known parallel safety constraints.
Probably rephrase that sentence a little bit to make this clearer.
Cheers
Hi,
Attaching v21 patch set, which has following changes:
1) 0001 - changed fpes->ins_cmd_type ==
PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
PARALLEL_INSERT_CMD_UNDEF
2) 0002 - reworded the commit message.
3) 0003 - added cmin, xmin test case to one of the parallel insert
cases to ensure leader and worker insert the tuples in the same xact
and replaced memory usage output in numbers like 25kB to NkB to make
the tests stable.
4) 0004 - updated one of the test output to be in NkB and made the
assertion in SetParallelInsertState to be not under an if condition.
There's one open point [1]/messages/by-id/CALj2ACXmbka1P5pxOV2vU-Go3UPTtsPqZXE8nKW1mE49MQcZtw@mail.gmail.com on selective skipping of error "cannot
insert tuples in a parallel worker" in heap_prepare_insert(), thoughts
are welcome.
Please consider the v21 patch set for further review.
[1]: /messages/by-id/CALj2ACXmbka1P5pxOV2vU-Go3UPTtsPqZXE8nKW1mE49MQcZtw@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v21-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchapplication/x-patch; name=v21-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patchDownload
From c6934ac1476735cd455cc0b3123504ca51aec6f2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 7 Jan 2021 09:58:59 +0530
Subject: [PATCH v21 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. Leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its number
of inserted tuples into a shared memory variable, the leader combines
this with its own number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 28 ++-
src/backend/commands/createas.c | 84 ++++++-
src/backend/commands/explain.c | 44 ++++
src/backend/executor/execParallel.c | 322 ++++++++++++++++++++++++-
src/backend/executor/nodeGather.c | 130 +++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 16 ++
src/include/executor/execParallel.h | 42 +++-
src/include/nodes/execnodes.h | 3 +
11 files changed, 637 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..3741d824bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..750d15a572 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..a8050a2767 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -38,6 +38,7 @@
#include "commands/prepare.h"
#include "commands/tablecmds.h"
#include "commands/view.h"
+#include "executor/execParallel.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -51,18 +52,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -294,6 +283,11 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
}
else
{
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
/*
* Parse analysis was done already, but we still have to run the rule
* rewriter. We do not do AcquireRewriteLocks: we assume the query
@@ -338,6 +332,19 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -441,6 +448,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -461,6 +471,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -563,6 +602,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set a few extra fields. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, instead of blocking on a
+ * page while another worker is inserting into it, a worker can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..e985ea6db3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execParallel.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -572,6 +573,27 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ if (into)
+ {
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+ }
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1791,6 +1813,28 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..ba4508c409 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,10 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ ParallelInsertCmdKind ins_cmd_type; /* parallel insertion command type */
+ Oid objectid; /* used by workers to open relation */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -135,10 +141,23 @@ static bool ExecParallelReInitializeDSM(PlanState *planstate,
ParallelContext *pcxt);
static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
SharedExecutorInstrumentation *instrumentation);
-
-/* Helper function that runs in the parallel worker. */
+static void ParallelInsCmdEstimate(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+
+/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static DestReceiver *ExecParallelGetInsReceiver(shm_toc *toc,
+ FixedParallelExecutorState *fpes);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -578,7 +597,9 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -712,6 +733,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for parallel insertions. */
+ if (parallel_ins_info)
+ ParallelInsCmdEstimate(pcxt, parallel_ins_cmd, parallel_ins_info);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +754,20 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion fixed info into DSA. */
+ SaveParallelInsCmdFixedInfo(pei, fpes, parallel_ins_cmd,
+ parallel_ins_info);
+ }
+ else
+ {
+ pei->processed = NULL;
+ fpes->ins_cmd_type = PARALLEL_INSERT_CMD_UNDEF;
+ fpes->objectid = InvalidOid;
+ }
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +797,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion info into DSA. */
+ SaveParallelInsCmdInfo(pcxt, parallel_ins_cmd, parallel_ins_info);
+
+ /*
+ * Tuple queues are not required in case of parallel insertions by the
+ * workers, because Gather node will not receive any tuples.
+ */
+ pei->tqueue = NULL;
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1391,8 +1444,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ /* Set up DestReceiver. */
+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+ receiver = ExecParallelGetInsReceiver(toc, fpes);
+ else
+ receiver = ExecParallelGetReceiver(seg, toc);
+
+ /* Set up SharedExecutorInstrumentation, and QueryDesc. */
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1529,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
@@ -1479,3 +1544,246 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
FreeQueryDesc(queryDesc);
receiver->rDestroy(receiver);
}
+
+/*
+ * Estimate space required for sending parallel insert information to workers
+ * in commands such as CTAS.
+ */
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len = 0;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * Save fixed state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pei && fpes && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ pg_atomic_init_u64(&fpes->processed, 0);
+ fpes->ins_cmd_type = ins_cmd;
+ pei->processed = &fpes->processed;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ fpes->objectid = info->objectid;
+ }
+}
+
+/*
+ * Save variable state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len;
+ char *intoclause_space = NULL;
+
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclause_str, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+}
+
+/*
+ * Create a DestReceiver to write produced tuples to target relation in case of
+ * parallel insertions.
+ */
+static DestReceiver *
+ExecParallelGetInsReceiver(shm_toc *toc, FixedParallelExecutorState *fpes)
+{
+ ParallelInsertCmdKind ins_cmd;
+ DestReceiver *receiver;
+
+ Assert(fpes && toc &&
+ (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ ins_cmd = fpes->ins_cmd_type;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ char *intoclause_str = NULL;
+ IntoClause *intoclause = NULL;
+
+ intoclause_str = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclause_str);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+
+ return receiver;
+}
+
+/*
+ * Given a DestReceiver, return the command type if parallelism is allowed.
+ */
+ParallelInsertCmdKind
+GetParallelInsertCmdType(DestReceiver *dest)
+{
+ if (!dest)
+ return PARALLEL_INSERT_CMD_UNDEF;
+
+ if (dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel)
+ return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;
+
+ return PARALLEL_INSERT_CMD_UNDEF;
+}
+
+/*
+ * Given a DestReceiver, allocate and fill parallel insert info structure
+ * corresponding to command type.
+ *
+ * Note that the memory allocated here for the info structure has to be freed
+ * up in caller.
+ */
+void *
+GetParallelInsertCmdInfo(DestReceiver *dest, ParallelInsertCmdKind ins_cmd)
+{
+ void *parallel_ins_info = NULL;
+
+ Assert(dest && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *)
+ palloc0(sizeof(ParallelInsertCTASInfo));
+ ctas_info->intoclause = ((DR_intorel *) dest)->into;
+ ctas_info->objectid = ((DR_intorel *) dest)->object_id;
+ parallel_ins_info = ctas_info;
+ }
+
+ return parallel_ins_info;
+}
+
+/*
+ * Check if parallel insertion is allowed in commands such as CTAS.
+ *
+ * Return true if allowed, otherwise false.
+ */
+bool
+IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
+{
+ Assert(ins_info && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ /*
+ * For CTAS, do not allow parallel inserts if target table is temporary. As
+ * the temporary tables are backend local, workers can not know about them.
+ *
+ * Return false either if the into clause is NULL or if the table is
+ * temporary, otherwise true.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+ IntoClause *into = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *) ins_info;
+ into = ctas_info->intoclause;
+
+ /* Below check may hit in case this function is called from explain.c. */
+ if (!(into && IsA(into, IntoClause)))
+ return false;
+
+ /*
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Set the parallel insert state, if the upper node is Gather and it doesn't
+ * have any projections. The parallel insert state includes information such as
+ * a flag in the dest receiver and also a dest receiver reference in the Gather
+ * node so that the required information will be picked and sent to workers.
+ */
+void
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible if the upper node is not
+ * Gather, or it is a Gather but it has some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is into
+ * clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver, store
+ * a reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..1ab3e0f600 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -48,6 +48,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsert(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +132,72 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsert(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for commands such as CREATE TABLE AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsert(GatherState *node)
+{
+ /*
+ * By now, any launched parallel workers will have started their work,
+ * i.e. inserting into the target relation. If the leader is also chosen
+ * to participate, let it finish its share before waiting for the
+ * parallel workers to finish.
+ *
+ * If no workers were launched, the leader inserts all the tuples.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for (;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if (TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ node->ps.state->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by all workers to the tuples
+ * inserted by the leader (if any). This total is reported to the client.
+ */
+ node->ps.state->es_processed +=
+ pg_atomic_read_u64(node->pei->processed);
+ }
+}
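The worker-side half of this accounting is not shown in this hunk: each worker is expected to add its own insert count to the shared pei->processed counter that the leader reads above. A minimal sketch of that step, with a made-up function name (the actual worker-side code lives in the parts of the patch not quoted here):

/*
 * Illustrative sketch only, not the patch's exact code: a worker reports the
 * number of tuples it inserted by atomically adding its private count to the
 * shared-memory counter that ExecParallelInsert() reads above.
 */
static void
report_worker_inserted_tuples(volatile pg_atomic_uint64 *processed,
							  uint64 ntuples)
{
	/* one atomic add per worker keeps the leader's total race-free */
	pg_atomic_fetch_add_u64(processed, (int64) ntuples);
}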
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +224,17 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ ParallelInsertCmdKind parallel_ins_cmd;
+ bool perform_parallel_ins = false;
+
+ /*
+ * Get the parallel insert command type from the dest receiver, whose
+ * parallel flag would have been set in SetParallelInsertState().
+ */
+ parallel_ins_cmd = GetParallelInsertCmdType(node->dest);
+
+ if (parallel_ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ perform_parallel_ins = true;
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +243,15 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ void *parallel_ins_info = NULL;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in commands such as CTAS.
+ */
+ if (perform_parallel_ins)
+ parallel_ins_info = GetParallelInsertCmdInfo(node->dest,
+ parallel_ins_cmd);
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +259,9 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ parallel_ins_cmd,
+ parallel_ins_info);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +279,22 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ /*
+ * Do not create tuple queue readers for commands that perform
+ * parallel insertion: the Gather node will not receive any
+ * tuples, because the workers insert them directly into the
+ * target relation.
+ */
+ if (!perform_parallel_ins)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -205,12 +303,24 @@ ExecGather(PlanState *pstate)
node->reader = NULL;
}
node->nextreader = 0;
+
+ /* Free up the parallel insert info, if allocated. */
+ if (parallel_ins_info)
+ pfree(parallel_ins_info);
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
- || (!gather->single_copy && parallel_leader_participation);
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !perform_parallel_ins) || (!gather->single_copy &&
+ parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for commands such as CTAS. */
+ if (perform_parallel_ins)
+ {
+ ExecParallelInsert(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..ea72473c8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ 0,
+ NULL);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index f49a57b35e..4cd6f972ed 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ad5054d116..74022aab41 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,28 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created by the leader. */
+ Oid object_id;
+} DR_intorel;
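For illustration, the two new fields above are what let a worker rebuild an equivalent receiver from the information the leader ships over. A rough sketch, not the patch's actual worker-side code (the helper name and the exact assignments are assumptions based on the struct definition):

/*
 * Hypothetical sketch: build a worker-local DR_intorel from the shared CTAS
 * info (into clause plus the OID of the table the leader already created).
 */
static DestReceiver *
build_worker_ctas_receiver(IntoClause *intoclause, Oid objectid)
{
	DR_intorel *receiver;

	receiver = (DR_intorel *) CreateIntoRelDestReceiver(intoclause);
	receiver->is_parallel_worker = true;	/* do not create the table again */
	receiver->object_id = objectid;			/* open the leader's table instead */

	return (DestReceiver *) receiver;
}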
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..689f577c08 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -14,6 +14,7 @@
#define EXECPARALLEL_H
#include "access/parallel.h"
+#include "executor/execdesc.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
@@ -35,11 +36,42 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;
+
+/*
+ * For each command added to ParallelInsertCmdKind, add a corresponding
+ * structure holding the information that needs to be shared across different
+ * functions. It works as follows: the caller fills in one of the structures
+ * below based on the command kind, then passes the command kind and a pointer
+ * to the filled-in structure as a void pointer to the required functions, say
+ * ExecInitParallelPlan. The called functions use the command kind to cast the
+ * void pointer back to the corresponding structure.
+ *
+ * This way, the functions that are needed for parallel insertions can be
+ * generic, clean and extensible.
+ */
+typedef struct ParallelInsertCTASInfo
+{
+ IntoClause *intoclause;
+ Oid objectid;
+} ParallelInsertCTASInfo;
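To make the extensibility claim above concrete, here is a purely hypothetical sketch (none of these names exist in the patch) of what supporting one more command would involve:

/* Hypothetical: a new command kind... */
typedef enum ParallelInsertCmdKind
{
	PARALLEL_INSERT_CMD_UNDEF = 0,
	PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
	PARALLEL_INSERT_CMD_REFRESH_MATVIEW		/* hypothetical addition */
} ParallelInsertCmdKind;

/* ...a matching info structure... */
typedef struct ParallelInsertMatViewInfo
{
	Oid			matviewid;		/* matview already created by the leader */
} ParallelInsertMatViewInfo;

/*
 * ...and one more "if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MATVIEW)"
 * branch in each helper (GetParallelInsertCmdInfo, IsParallelInsertionAllowed
 * and friends) that casts the void *ins_info to the new structure.
 */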
+
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
@@ -47,5 +79,11 @@ extern void ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei, Bitmapset *sendParam);
extern void ParallelQueryMain(dsm_segment *seg, shm_toc *toc);
-
+extern ParallelInsertCmdKind GetParallelInsertCmdType(DestReceiver *dest);
+extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
+ ParallelInsertCmdKind ins_cmd);
+extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
+ QueryDesc *queryDesc);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..297b3ff728 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts are allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
Attachment: v21-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From 7706545576f486a92adc9568ae3042c396a52b68 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 7 Jan 2021 11:24:17 +0530
Subject: [PATCH v21 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know in createas.c that the SELECT is from CTAS, so
that it can ignore the parallel tuple cost when the workers can insert
the tuples in parallel. This is okay because the Gather node will not
actually receive any tuples. With the parallel tuple cost ignored,
there is a chance that the planner chooses a parallel plan which would
otherwise have been costed higher than the non-parallel plans and
discarded.
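For intuition, the term being skipped is the per-row charge in cost_gather (parallel_tuple_cost defaults to 0.1); the figures below are only an illustration of the arithmetic, not taken from the patch:

/* Simplified from cost_gather(); the numbers are illustrative only. */
run_cost += parallel_tuple_cost * path->path.rows;

/*
 * e.g. 0.1 * 10,000,000 rows adds 1,000,000 to the Gather path's total cost,
 * which can easily make the serial plan win.  When the workers insert the
 * rows themselves, nothing travels through the tuple queue to the leader,
 * so this charge can be dropped without making the estimate wrong.
 */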
---
src/backend/commands/createas.c | 13 +++++-
src/backend/commands/explain.c | 22 +++++++--
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execParallel.c | 66 ++++++++++++++++++++-------
src/backend/optimizer/path/costsize.c | 20 +++++++-
src/backend/optimizer/plan/planner.c | 40 ++++++++++++++++
src/include/commands/explain.h | 3 +-
src/include/executor/execParallel.h | 22 ++++++++-
src/include/nodes/parsenodes.h | 2 +
src/include/optimizer/planner.h | 10 ++++
10 files changed, 177 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index a8050a2767..53ca3010c6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -310,6 +310,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -342,7 +352,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc,
+ &query->parallelInsCmdTupleCostOpt);
}
/* run the plan to completion */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e985ea6db3..d7da07d4f6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -383,11 +383,25 @@ ExplainOneQuery(Query *query, int cursorOptions,
planduration;
BufferUsage bufusage_start,
bufusage;
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
if (es->buffers)
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -403,7 +417,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->parallelInsCmdTupleCostOpt);
}
}
@@ -513,7 +528,8 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -590,7 +606,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc, parallel_ins_tuple_cost_opts);
}
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 653ef8e41a..696d3343d4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ba4508c409..a26c9cdac8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1755,7 +1755,8 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
* node so that the required information will be picked and sent to workers.
*/
void
-SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts)
{
GatherState *gstate;
DestReceiver *dest;
@@ -1766,24 +1767,57 @@ SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible if the upper node is not a Gather,
- * or if it is a Gather but has some projections to perform.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
/*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is into
- * clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver, store
- * a reference to it in the Gather state so that it will be used in
- * ExecInitParallelPlan to pick the information.
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+
+ /*
+ * Before returning, ensure that we have not wrongly forced the planner
+ * to ignore the parallel tuple cost. The main reason for this assertion
+ * is to catch the case where we made the planner ignore the parallel
+ * tuple cost (with the intention of choosing parallel inserts), the
+ * parallel plan was chosen because of that, but we do not allow the
+ * parallel inserts now.
+ *
+ * If we ignored the parallel tuple cost only where it was correct to do
+ * so while creating the Gather path, this assertion should not fail. If
+ * it does fail, the planner may have chosen this parallel plan because
+ * of our wrong enforcement, so let's catch that here.
+ */
+ Assert(tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
+ }
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..d79842dbf3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -393,7 +394,24 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost when we intend to perform parallel
+ * inserts by workers. The ignore flag would have been turned on in
+ * apply_scanjoin_target_to_paths before generating the Gather path for
+ * the upper-level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
+ {
+ /* We are ignoring the parallel tuple cost, so mark it. */
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_TUP_COST_IGNORED;
+ }
+ else
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4e6497ff32..d1b7347de2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,36 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers when each
+ * worker inserts them in parallel. So we turn on a flag to make cost_gather
+ * ignore the parallel tuple cost for the Gather path, if the SELECT belongs
+ * to a command in which parallel insertion is possible and we are generating
+ * an upper-level Gather path.
+ */
+static void
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated on top of the upper Gather path (in
+ * grouping_planner), in which case we cannot let parallel inserts
+ * happen. So we do not turn on the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return;
+
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST;
+ }
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,7 +7588,16 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag to make cost_gather ignore the parallel tuple cost for
+ * the Gather path if the SELECT belongs to a command in which parallel
+ * insertion is possible and we are generating an upper-level Gather
+ * path.
+ */
+ ignore_parallel_tuple_cost(root);
generate_useful_gather_paths(root, rel, false);
+ }
/*
* Reassess which paths are the cheapest, now that we've potentially added
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index e94d9e49cf..1a75c3ced3 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 689f577c08..f76b5c2ffd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -49,6 +49,25 @@ typedef enum ParallelInsertCmdKind
PARALLEL_INSERT_CMD_CREATE_TABLE_AS
} ParallelInsertCmdKind;
+/*
+ * Information sent to the planner to adjust the tuple cost calculation in
+ * cost_gather for parallel insertions in commands such as CTAS.
+ *
+ * We need to let the planner know that no tuples will be received by the
+ * Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum ParallelInsertCmdTupleCostOpt
+{
+ PARALLEL_INSERT_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+} ParallelInsertCmdTupleCostOpt;
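Condensed from the hunks in this patch, the intended lifecycle of these bits is roughly as follows (a sketch to show the flow, not literal code):

/* In ExecCreateTableAs()/ExplainOneQuery(), before planning: */
query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;

/*
 * In ignore_parallel_tuple_cost(), while generating the upper-level Gather
 * path and only when no parent-path-generating clause is present:
 */
root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_CAN_IGN_TUP_COST;

/*
 * In cost_gather(), when both bits above are set, the per-row charge is
 * skipped and that fact is recorded:
 */
root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_TUP_COST_IGNORED;

/*
 * Finally, SetParallelInsertState() asserts that TUP_COST_IGNORED is not set
 * whenever parallel inserts end up being disallowed (e.g. the Gather has
 * projections), catching any wrong enforcement of the cost adjustment.
 */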
+
/*
* For each command added to ParallelInsertCmdKind, add a corresponding
* structure holding the information that needs to be shared across different
@@ -85,5 +104,6 @@ extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
void *ins_info);
extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dc2bb40926..70a78b169b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,8 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ /* Parallel insertion tuple cost options. */
+ uint8 parallelInsCmdTupleCostOpt;
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 9a15de5025..b71d21d334 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
Attachment: v21-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From f4eeed8cdefa14364749d72129158a2d217f2b80 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 7 Jan 2021 10:43:47 +0530
Subject: [PATCH v21 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 568 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 215 +++++++
3 files changed, 809 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is
+ <literal>Gather</literal> and it has no projections to perform, the
+ created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan in which the created table can be
+ filled by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..fb111973e9 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,572 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+-- check that the parallel insertions happened within the same xact; if so,
+-- there should be a single cmin and xmin, i.e. the query below should output 1
+select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: NkB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: NkB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: NkB
+ Worker 0: Batches: 1 Memory Usage: NkB
+ Worker 1: Batches: 1 Memory Usage: NkB
+ Worker 2: Batches: 1 Memory Usage: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: NkB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: NkB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: NkB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..d33e98def8 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,219 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+-- check that the parallel insertions happened within the same xact; if so,
+-- there should be a single cmin and xmin, i.e. the query below should output 1
+select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
Attachment: v21-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/x-patch)
From ffbd77fefee51f6caecbbca6ae38c4ce7fef4886 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 7 Jan 2021 11:36:20 +0530
Subject: [PATCH v21 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if the
Gather nodes are under a top-level Append node. It also adds the code
that makes the planner consider the parallel tuple cost as zero, and
asserts against wrong enforcement in case parallel insertion later
turns out not to be possible. Test cases are also included in this
patch.
---
src/backend/executor/execParallel.c | 154 ++--
src/backend/optimizer/path/allpaths.c | 31 +
src/backend/optimizer/plan/planner.c | 12 +-
src/include/executor/execParallel.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1089 insertions(+), 56 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a26c9cdac8..63fec33e80 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -151,6 +151,9 @@ static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
ParallelInsertCmdKind ins_cmd,
void *ins_info);
+static bool PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd,
+ bool *gather_exists);
/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
@@ -1748,6 +1751,84 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
return false;
}
+/*
+ * Push the dest receiver down to the Gather node when it is either at the top
+ * of the plan or under the top Append node, and only if it does not have any
+ * projections to do. The required information from the pushed dest receiver is
+ * sent to the workers so that they can perform parallel insertions into the
+ * target table.
+ *
+ * If the top node is Append, then this function recursively checks the sub
+ * plans for Gather nodes; when one is found (and it does not have projections),
+ * it sets the dest receiver information on it.
+ *
+ * In this function we only care about Append and Gather nodes. This function
+ * returns true if at least one Gather node can allow parallel insertions by
+ * the workers. Otherwise returns false. It also sets gather_exists to true if
+ * at least one Gather node exists.
+ */
+static bool
+PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd, bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownParallelInsertState(dest, aps->appendplans[i],
+ ins_cmd, gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as the into clause (to build a separate dest receiver) and the object
+ * id (to open the created table) to each worker. Since this information
+ * is available in the CTAS dest receiver, store a reference to it
+ * in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+ }
+ }
+
+ return parallel;
+}
+
/*
* Set the parallel insert state, if the upper node is Gather and it doesn't
* have any projections. The parallel insert state includes information such as
@@ -1758,66 +1839,35 @@ void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
uint8 *tuple_cost_opts)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
+ allow = PushDownParallelInsertState(queryDesc->dest, queryDesc->planstate,
+ ins_cmd, &gather_exists);
/*
- * Parallel insertions are possible only if the upper node is Gather.
+ * If parallel insertion is allowed, or it is disallowed because no Gather
+ * node exists, then return from here without performing the assertion
+ * check below.
*/
- if (!IsA(gstate, GatherState))
+ if (allow || !gather_exists)
return;
/*
- * Parallelize inserts only when the upper Gather node has no projections.
+ * When parallel insertion is not allowed but a Gather node exists, before
+ * returning ensure that we have not done wrong parallel tuple cost
+ * enforcement in the planner. Main reason for this assertion is to check
+ * if we enforced the planner to ignore the parallel tuple cost (with the
+ * intention of choosing parallel inserts) due to which the parallel plan
+ * may have been chosen, but we do not allow the parallel inserts now.
+ *
+ * If we have correctly ignored parallel tuple cost in the planner while
+ * creating Gather path, then this assertion failure should not occur. In
+ * case it occurs, that means the planner may have chosen this parallel
+ * plan because of our wrong enforcement. So let's try to catch that here.
*/
- if (!gstate->ps.ps_ProjInfo)
- {
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is
- * into clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver,
- * store a reference to it in the Gather state so that it will be used
- * in ExecInitParallelPlan to pick the information.
- */
- gstate->dest = dest;
- }
- else
- {
- /*
- * Upper Gather node has projections, so parallel insertions are not
- * allowed.
- */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = false;
-
- gstate->dest = NULL;
-
- /*
- * Before returning, ensure that we have not done wrong parallel tuple
- * cost enforcement in the planner. Main reason for this assertion is
- * to check if we enforced the planner to ignore the parallel tuple
- * cost (with the intention of choosing parallel inserts) due to which
- * the parallel plan may have been chosen, but we do not allow the
- * parallel inserts now.
- *
- * If we have correctly ignored parallel tuple cost in the planner
- * while creating Gather path, then this assertion failure should not
- * occur. In case it occurs, that means the planner may have chosen
- * this parallel plan because of our wrong enforcement. So let's try to
- * catch that here.
- */
- Assert(tuple_cost_opts && !(*tuple_cost_opts &
- PARALLEL_INSERT_TUP_COST_IGNORED));
- }
+ Assert(!allow && gather_exists && tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 026a4b0848..96b5ce81c9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "executor/execParallel.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do parallel insert if
+ * the top node of the subquery is Gather, so we turn on a flag to ignore
+ * the parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct...), we can turn on the flag in two cases:
+ * i) query_level is 1
+ * ii) query_level > 1 and the flag is already turned on in the
+ * parent_root.
+ * The case ii) is to check append under append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d1b7347de2..423619735b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7351,9 +7351,15 @@ can_partial_agg(PlannerInfo *root)
static void
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->parallelInsCmdTupleCostOpt &
- PARALLEL_INSERT_SELECT_QUERY))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_SELECT_QUERY;
+ }
+
+ if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f76b5c2ffd..41f116bbf5 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -65,7 +65,9 @@ typedef enum ParallelInsertCmdTupleCostOpt
*/
PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
/* Turn on this after the cost is ignored. */
- PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2,
+ /* Turn on this in case tuple cost needs to be ignored for Append cases. */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND = 1 << 3
} ParallelInsertCmdTupleCostOpt;
/*
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index fb111973e9..33358a86c9 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -640,6 +640,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index d33e98def8..8e3dcb7280 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -250,6 +250,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
Attaching the v21 patch set, which has the following changes:
1) 0001 - changed fpes->ins_cmd_type ==
PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
PARALLEL_INSERT_CMD_UNDEF
2) 0002 - reworded the commit message.
3) 0003 - added a cmin, xmin test case to one of the parallel insert cases
to ensure the leader and workers insert the tuples in the same xact, and
replaced memory usage output in numbers like 25kB with NkB to make the
tests stable.
4) 0004 - updated one of the test outputs to be in NkB and made the assertion
in SetParallelInsertState unconditional (no longer under an if condition).

There's one open point [1] on selective skipping of the error "cannot insert
tuples in a parallel worker" in heap_prepare_insert(); thoughts are welcome.

Please consider the v21 patch set for further review.
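To illustrate the idea of a worker-side flag mentioned earlier in the thread, a
rough sketch of such a check in heap_prepare_insert() could look like the
following; the flag name parallel_worker_inserts_allowed is hypothetical and is
not part of the posted patches:

    /*
     * Hypothetical sketch: the leader's command (e.g. CTAS or parallel COPY)
     * would set this worker-level flag when parallel inserts are known to be
     * safe; the error is raised only when the flag has not been set.
     */
    if (IsParallelWorker() && !parallel_worker_inserts_allowed)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert tuples in a parallel worker")));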
Hi,
I took a look into the new patch and have some comments.
1.
+ /*
+ * Do not consider tuple cost in case of we intend to perform parallel
+ * inserts by workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here?
IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.
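If that invariant holds, the check could be reduced to a single flag test; a
minimal sketch, reusing the field and flag names from the patch:

    /*
     * PARALLEL_INSERT_CAN_IGN_TUP_COST is only ever set together with
     * PARALLEL_INSERT_SELECT_QUERY, so testing it alone is sufficient.
     */
    if (root->parse->parallelInsCmdTupleCostOpt &
        PARALLEL_INSERT_CAN_IGN_TUP_COST)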
2.
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
...
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
...
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
I noticed the above code will call nodeToString and strlen twice, which seems unnecessary.
Do you think it's better to store the results of nodeToString and strlen first and pass them where needed?
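For instance, the caller could serialize the into clause once and pass the
string and its length to both helpers; a rough sketch only, with hypothetical
extra parameters that are not the patch's actual signatures:

    char   *intoclause_str = nodeToString(info->intoclause);
    int     intoclause_len = strlen(intoclause_str) + 1;

    /* hand the precomputed values to both helpers */
    ParallelInsCmdEstimate(pcxt, ins_cmd, intoclause_str, intoclause_len);
    ...
    SaveParallelInsCmdInfo(pcxt, ins_cmd, intoclause_str, intoclause_len);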
3.
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
...
+ node->ps.state->es_processed++;
+ }
How about using the variable estate, as in 'estate->es_processed++;',
instead of node->ps.state->es_processed++;?
Best regards,
houzj
On Mon, Jan 11, 2021 at 6:37 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Attaching the v21 patch set, which has the following changes:
1) 0001 - changed fpes->ins_cmd_type ==
PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
PARALLEL_INSERT_CMD_UNDEF
2) 0002 - reworded the commit message.
3) 0003 - added a cmin, xmin test case to one of the parallel insert cases
to ensure the leader and workers insert the tuples in the same xact, and
replaced memory usage output in numbers like 25kB with NkB to make the
tests stable.
4) 0004 - updated one of the test outputs to be in NkB and made the assertion
in SetParallelInsertState unconditional (no longer under an if condition).

There's one open point [1] on selective skipping of the error "cannot insert
tuples in a parallel worker" in heap_prepare_insert(); thoughts are welcome.

Please consider the v21 patch set for further review.
Hi,
I took a look into the new patch and have some comments.
Thanks.
1.
+ /*
+ * Do not consider tuple cost in case of we intend to perform parallel
+ * inserts by workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here?
IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.
+1. Changed.
2.
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
...
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
...
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
I noticed the above code will call nodeToString and strlen twice, which seems
unnecessary.
Do you think it's better to store the results of nodeToString and strlen first
and pass them where needed?
I wanted to keep the API generic, rather than doing nodeToString and strlen
outside and passing the results to the APIs. I don't think it adds too much
function call cost since it's run only in the leader. This way, the code and
API look more readable. Thoughts?
3.
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
...
+ node->ps.state->es_processed++;
+ }
How about using the variable estate, as in 'estate->es_processed++;',
instead of node->ps.state->es_processed++;?
+1. Changed.
Attaching v22 patch set with changes only in 0001 and 0002. Please
consider it for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
Attachment: v22-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/x-patch)
From 43b98d3f3360f6a4807938d013055caa8f5f43c6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 11 Jan 2021 08:17:53 +0530
Subject: [PATCH v22 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to Gather node
and from there the required information will be shared to workers
so that they can perform parallel insertions. Leader will also
participate in insertions. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in
the CTAS dest receiver and push it down to Gather node. Each worker
can create its own CTAS dest receiver with the information passed
from the leader. The leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its number
of inserted tuples into a shared memory variable, the leader combines
this with its own number of inserted tuples and shares to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 28 ++-
src/backend/commands/createas.c | 84 ++++++-
src/backend/commands/explain.c | 44 ++++
src/backend/executor/execParallel.c | 322 ++++++++++++++++++++++++-
src/backend/executor/nodeGather.c | 129 +++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 16 ++
src/include/executor/execParallel.h | 42 +++-
src/include/nodes/execnodes.h | 3 +
11 files changed, 636 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..3741d824bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..750d15a572 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsedForWorker()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..a8050a2767 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -38,6 +38,7 @@
#include "commands/prepare.h"
#include "commands/tablecmds.h"
#include "commands/view.h"
+#include "executor/execParallel.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -51,18 +52,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -294,6 +283,11 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
}
else
{
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
/*
* Parse analysis was done already, but we still have to run the rule
* rewriter. We do not do AcquireRewriteLocks: we assume the query
@@ -338,6 +332,19 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -441,6 +448,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -461,6 +471,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -563,6 +602,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set a few extra fields. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, instead of blocking on a
+ * page that another worker is inserting into, a worker can check the FSM
+ * for another page that can accommodate the tuples. This results in a
+ * major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid; otherwise, the workers are not
+ * allowed to extend the table.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..e985ea6db3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execParallel.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -572,6 +573,27 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ if (into)
+ {
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ ¶llel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+ }
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1791,6 +1813,28 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..ba4508c409 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,10 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ ParallelInsertCmdKind ins_cmd_type; /* parallel insertion command type */
+ Oid objectid; /* used by workers to open relation */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -135,10 +141,23 @@ static bool ExecParallelReInitializeDSM(PlanState *planstate,
ParallelContext *pcxt);
static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
SharedExecutorInstrumentation *instrumentation);
-
-/* Helper function that runs in the parallel worker. */
+static void ParallelInsCmdEstimate(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+
+/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static DestReceiver *ExecParallelGetInsReceiver(shm_toc *toc,
+ FixedParallelExecutorState *fpes);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -578,7 +597,9 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -712,6 +733,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for parallel insertions. */
+ if (parallel_ins_info)
+ ParallelInsCmdEstimate(pcxt, parallel_ins_cmd, parallel_ins_info);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +754,20 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+
+ if (parallel_ins_info)
+ {
+ /* Save the fixed parallel insertion info into shared memory. */
+ SaveParallelInsCmdFixedInfo(pei, fpes, parallel_ins_cmd,
+ parallel_ins_info);
+ }
+ else
+ {
+ pei->processed = NULL;
+ fpes->ins_cmd_type = PARALLEL_INSERT_CMD_UNDEF;
+ fpes->objectid = InvalidOid;
+ }
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +797,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (parallel_ins_info)
+ {
+ /* Save the parallel insertion info into shared memory. */
+ SaveParallelInsCmdInfo(pcxt, parallel_ins_cmd, parallel_ins_info);
+
+ /*
+ * Tuple queues are not required in case of parallel insertions by the
+ * workers, because the Gather node will not receive any tuples.
+ */
+ pei->tqueue = NULL;
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1391,8 +1444,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ /* Set up DestReceiver. */
+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+ receiver = ExecParallelGetInsReceiver(toc, fpes);
+ else
+ receiver = ExecParallelGetReceiver(seg, toc);
+
+ /* Set up SharedExecutorInstrumentation and QueryDesc. */
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1529,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. The leader
+ * will use it to report the total to the client.
+ */
+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
@@ -1479,3 +1544,246 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
FreeQueryDesc(queryDesc);
receiver->rDestroy(receiver);
}
+
+/*
+ * Estimate space required for sending parallel insert information to workers
+ * in commands such as CTAS.
+ */
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len = 0;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * Save fixed state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pei && fpes && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ pg_atomic_init_u64(&fpes->processed, 0);
+ fpes->ins_cmd_type = ins_cmd;
+ pei->processed = &fpes->processed;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ fpes->objectid = info->objectid;
+ }
+}
+
+/*
+ * Save variable state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len;
+ char *intoclause_space = NULL;
+
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclause_str, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+}
+
+/*
+ * Create a DestReceiver to write the produced tuples to the target relation
+ * in case of parallel insertions.
+ */
+static DestReceiver *
+ExecParallelGetInsReceiver(shm_toc *toc, FixedParallelExecutorState *fpes)
+{
+ ParallelInsertCmdKind ins_cmd;
+ DestReceiver *receiver;
+
+ Assert(fpes && toc &&
+ (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ ins_cmd = fpes->ins_cmd_type;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ char *intoclause_str = NULL;
+ IntoClause *intoclause = NULL;
+
+ intoclause_str = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+
+ /*
+ * The worker is performing a parallel insert for CTAS, so build the
+ * appropriate dest receiver from the serialized into clause.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclause_str);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+
+ return receiver;
+}
+
+/*
+ * Given a DestReceiver, return the command type if parallelism is allowed.
+ */
+ParallelInsertCmdKind
+GetParallelInsertCmdType(DestReceiver *dest)
+{
+ if (!dest)
+ return PARALLEL_INSERT_CMD_UNDEF;
+
+ if (dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel)
+ return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;
+
+ return PARALLEL_INSERT_CMD_UNDEF;
+}
+
+/*
+ * Given a DestReceiver, allocate and fill the parallel insert info structure
+ * corresponding to the command type.
+ *
+ * Note that the memory allocated here for the info structure has to be freed
+ * by the caller.
+ */
+void *
+GetParallelInsertCmdInfo(DestReceiver *dest, ParallelInsertCmdKind ins_cmd)
+{
+ void *parallel_ins_info = NULL;
+
+ Assert(dest && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *)
+ palloc0(sizeof(ParallelInsertCTASInfo));
+ ctas_info->intoclause = ((DR_intorel *) dest)->into;
+ ctas_info->objectid = ((DR_intorel *) dest)->object_id;
+ parallel_ins_info = ctas_info;
+ }
+
+ return parallel_ins_info;
+}
+
+/*
+ * Check if parallel insertion is allowed in commands such as CTAS.
+ *
+ * Return true if allowed, otherwise false.
+ */
+bool
+IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
+{
+ Assert(ins_info && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ /*
+ * For CTAS, do not allow parallel inserts if the target table is
+ * temporary. Temporary tables are backend-local, so the workers can not
+ * access them.
+ *
+ * Return false if the into clause is NULL or if the table is temporary,
+ * otherwise true.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+ IntoClause *into = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *) ins_info;
+ into = ctas_info->intoclause;
+
+ /* This check may trigger when this function is called from explain.c. */
+ if (!(into && IsA(into, IntoClause)))
+ return false;
+
+ /*
+ * Currently, CTAS supports creation of normal (logged), temporary and
+ * unlogged tables. It does not support foreign or partitioned table
+ * creation. Hence the check for a temporary table is enough here.
+ */
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Set the parallel insert state, if the upper node is Gather and it doesn't
+ * have any projections. The parallel insert state includes information such as
+ * a flag in the dest receiver and also a dest receiver reference in the Gather
+ * node so that the required information will be picked and sent to workers.
+ */
+void
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible if the upper node is not a
+ * Gather, or it is a Gather but has some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is into
+ * clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver, store
+ * a reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..8d9b7daea5 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -48,6 +48,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsert(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +132,71 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsert(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for commands such as CREATE TABLE AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsert(GatherState *node)
+{
+ EState *estate = node->ps.state;
+
+ /*
+ * By now the parallel workers, if any were launched, have started their
+ * work i.e. inserting into the target relation. If the leader is also
+ * chosen to participate, let it finish its share before waiting for the
+ * parallel workers to finish.
+ *
+ * If no workers were launched, let the leader insert all the tuples.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ estate->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * Wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted as well as their buffer/WAL
+ * usage. We do not destroy the parallel context here; that is done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will then be
+ * a no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add the total number of tuples inserted by the workers to the tuples
+ * inserted by the leader (if any). This total is reported to the client.
+ */
+ estate->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +223,17 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ ParallelInsertCmdKind parallel_ins_cmd;
+ bool perform_parallel_ins = false;
+
+ /*
+ * Get the parallel insert command type from the dest receiver, which
+ * would have been stored in the Gather state by SetParallelInsertState().
+ */
+ parallel_ins_cmd = GetParallelInsertCmdType(node->dest);
+
+ if (parallel_ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ perform_parallel_ins = true;
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +242,15 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ void *parallel_ins_info = NULL;
+
+ /*
+ * Collect the information that has to be passed to the workers for
+ * parallel inserts in commands such as CTAS.
+ */
+ if (perform_parallel_ins)
+ parallel_ins_info = GetParallelInsertCmdInfo(node->dest,
+ parallel_ins_cmd);
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +258,9 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ parallel_ins_cmd,
+ parallel_ins_info);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +278,22 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ /*
+ * Do not create tuple queue readers for commands that perform
+ * parallel insertion: the workers insert the tuples into the
+ * target relation themselves, so the Gather node will not
+ * receive any tuples.
+ */
+ if (!perform_parallel_ins)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -205,12 +302,24 @@ ExecGather(PlanState *pstate)
node->reader = NULL;
}
node->nextreader = 0;
+
+ /* Free up the parallel insert info, if allocated. */
+ if (parallel_ins_info)
+ pfree(parallel_ins_info);
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
- || (!gather->single_copy && parallel_leader_participation);
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !perform_parallel_ins) || (!gather->single_copy &&
+ parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for commands such as CTAS. */
+ if (perform_parallel_ins)
+ {
+ ExecParallelInsert(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..ea72473c8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ 0,
+ NULL);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index f49a57b35e..4cd6f972ed 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ad5054d116..74022aab41 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,28 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created by the leader. */
+ Oid object_id;
+} DR_intorel;
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..689f577c08 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -14,6 +14,7 @@
#define EXECPARALLEL_H
#include "access/parallel.h"
+#include "executor/execdesc.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
@@ -35,11 +36,42 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;
+
+/*
+ * For each of the commands added to ParallelInsertCmdKind, add a corresponding
+ * structure encompassing the information that's required to be shared across
+ * different functions. The way it works is as follows: the caller fills in
+ * the information into one of the structures below based on the command kind,
+ * and passes the command kind and a pointer to the filled-in structure, as a
+ * void pointer, to the required functions, say ExecInitParallelPlan. The
+ * called functions use the command kind to cast the void pointer back to the
+ * corresponding structure.
+ *
+ * This way, the functions that are needed for parallel insertions can be
+ * generic, clean and extensible.
+ */
+typedef struct ParallelInsertCTASInfo
+{
+ IntoClause *intoclause;
+ Oid objectid;
+} ParallelInsertCTASInfo;
+
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
@@ -47,5 +79,11 @@ extern void ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei, Bitmapset *sendParam);
extern void ParallelQueryMain(dsm_segment *seg, shm_toc *toc);
-
+extern ParallelInsertCmdKind GetParallelInsertCmdType(DestReceiver *dest);
+extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
+ ParallelInsertCmdKind ins_cmd);
+extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
+ QueryDesc *queryDesc);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..297b3ff728 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts are allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
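
For reference, here is a minimal, hypothetical psql sketch of how the 0001 patch would be exercised. The table names, the GUC setting and the data volume are illustrative only (not taken from the patch), and whether the parallel plan is actually chosen depends on costs and the machine:

    -- illustrative only: assumes the patch set is applied
    SET max_parallel_workers_per_gather = 2;

    CREATE TABLE src AS SELECT generate_series(1, 10000000) AS a;  -- sample data
    ANALYZE src;

    -- If the planner puts a projection-free Gather on top of the SELECT,
    -- EXPLAIN should show a "-> Create dst" line under the Gather node
    -- (as in the regression test output added by the 0003 patch), and the
    -- workers write their share of rows directly into dst.
    EXPLAIN (COSTS OFF)
    CREATE TABLE dst AS SELECT a FROM src WHERE a % 2 = 0;

    CREATE TABLE dst AS SELECT a FROM src WHERE a % 2 = 0;
    SELECT count(*) FROM dst;  -- should match the row count reported by the CTAS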
Attachment: v22-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From b0c0b9f2669fa48764b4f9718d50ca2ead7368bf Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 11 Jan 2021 08:19:36 +0530
Subject: [PATCH v22 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know that the SELECT is from CTAS in createas.c
so that it can ignore the parallel tuple cost when the workers can
insert the tuples in parallel. This is okay because the Gather
node will not actually receive any tuples. With the parallel tuple
cost ignored, the planner has a chance of choosing a parallel plan
that would otherwise have been costed higher than the non-parallel
plans and discarded.
---
src/backend/commands/createas.c | 13 +++++-
src/backend/commands/explain.c | 22 +++++++--
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execParallel.c | 66 ++++++++++++++++++++-------
src/backend/optimizer/path/costsize.c | 18 +++++++-
src/backend/optimizer/plan/planner.c | 40 ++++++++++++++++
src/include/commands/explain.h | 3 +-
src/include/executor/execParallel.h | 22 ++++++++-
src/include/nodes/parsenodes.h | 2 +
src/include/optimizer/planner.h | 10 ++++
10 files changed, 175 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index a8050a2767..53ca3010c6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -310,6 +310,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -342,7 +352,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc,
+ &query->parallelInsCmdTupleCostOpt);
}
/* run the plan to completion */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e985ea6db3..d7da07d4f6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -383,11 +383,25 @@ ExplainOneQuery(Query *query, int cursorOptions,
planduration;
BufferUsage bufusage_start,
bufusage;
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
if (es->buffers)
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to tell the planner that it can ignore the parallel
+ * tuple cost while generating the Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -403,7 +417,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->parallelInsCmdTupleCostOpt);
}
}
@@ -513,7 +528,8 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -590,7 +606,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc, parallel_ins_tuple_cost_opts);
}
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 653ef8e41a..696d3343d4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ba4508c409..a26c9cdac8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1755,7 +1755,8 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
* node so that the required information will be picked and sent to workers.
*/
void
-SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts)
{
GatherState *gstate;
DestReceiver *dest;
@@ -1766,24 +1767,57 @@ SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible if the upper node is not a
- * Gather, or it is a Gather but has some projections to perform.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
/*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is into
- * clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver, store
- * a reference to it in the Gather state so that it will be used in
- * ExecInitParallelPlan to pick the information.
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+
+ /*
+ * Before returning, make sure we have not wrongly forced the planner
+ * to ignore the parallel tuple cost. The main reason for this
+ * assertion is to catch the case where we made the planner ignore the
+ * parallel tuple cost (with the intention of choosing parallel
+ * inserts), due to which the parallel plan may have been chosen, but
+ * we do not allow the parallel inserts now.
+ *
+ * If we have correctly ignored the parallel tuple cost in the planner
+ * while creating the Gather path, then this assertion should not fail.
+ * If it does fail, the planner may have chosen this parallel plan only
+ * because of our wrong enforcement, so let's try to catch that here.
+ */
+ Assert(tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
+ }
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..d4a0fab37b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -393,7 +394,22 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the parallel tuple cost if we intend to perform the
+ * inserts in parallel by the workers. The ignore flag would have been
+ * turned on in apply_scanjoin_target_to_paths before generating the
+ * Gather path for the upper-level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
+ {
+ /* We are ignoring the parallel tuple cost, so mark it. */
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_TUP_COST_IGNORED;
+ }
+ else
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4e6497ff32..d1b7347de2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,36 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * The Gather node will not receive any tuples from the workers if each worker
+ * inserts them in parallel. So, turn on a flag that makes cost_gather ignore
+ * the parallel tuple cost for the Gather path, provided the SELECT is for a
+ * command in which parallel insertion is possible and we are generating an
+ * upper-level Gather path.
+ */
+static void
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated on top of the upper Gather path (in
+ * grouping_planner), in which case we can not let parallel inserts
+ * happen. So we do not turn on the ignore-tuple-cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return;
+
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST;
+ }
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,7 +7588,16 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag so that cost_gather ignores the parallel tuple cost
+ * for the Gather path if the SELECT is for a command in which parallel
+ * insertion is possible and we are generating an upper-level Gather
+ * path.
+ */
+ ignore_parallel_tuple_cost(root);
generate_useful_gather_paths(root, rel, false);
+ }
/*
* Reassess which paths are the cheapest, now that we've potentially added
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index e94d9e49cf..1a75c3ced3 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 689f577c08..f76b5c2ffd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -49,6 +49,25 @@ typedef enum ParallelInsertCmdKind
PARALLEL_INSERT_CMD_CREATE_TABLE_AS
} ParallelInsertCmdKind;
+/*
+ * Information sent to the planner to adjust the tuple cost calculation in
+ * cost_gather for parallel insertions in commands such as CTAS.
+ *
+ * We need to let the planner know that the Gather node will not receive any
+ * tuples if the workers insert them in parallel.
+ */
+typedef enum ParallelInsertCmdTupleCostOpt
+{
+ PARALLEL_INSERT_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+} ParallelInsertCmdTupleCostOpt;
+
/*
* For each of the commands added to ParallelInsertCmdKind, add a corresponding
* structure encompassing the information that's required to be shared across
@@ -85,5 +104,6 @@ extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
void *ins_info);
extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dc2bb40926..70a78b169b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,8 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ /* Parallel insertion tuple cost options. */
+ uint8 parallelInsCmdTupleCostOpt;
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 9a15de5025..b71d21d334 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
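
To illustrate what the 0002 tuple-cost adjustment buys, here is a rough, hypothetical sketch (table names carried over from the earlier illustrative snippet, not output from the patch or its regression tests): for the same underlying SELECT, a plain query may stay serial because every row would have to be shipped through the Gather node, while the CTAS form can still pick the parallel plan because that per-row cost is ignored:

    -- parallel_tuple_cost (default 0.1) is charged per row passed to Gather
    EXPLAIN SELECT a FROM src WHERE a % 2 = 0;
    -- may choose a serial Seq Scan: the parallel_tuple_cost of transferring
    -- every qualifying row from the workers can outweigh the parallel benefit

    EXPLAIN CREATE TABLE dst AS SELECT a FROM src WHERE a % 2 = 0;
    -- with PARALLEL_INSERT_CAN_IGN_TUP_COST set, cost_gather() skips the
    -- parallel_tuple_cost term, so the Gather path can win and the workers
    -- insert their rows directly into dst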
Attachment: v22-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/x-patch)
From f4eeed8cdefa14364749d72129158a2d217f2b80 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 7 Jan 2021 10:43:47 +0530
Subject: [PATCH v22 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 568 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 215 +++++++
3 files changed, 809 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is a
+ <literal>Gather</literal> with no projections to perform, the created
+ table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan in which the created table can be
+ filled by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..fb111973e9 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,572 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is that
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+-- check if the parallel insertions have happened within the same xact, if yes,
+-- there should be a single cmin and xmin i.e. below query should output 1
+select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: NkB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: NkB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: 1 Memory Usage: NkB
+ Worker 0: Batches: 1 Memory Usage: NkB
+ Worker 1: Batches: 1 Memory Usage: NkB
+ Worker 2: Batches: 1 Memory Usage: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate, group and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: NkB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Merge Join (actual rows=N loops=N)
+ Merge Cond: (temp1.col1 = temp2.col2)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp1.col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Sort (actual rows=N loops=N)
+ Sort Key: temp2.col2
+ Sort Method: quicksort Memory: NkB
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+(17 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: 4096 Batches: 1 Memory Usage: NkB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..d33e98def8 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,219 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, and the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+-- check if the parallel insertions have happened within the same xact; if yes,
+-- there should be a single cmin and xmin, i.e., the below query should output 1
+select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the parallelism will not be picked
+-- for select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and group clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
v22-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch (application/x-patch)
From ffbd77fefee51f6caecbbca6ae38c4ce7fef4886 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 7 Jan 2021 11:36:20 +0530
Subject: [PATCH v22 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even when the
Gather node is under a top-level Append node. It also adds the code for
influencing the planner to consider the parallel tuple cost as zero, and
assertions to catch cases where that cost was wrongly ignored but parallel
insertion later turns out to be impossible. Test cases are also included
in this patch.
---
src/backend/executor/execParallel.c | 154 ++--
src/backend/optimizer/path/allpaths.c | 31 +
src/backend/optimizer/plan/planner.c | 12 +-
src/include/executor/execParallel.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1089 insertions(+), 56 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a26c9cdac8..63fec33e80 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -151,6 +151,9 @@ static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
ParallelInsertCmdKind ins_cmd,
void *ins_info);
+static bool PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd,
+ bool *gather_exists);
/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
@@ -1748,6 +1751,84 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
return false;
}
+/*
+ * Push the dest receiver down to a Gather node when it is either at the top of
+ * the plan or directly under a top-level Append node, and has no projections.
+ * Required information from the pushed dest receiver is sent to workers so
+ * that they can perform parallel insertions into the target table.
+ *
+ * If the top node is Append, this function recursively checks its sub plans
+ * for Gather nodes; when one is found (and it has no projections), the dest
+ * receiver information is set on it.
+ *
+ * In this function we only care about Append and Gather nodes. This function
+ * returns true if at least one Gather node can allow parallel insertions by
+ * the workers. Otherwise returns false. It also sets gather_exists to true if
+ * at least one Gather node exists.
+ */
+static bool
+PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd, bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownParallelInsertState(dest, aps->appendplans[i],
+ ins_cmd, gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as into clause (to build separate dest receiver), object id (to
+ * open the created table) to each worker. Since this information
+ * is available in the CTAS dest receiver, store a reference to it
+ * in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+ }
+ }
+
+ return parallel;
+}
+
/*
* Set the parallel insert state, if the upper node is Gather and it doesn't
* have any projections. The parallel insert state includes information such as
@@ -1758,66 +1839,35 @@ void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
uint8 *tuple_cost_opts)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
+ allow = PushDownParallelInsertState(queryDesc->dest, queryDesc->planstate,
+ ins_cmd, &gather_exists);
/*
- * Parallel insertions are possible only if the upper node is Gather.
+ * If parallel insertion is allowed, or it is disallowed merely because no
+ * Gather node exists, return from here without performing the below
+ * assertion check.
*/
- if (!IsA(gstate, GatherState))
+ if (allow || !gather_exists)
return;
/*
- * Parallelize inserts only when the upper Gather node has no projections.
+ * When parallel insertion is not allowed but Gather node exists, before
+ * returning ensure that we have not done wrong parallel tuple cost
+ * enforcement in the planner. Main reason for this assertion is to check
+ * if we enforced the planner to ignore the parallel tuple cost (with the
+ * intention of choosing parallel inserts) due to which the parallel plan
+ * may have been chosen, but we do not allow the parallel inserts now.
+ *
+ * If we have correctly ignored parallel tuple cost in the planner while
+ * creating Gather path, then this assertion failure should not occur. In
+ * case it occurs, that means the planner may have chosen this parallel
+ * plan because of our wrong enforcement. So let's try to catch that here.
*/
- if (!gstate->ps.ps_ProjInfo)
- {
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is
- * into clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver,
- * store a reference to it in the Gather state so that it will be used
- * in ExecInitParallelPlan to pick the information.
- */
- gstate->dest = dest;
- }
- else
- {
- /*
- * Upper Gather node has projections, so parallel insertions are not
- * allowed.
- */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = false;
-
- gstate->dest = NULL;
-
- /*
- * Before returning, ensure that we have not done wrong parallel tuple
- * cost enforcement in the planner. Main reason for this assertion is
- * to check if we enforced the planner to ignore the parallel tuple
- * cost (with the intention of choosing parallel inserts) due to which
- * the parallel plan may have been chosen, but we do not allow the
- * parallel inserts now.
- *
- * If we have correctly ignored parallel tuple cost in the planner
- * while creating Gather path, then this assertion failure should not
- * occur. In case it occurs, that means the planner may have chosen
- * this parallel plan because of our wrong enforcement. So let's try to
- * catch that here.
- */
- Assert(tuple_cost_opts && !(*tuple_cost_opts &
- PARALLEL_INSERT_TUP_COST_IGNORED));
- }
+ Assert(!allow && gather_exists && tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 026a4b0848..96b5ce81c9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "executor/execParallel.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do parallel insert if the
+ * top node of the subquery is Gather, so we turn on a flag to ignore the
+ * parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct ...), we can turn on the flag in two cases:
+ * i) query_level is 1
+ * ii) query_level > 1 and the flag was already set in the parent_root.
+ * Case ii) is to handle append under append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
+ if (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d1b7347de2..423619735b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7351,9 +7351,15 @@ can_partial_agg(PlannerInfo *root)
static void
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->parallelInsCmdTupleCostOpt &
- PARALLEL_INSERT_SELECT_QUERY))
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_SELECT_QUERY;
+ }
+
+ if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f76b5c2ffd..41f116bbf5 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -65,7 +65,9 @@ typedef enum ParallelInsertCmdTupleCostOpt
*/
PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
/* Turn on this after the cost is ignored. */
- PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2,
+ /* Turn on this in case tuple cost needs to be ignored for Append cases. */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND = 1 << 3
} ParallelInsertCmdTupleCostOpt;
/*
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index fb111973e9..33358a86c9 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -640,6 +640,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index d33e98def8..8e3dcb7280 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -250,6 +250,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
On 06-01-2021 09:32, Bharath Rupireddy wrote:
On Tue, Jan 5, 2021 at 1:25 PM Luc Vlaming <luc@swarm64.com> wrote:
wrt v18-0002....patch:
It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

What I'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? I'm probably missing something
there?

IMO, the reason is we want to make sure we only ignore the cost when Gather is the top node.
And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create top node Gather.
So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.

Right. We wanted to ignore parallel tuple cost for only the upper Gather path.
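To make that concrete, here is a rough sketch of the shape in apply_scanjoin_target_to_paths(); this is not the literal 0002 hunk, just the idea, reusing the parallelInsCmdTupleCostOpt flags the patches add:

    /*
     * Sketch only: allow the tuple cost to be ignored just around the
     * generate_useful_gather_paths() call that can create the topmost
     * Gather, and only when the SELECT belongs to a CTAS.
     */
    if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
        ignore_parallel_tuple_cost(root);   /* may set PARALLEL_INSERT_CAN_IGN_TUP_COST */

    generate_useful_gather_paths(root, rel, false);

    /* Any Gather path created later must be costed normally again. */
    root->parse->parallelInsCmdTupleCostOpt &= ~PARALLEL_INSERT_CAN_IGN_TUP_COST;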
I was wondering actually if we need the state machine. Reason is that,
AFAICS, the code could be placed in create_gather_path, where you can
also check if it is a top gather node, whether the dest receiver is the
right type, etc? To me that seems like a nicer solution as it makes
all the logic that decides whether or not a parallel CTAS is valid live
in a single place instead of distributed over various places.

IMO, we can't determine the fact that we are going to generate the top
Gather path in create_gather_path. To decide whether or not the top
Gather path will be generated, I think it's not only required to check
root->query_level == 1 but we also need to rely on where
generate_useful_gather_paths gets called from. For instance, for
query_level 1, generate_useful_gather_paths gets called from 2 places
in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
gets called from many places. IMO, the current way, i.e. setting the flag
in apply_scanjoin_target_to_paths and ignoring it based on that in
cost_gather, seems safe.

I may be wrong. Thoughts?
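The consuming side in cost_gather() would then be roughly the following; again this is only an illustration of the approach, not the exact patch text:

    /*
     * Sketch only: charge nothing for transferring tuples to the leader when
     * the planner has been told that the top-level Gather's output goes
     * straight into the new table, and remember that the cost was ignored so
     * the executor can later assert if parallel insert is not chosen after
     * all.
     */
    if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_CAN_IGN_TUP_COST)
        root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_TUP_COST_IGNORED;
    else
        run_cost += parallel_tuple_cost * path->path.rows;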
So the way I understand it the requirements are:
- it needs to be the top-most gather
- it should not do anything with the rows after the gather node as this
would make the parallel inserts conceptually invalid.

Right.
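As a quick, hypothetical illustration of those two requirements (table t is assumed to be big enough that the planner picks a parallel scan; it is not one of the regression-test tables): the first CTAS keeps Gather as the topmost node, so the dest receiver could be pushed into it, while the ORDER BY in the second puts a Sort above Gather, so only the leader can insert:

    -- assumes a table t large enough for a parallel plan
    explain (costs off) create table t_copy as select * from t;
    explain (costs off) create table t_sorted as select * from t order by 1;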
Right now we're trying to judge what might be added on top that could
change the rows by inspecting all parts of the root object that would
cause anything to be added, and add a little state machine to track the
state of that knowledge. To me this has the downside that the list in
HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
make sure it stays up-to-date, which could result in regressions if not
tracked carefully.

Right. Any new clause that will be added which generates an upper path
in grouping_planner after apply_scanjoin_target_to_paths also needs to
be added to HAS_PARENT_PATH_GENERATING_CLAUSE. Otherwise, we might
ignore the parallel tuple cost (because of which a parallel plan may
be chosen) even though we only go for parallel inserts when the top node
is Gather. I don't think any new clause that will be added generates a
new upper Gather node in grouping_planner after
apply_scanjoin_target_to_paths.

Personally I would therefore go for a design which is safe in the sense
that regressions are not as easily introduced. IMHO that could be done
by inspecting the planned query afterwards, and then judging whether or
not the parallel inserts are actually the right thing to do.

The 0001 patch does that. It doesn't have any influence on the planner
for parallel tuple cost calculation; it just looks at the generated
plan and decides on parallel inserts. Having said that, we might miss
parallel plans even though we know that there will not be tuples
transferred from workers to Gather. So, 0002 patch adds the code for
influencing the planner for parallel tuple cost.
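For reference, the kind of thing HAS_PARENT_PATH_GENERATING_CLAUSE checks (mentioned above) would look roughly like the following; this is only an illustrative reconstruction, not the actual macro from the 0002 patch, which is not quoted here:

    #define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
        ((root)->parse->groupClause != NIL || \
         (root)->parse->hasAggs || \
         (root)->parse->hasWindowFuncs || \
         (root)->parse->distinctClause != NIL || \
         (root)->parse->sortClause != NIL || \
         (root)->parse->limitCount != NULL || \
         (root)->parse->limitOffset != NULL || \
         (root)->parse->rowMarks != NIL)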
Ok. Thanks for the explanation and sorry for the confusion.
Another way to create more safety against regressions would be to add an
assert upon execution of the query that, if we do parallel inserts, only
a subset of allowed nodes exists above the gather node.

Yes, we already do this. Please have a look at
SetParallelInsertState() in the 0002 patch. The idea is that if the
planner ignored the tuple cost but we later do not allow parallel
inserts, either because the upper node is not Gather or because the
Gather has projections, the assertion fails. So, in case any new parent path
generating clause is added (apart from the ones that are there in
HAS_PARENT_PATH_GENERATING_CLAUSE) and we ignore the tuple cost, then
this Assert will catch it. Currently, I couldn't find any assertion
failures in my debug build with make check and make check-world.
Ok. Seems I missed that assert when reviewing.
+ else
+ {
+     /*
+      * Upper Gather node has projections, so parallel insertions are not
+      * allowed.
+      */
+     if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+         ((DR_intorel *) dest)->is_parallel = false;
+
+     gstate->dest = NULL;
+
+     /*
+      * Before returning, ensure that we have not done wrong parallel tuple
+      * cost enforcement in the planner. Main reason for this assertion is
+      * to check if we enforced the planner to ignore the parallel tuple
+      * cost (with the intention of choosing parallel inserts) due to which
+      * the parallel plan may have been chosen, but we do not allow the
+      * parallel inserts now.
+      *
+      * If we have correctly ignored parallel tuple cost in the planner
+      * while creating Gather path, then this assertion failure should not
+      * occur. In case it occurs, that means the planner may have chosen
+      * this parallel plan because of our wrong enforcement. So let's try to
+      * catch that here.
+      */
+     Assert(tuple_cost_opts && !(*tuple_cost_opts &
+                                 PARALLEL_INSERT_TUP_COST_IGNORED));
+ }

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Kind regards,
Luc
On Mon, Jan 11, 2021 at 8:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Attaching v22 patch set with changes only in 0001 and 0002. Please
consider it for further review.
Seems like the v22 patch set was failing in cfbot for one of the unstable
test cases. Attaching the v23 patch set with modifications in the 0003 and
0004 patches. No changes to the 0001 and 0002 patches. Hopefully cfbot
will be happy with v23.
Please consider v23 for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v23-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch (application/octet-stream)
From 43b98d3f3360f6a4807938d013055caa8f5f43c6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 11 Jan 2021 08:17:53 +0530
Subject: [PATCH v23 1/4] Parallel Inserts in CREATE TABLE AS
Allow the leader and each worker to insert the tuples in parallel
if the SELECT part of the CTAS is parallelizable.
The design:
The main idea is to push the CTAS dest receiver down to the Gather
node, from where the required information is shared with the workers
so that they can perform parallel insertions. The leader also
participates in the insertions. After planning, createas.c checks
whether the upper plan node is Gather, marks a parallelism flag in
the CTAS dest receiver and pushes it down to the Gather node. Each
worker creates its own CTAS dest receiver with the information passed
from the leader. The leader inserts its share of tuples if instructed
to do so, and so do the workers. Each worker atomically writes its
number of inserted tuples into a shared memory variable; the leader
combines this with its own count and reports the total to the client.
---
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 28 ++-
src/backend/commands/createas.c | 84 ++++++-
src/backend/commands/explain.c | 44 ++++
src/backend/executor/execParallel.c | 322 ++++++++++++++++++++++++-
src/backend/executor/nodeGather.c | 129 +++++++++-
src/backend/executor/nodeGatherMerge.c | 4 +-
src/include/access/xact.h | 1 +
src/include/commands/createas.h | 16 ++
src/include/executor/execParallel.h | 42 +++-
src/include/nodes/execnodes.h | 3 +
11 files changed, 636 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..3741d824bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..750d15a572 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,17 +764,35 @@ GetCurrentCommandId(bool used)
if (used)
{
/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
*/
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
currentCommandIdUsed = true;
}
return currentCommandId;
}
+/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used. This
+ * must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed && currentCommandId != InvalidCommandId);
+
+ currentCommandIdUsed = true;
+}
+
/*
* SetParallelStartTimestamps
*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..a8050a2767 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -38,6 +38,7 @@
#include "commands/prepare.h"
#include "commands/tablecmds.h"
#include "commands/view.h"
+#include "executor/execParallel.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -51,18 +52,6 @@
#include "utils/rls.h"
#include "utils/snapmgr.h"
-typedef struct
-{
- DestReceiver pub; /* publicly-known function pointers */
- IntoClause *into; /* target relation specification */
- /* These fields are filled by intorel_startup: */
- Relation rel; /* relation to write to */
- ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
- CommandId output_cid; /* cmin to insert in output tuples */
- int ti_options; /* table_tuple_insert performance options */
- BulkInsertState bistate; /* bulk insert state */
-} DR_intorel;
-
/* utility functions for CTAS definition creation */
static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
@@ -294,6 +283,11 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
}
else
{
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
/*
* Parse analysis was done already, but we still have to run the rule
* rewriter. We do not do AcquireRewriteLocks: we assume the query
@@ -338,6 +332,19 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
@@ -441,6 +448,9 @@ CreateIntoRelDestReceiver(IntoClause *intoClause)
self->pub.rDestroy = intorel_destroy;
self->pub.mydest = DestIntoRel;
self->into = intoClause;
+ self->is_parallel = false;
+ self->is_parallel_worker = false;
+ self->object_id = InvalidOid;
/* other private fields will be set during intorel_startup */
return (DestReceiver *) self;
@@ -461,6 +471,35 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ /*
+ * All the necessary work such as table creation, sanity checks etc. would
+ * have been done by the leader. So, parallel workers just need to open the
+ * table, allocate bulk insert state, mark the command id as used, store it
+ * in the dest receiver and return.
+ */
+ if (myState->is_parallel_worker)
+ {
+ /* In the worker */
+ intoRelationDesc = table_open(myState->object_id, AccessExclusiveLock);
+ myState->rel = intoRelationDesc;
+ myState->reladdr = InvalidObjectAddress;
+ myState->ti_options = 0;
+ myState->bistate = GetBulkInsertState();
+
+ /*
+ * Right after the table is created in the leader, the command id is
+ * incremented (in create_ctas_internal()). The new command id is
+ * marked as used in intorel_startup(), then the parallel mode is
+ * entered. The command id and transaction id are serialized into
+ * parallel DSM, they are then available to all parallel workers. All
+ * the workers need to mark the command id as used before insertion.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ myState->output_cid = GetCurrentCommandId(false);
+
+ return;
+ }
+
Assert(into != NULL); /* else somebody forgot to set it */
/* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */
@@ -563,6 +602,27 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
else
myState->bistate = NULL;
+ /* If parallel inserts are to be allowed, set few extra information. */
+ if (myState->is_parallel)
+ {
+ myState->object_id = intoRelationAddr.objectId;
+
+ /*
+ * We don't need to skip contacting the FSM while inserting tuples in
+ * parallel mode. While extending the relation, a worker, instead of
+ * blocking on a page that another worker is inserting into, can check
+ * the FSM for another page that can accommodate the tuples. This
+ * results in a major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
+
+ /*
+ * rd_createSubid is marked invalid, otherwise, the table is not
+ * allowed to be extended by the workers.
+ */
+ myState->rel->rd_createSubid = InvalidSubTransactionId;
+ }
+
/*
* Valid smgr_targblock implies something already wrote to the relation.
* This may be harmless, but this function hasn't planned for it.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..e985ea6db3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execParallel.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -572,6 +573,27 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, eflags);
+ if (into)
+ {
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
+
+ /* See if we can perform parallel insertions. */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ {
+ /*
+ * If the SELECT part of the CTAS is parallelizable, then set the
+ * parallel insert state. We need plan state to be initialized by
+ * the executor to decide whether to allow parallel inserts or not.
+ */
+ SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ queryDesc);
+ }
+ }
+
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -1791,6 +1813,28 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
ExplainPropertyBool("Single Copy", gather->single_copy, es);
+
+ /*
+ * Show the create table information under Gather node in case
+ * parallel workers have inserted the rows.
+ */
+ if (IsA(planstate, GatherState))
+ {
+ GatherState *gstate = (GatherState *) planstate;
+
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ es->indent--;
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "-> ");
+ appendStringInfoString(es->str, "Create ");
+ appendStringInfo(es->str, "%s\n",
+ ((DR_intorel *) gstate->dest)->into->rel->relname);
+ ExplainIndentText(es);
+ es->indent++;
+ }
+ }
}
break;
case T_GatherMerge:
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index c95d5170e4..ba4508c409 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "commands/createas.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -65,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_INTO_CLAUSE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -77,6 +79,10 @@ typedef struct FixedParallelExecutorState
dsa_pointer param_exec;
int eflags;
int jit_flags;
+ ParallelInsertCmdKind ins_cmd_type; /* parallel insertion command type */
+ Oid objectid; /* used by workers to open relation */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;
} FixedParallelExecutorState;
/*
@@ -135,10 +141,23 @@ static bool ExecParallelReInitializeDSM(PlanState *planstate,
ParallelContext *pcxt);
static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
SharedExecutorInstrumentation *instrumentation);
-
-/* Helper function that runs in the parallel worker. */
+static void ParallelInsCmdEstimate(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+
+/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static DestReceiver *ExecParallelGetInsReceiver(shm_toc *toc,
+ FixedParallelExecutorState *fpes);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -578,7 +597,9 @@ ExecParallelSetupTupleQueues(ParallelContext *pcxt, bool reinitialize)
ParallelExecutorInfo *
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info)
{
ParallelExecutorInfo *pei;
ParallelContext *pcxt;
@@ -712,6 +733,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, dsa_minsize);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for parallel insertions. */
+ if (parallel_ins_info)
+ ParallelInsCmdEstimate(pcxt, parallel_ins_cmd, parallel_ins_info);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -729,6 +754,20 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
fpes->param_exec = InvalidDsaPointer;
fpes->eflags = estate->es_top_eflags;
fpes->jit_flags = estate->es_jit_flags;
+
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion fixed info into DSA. */
+ SaveParallelInsCmdFixedInfo(pei, fpes, parallel_ins_cmd,
+ parallel_ins_info);
+ }
+ else
+ {
+ pei->processed = NULL;
+ fpes->ins_cmd_type = PARALLEL_INSERT_CMD_UNDEF;
+ fpes->objectid = InvalidOid;
+ }
+
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECUTOR_FIXED, fpes);
/* Store query string */
@@ -758,8 +797,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
- /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (parallel_ins_info)
+ {
+ /* Save parallel insertion info into DSA. */
+ SaveParallelInsCmdInfo(pcxt, parallel_ins_cmd, parallel_ins_info);
+
+ /*
+ * Tuple queues are not required in case of parallel insertions by the
+ * workers, because Gather node will not receive any tuples.
+ */
+ pei->tqueue = NULL;
+ }
+ else
+ {
+ /* Set up the tuple queues that the workers will write into. */
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ }
/* We don't need the TupleQueueReaders yet, though. */
pei->reader = NULL;
@@ -1391,8 +1444,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Get fixed-size state. */
fpes = shm_toc_lookup(toc, PARALLEL_KEY_EXECUTOR_FIXED, false);
- /* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
- receiver = ExecParallelGetReceiver(seg, toc);
+ /* Set up DestReceiver. */
+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+ receiver = ExecParallelGetInsReceiver(toc, fpes);
+ else
+ receiver = ExecParallelGetReceiver(seg, toc);
+
+ /* Set up SharedExecutorInstrumentation, and QueryDesc. */
instrumentation = shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENTATION, true);
if (instrumentation != NULL)
instrument_options = instrumentation->instrument_options;
@@ -1471,6 +1529,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
queryDesc->estate->es_jit->instr;
}
+ /*
+ * Write out the number of tuples this worker has inserted. Leader will use
+ * it to inform the end client.
+ */
+ if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
+ pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
+
/* Must do this after capturing instrumentation. */
ExecutorEnd(queryDesc);
@@ -1479,3 +1544,246 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
FreeQueryDesc(queryDesc);
receiver->rDestroy(receiver);
}
+
+/*
+ * Estimate space required for sending parallel insert information to workers
+ * in commands such as CTAS.
+ */
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len = 0;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+
+ shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * Save fixed state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
+ FixedParallelExecutorState *fpes,
+ ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pei && fpes && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ pg_atomic_init_u64(&fpes->processed, 0);
+ fpes->ins_cmd_type = ins_cmd;
+ pei->processed = &fpes->processed;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+
+ info = (ParallelInsertCTASInfo *) ins_info;
+ fpes->objectid = info->objectid;
+ }
+}
+
+/*
+ * Save variable state information required by workers for parallel inserts in
+ * commands such as CTAS.
+ */
+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
+{
+ Assert(pcxt && ins_info &&
+ (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *info = NULL;
+ char *intoclause_str = NULL;
+ int intoclause_len;
+ char *intoclause_space = NULL;
+
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+
+ memcpy(intoclause_space, intoclause_str, intoclause_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
+ }
+}
+
+/*
+ * Create a DestReceiver to write produced tuples to target relation in case of
+ * parallel insertions.
+ */
+static DestReceiver *
+ExecParallelGetInsReceiver(shm_toc *toc, FixedParallelExecutorState *fpes)
+{
+ ParallelInsertCmdKind ins_cmd;
+ DestReceiver *receiver;
+
+ Assert(fpes && toc &&
+ (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ ins_cmd = fpes->ins_cmd_type;
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ char *intoclause_str = NULL;
+ IntoClause *intoclause = NULL;
+
+ intoclause_str = shm_toc_lookup(toc, PARALLEL_KEY_INTO_CLAUSE, true);
+
+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclause_str);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
+ }
+
+ return receiver;
+}
+
+/*
+ * Given a DestReceiver, return the command type if parallelism is allowed.
+ */
+ParallelInsertCmdKind
+GetParallelInsertCmdType(DestReceiver *dest)
+{
+ if (!dest)
+ return PARALLEL_INSERT_CMD_UNDEF;
+
+ if (dest->mydest == DestIntoRel &&
+ ((DR_intorel *) dest)->is_parallel)
+ return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;
+
+ return PARALLEL_INSERT_CMD_UNDEF;
+}
+
+/*
+ * Given a DestReceiver, allocate and fill parallel insert info structure
+ * corresponding to command type.
+ *
+ * Note that the memory allocated here for the info structure has to be freed
+ * up in caller.
+ */
+void *
+GetParallelInsertCmdInfo(DestReceiver *dest, ParallelInsertCmdKind ins_cmd)
+{
+ void *parallel_ins_info = NULL;
+
+ Assert(dest && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *)
+ palloc0(sizeof(ParallelInsertCTASInfo));
+ ctas_info->intoclause = ((DR_intorel *) dest)->into;
+ ctas_info->objectid = ((DR_intorel *) dest)->object_id;
+ parallel_ins_info = ctas_info;
+ }
+
+ return parallel_ins_info;
+}
+
+/*
+ * Check if parallel insertion is allowed in commands such as CTAS.
+ *
+ * Return true if allowed, otherwise false.
+ */
+bool
+IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
+{
+ Assert(ins_info && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ /*
+ * For CTAS, do not allow parallel inserts if target table is temporary. As
+ * the temporary tables are backend local, workers can not know about them.
+ *
+ * Return false either if the into clause is NULL or if the table is
+ * temporary, otherwise true.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {
+ ParallelInsertCTASInfo *ctas_info = NULL;
+ IntoClause *into = NULL;
+
+ ctas_info = (ParallelInsertCTASInfo *) ins_info;
+ into = ctas_info->intoclause;
+
+ /* Below check may hit in case this function is called from explain.c. */
+ if (!(into && IsA(into, IntoClause)))
+ return false;
+
+ /*
+ * Currently, CTAS supports creation of normal(logged), temporary and
+ * unlogged tables. It does not support foreign or partition table
+ * creation. Hence the check for temporary table is enough here.
+ */
+ if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Set the parallel insert state, if the upper node is Gather and it doesn't
+ * have any projections. The parallel insert state includes information such as
+ * a flag in the dest receiver and also a dest receiver reference in the Gather
+ * node so that the required information will be picked and sent to workers.
+ */
+void
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+{
+ GatherState *gstate;
+ DestReceiver *dest;
+
+ Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+ gstate = (GatherState *) queryDesc->planstate;
+ dest = queryDesc->dest;
+
+ /*
+ * Parallel insertions are not possible if either the upper node is not
+ * Gather or it is a Gather that has some projections to perform.
+ */
+ if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ return;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is into
+ * clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver, store
+ * a reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+}
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 9e1dc464cb..8d9b7daea5 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -48,6 +48,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate);
static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+static void ExecParallelInsert(GatherState *node);
/* ----------------------------------------------------------------
@@ -131,6 +132,71 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
return gatherstate;
}
+/* ----------------------------------------------------------------
+ * ExecParallelInsert(node)
+ *
+ * Facilitates parallel inserts by parallel workers and/or
+ * leader for commands such as CREATE TABLE AS.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecParallelInsert(GatherState *node)
+{
+ EState *estate = node->ps.state;
+
+ /*
+ * By now, parallel workers if launched any, would have started their work
+ * i.e. insertion to target relation. In case the leader is also chosen to
+ * participate, then finish its share before going to wait for the parallel
+ * workers to finish.
+ *
+ * In case if no workers were launched, allow the leader to insert all
+ * tuples.
+ */
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = node->pei ? node->pei->area : NULL;
+
+ outerTupleSlot = ExecProcNode(node->ps.lefttree);
+
+ estate->es_query_dsa = NULL;
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+
+ (void) node->dest->receiveSlot(outerTupleSlot, node->dest);
+
+ estate->es_processed++;
+ }
+
+ node->need_to_scan_locally = false;
+ }
+
+ if (node->nworkers_launched > 0)
+ {
+ /*
+ * We wait here for the parallel workers to finish their work and
+ * accumulate the tuples they inserted and also their buffer/WAL usage.
+ * We do not destroy the parallel context here, it will be done in
+ * ExecShutdownGather at the end of the plan. Note that the
+ * ExecShutdownGatherWorkers call from ExecShutdownGather will be a
+ * no-op.
+ */
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ estate->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecGather(node)
*
@@ -157,6 +223,17 @@ ExecGather(PlanState *pstate)
{
EState *estate = node->ps.state;
Gather *gather = (Gather *) node->ps.plan;
+ ParallelInsertCmdKind parallel_ins_cmd;
+ bool perform_parallel_ins = false;
+
+ /*
+ * Get the parallel insert command type from the dest receiver which
+ * would have been set in SetParallelInsertState().
+ */
+ parallel_ins_cmd = GetParallelInsertCmdType(node->dest);
+
+ if (parallel_ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ perform_parallel_ins = true;
/*
* Sometimes we might have to run without parallelism; but if parallel
@@ -165,6 +242,15 @@ ExecGather(PlanState *pstate)
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
ParallelContext *pcxt;
+ void *parallel_ins_info = NULL;
+
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in commands such as CTAS.
+ */
+ if (perform_parallel_ins)
+ parallel_ins_info = GetParallelInsertCmdInfo(node->dest,
+ parallel_ins_cmd);
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
@@ -172,7 +258,9 @@ ExecGather(PlanState *pstate)
estate,
gather->initParam,
gather->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ parallel_ins_cmd,
+ parallel_ins_info);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
@@ -190,13 +278,22 @@ ExecGather(PlanState *pstate)
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
{
- ExecParallelCreateReaders(node->pei);
- /* Make a working array showing the active readers */
- node->nreaders = pcxt->nworkers_launched;
- node->reader = (TupleQueueReader **)
- palloc(node->nreaders * sizeof(TupleQueueReader *));
- memcpy(node->reader, node->pei->reader,
- node->nreaders * sizeof(TupleQueueReader *));
+ /*
+ * Do not create tuple queue readers for commands with parallel
+ * insertion. Because the gather node will not receive any
+ * tuples, the workers will insert the tuples into the target
+ * relation.
+ */
+ if (!perform_parallel_ins)
+ {
+ ExecParallelCreateReaders(node->pei);
+ /* Make a working array showing the active readers */
+ node->nreaders = pcxt->nworkers_launched;
+ node->reader = (TupleQueueReader **)
+ palloc(node->nreaders * sizeof(TupleQueueReader *));
+ memcpy(node->reader, node->pei->reader,
+ node->nreaders * sizeof(TupleQueueReader *));
+ }
}
else
{
@@ -205,12 +302,24 @@ ExecGather(PlanState *pstate)
node->reader = NULL;
}
node->nextreader = 0;
+
+ /* Free up the parallel insert info, if allocated. */
+ if (parallel_ins_info)
+ pfree(parallel_ins_info);
}
/* Run plan locally if no workers or enabled and not single-copy. */
- node->need_to_scan_locally = (node->nreaders == 0)
- || (!gather->single_copy && parallel_leader_participation);
+ node->need_to_scan_locally = (node->nreaders == 0 &&
+ !perform_parallel_ins) || (!gather->single_copy &&
+ parallel_leader_participation);
node->initialized = true;
+
+ /* Perform parallel inserts for commands such as CTAS. */
+ if (perform_parallel_ins)
+ {
+ ExecParallelInsert(node);
+ return NULL;
+ }
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index aa5743cebf..ea72473c8e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -216,7 +216,9 @@ ExecGatherMerge(PlanState *pstate)
estate,
gm->initParam,
gm->num_workers,
- node->tuples_needed);
+ node->tuples_needed,
+ 0,
+ NULL);
else
ExecParallelReinitialize(node->ps.lefttree,
node->pei,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index f49a57b35e..4cd6f972ed 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -389,6 +389,7 @@ extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
extern TimestampTz GetCurrentTransactionStartTimestamp(void);
extern TimestampTz GetCurrentStatementStartTimestamp(void);
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index ad5054d116..74022aab41 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -14,12 +14,28 @@
#ifndef CREATEAS_H
#define CREATEAS_H
+#include "access/heapam.h"
#include "catalog/objectaddress.h"
#include "nodes/params.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+typedef struct
+{
+ DestReceiver pub; /* publicly-known function pointers */
+ IntoClause *into; /* target relation specification */
+ /* These fields are filled by intorel_startup: */
+ Relation rel; /* relation to write to */
+ ObjectAddress reladdr; /* address of rel, for ExecCreateTableAs */
+ CommandId output_cid; /* cmin to insert in output tuples */
+ int ti_options; /* table_tuple_insert performance options */
+ BulkInsertState bistate; /* bulk insert state */
+ bool is_parallel; /* is parallelism to be considered? */
+ bool is_parallel_worker; /* true for parallel worker */
+ /* Used by parallel workers for opening the table created in the leader. */
+ Oid object_id;
+} DR_intorel;
extern ObjectAddress ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 3888175a2f..689f577c08 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -14,6 +14,7 @@
#define EXECPARALLEL_H
#include "access/parallel.h"
+#include "executor/execdesc.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
@@ -35,11 +36,42 @@ typedef struct ParallelExecutorInfo
/* These two arrays have pcxt->nworkers_launched entries: */
shm_mq_handle **tqueue; /* tuple queues for worker output */
struct TupleQueueReader **reader; /* tuple reader/writer support */
+ /* Number of tuples inserted by all workers. */
+ volatile pg_atomic_uint64 *processed;
} ParallelExecutorInfo;
+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;
+
+/*
+ * For each of the command added to ParallelInsertCmdKind, add a corresponding
+ * structure encompassing the information that's required to be shared across
+ * different functions. The way it works is as follows: in the caller, fill in
+ * the information into one of below structures based on the command kind, pass
+ * the command kind and a pointer to the filled in structure as a void pointer
+ * to required functions, say ExecInitParallelPlan. The called functions will
+ * use command kind to dereference the void pointer to corresponding structure.
+ *
+ * This way, the functions that are needed for parallel insertions can be
+ * generic, clean and extensible.
+ */
+typedef struct ParallelInsertCTASInfo
+{
+ IntoClause *intoclause;
+ Oid objectid;
+} ParallelInsertCTASInfo;
+
extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
EState *estate, Bitmapset *sendParam, int nworkers,
- int64 tuples_needed);
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info);
extern void ExecParallelCreateReaders(ParallelExecutorInfo *pei);
extern void ExecParallelFinish(ParallelExecutorInfo *pei);
extern void ExecParallelCleanup(ParallelExecutorInfo *pei);
@@ -47,5 +79,11 @@ extern void ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei, Bitmapset *sendParam);
extern void ParallelQueryMain(dsm_segment *seg, shm_toc *toc);
-
+extern ParallelInsertCmdKind GetParallelInsertCmdType(DestReceiver *dest);
+extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
+ ParallelInsertCmdKind ins_cmd);
+extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
+ void *ins_info);
+extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
+ QueryDesc *queryDesc);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..297b3ff728 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -23,6 +23,7 @@
#include "nodes/tidbitmap.h"
#include "partitioning/partdefs.h"
#include "storage/condition_variable.h"
+#include "tcop/dest.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
#include "utils/reltrigger.h"
@@ -2326,6 +2327,8 @@ typedef struct GatherState
int nreaders; /* number of still-active workers */
int nextreader; /* next one to try to read from */
struct TupleQueueReader **reader; /* array with nreaders active entries */
+ /* Dest receiver is stored when parallel inserts is allowed in CTAS. */
+ DestReceiver *dest;
} GatherState;
/* ----------------
--
2.25.1
v23-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From b0c0b9f2669fa48764b4f9718d50ca2ead7368bf Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 11 Jan 2021 08:19:36 +0530
Subject: [PATCH v23 2/4] Tuple Cost Adjustment for Parallel Inserts in CTAS
Let the planner know in createas.c that the SELECT is part of a CTAS,
so that it can ignore the parallel tuple cost when the workers can
insert the tuples in parallel. This is okay because the Gather node
will not actually receive any tuples. With the parallel tuple cost
ignored, the planner may choose a parallel plan that would otherwise
have looked costlier than the non-parallel plans and been discarded.
---
src/backend/commands/createas.c | 13 +++++-
src/backend/commands/explain.c | 22 +++++++--
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execParallel.c | 66 ++++++++++++++++++++-------
src/backend/optimizer/path/costsize.c | 18 +++++++-
src/backend/optimizer/plan/planner.c | 40 ++++++++++++++++
src/include/commands/explain.h | 3 +-
src/include/executor/execParallel.h | 22 ++++++++-
src/include/nodes/parsenodes.h | 2 +
src/include/optimizer/planner.h | 10 ++++
10 files changed, 175 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index a8050a2767..53ca3010c6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -310,6 +310,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
+ /*
+ * Turn on a flag to indicate planner so that it can ignore parallel
+ * tuple cost while generating Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
@@ -342,7 +352,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc,
+ &query->parallelInsCmdTupleCostOpt);
}
/* run the plan to completion */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e985ea6db3..d7da07d4f6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -383,11 +383,25 @@ ExplainOneQuery(Query *query, int cursorOptions,
planduration;
BufferUsage bufusage_start,
bufusage;
+ ParallelInsertCTASInfo parallel_ins_info;
+
+ parallel_ins_info.intoclause = into;
+ parallel_ins_info.objectid = InvalidOid;
if (es->buffers)
bufusage_start = pgBufferUsage;
INSTR_TIME_SET_CURRENT(planstart);
+ /*
+ * Turn on a flag to indicate planner so that it can ignore parallel
+ * tuple cost while generating Gather path.
+ */
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
+ query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;
+ else
+ query->parallelInsCmdTupleCostOpt = 0;
+
/* plan the query */
plan = pg_plan_query(query, queryString, cursorOptions, params);
@@ -403,7 +417,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ &query->parallelInsCmdTupleCostOpt);
}
}
@@ -513,7 +528,8 @@ void
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage)
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -590,7 +606,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* the executor to decide whether to allow parallel inserts or not.
*/
SetParallelInsertState(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
- queryDesc);
+ queryDesc, parallel_ins_tuple_cost_opts);
}
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 653ef8e41a..696d3343d4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -672,7 +672,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ &planduration, (es->buffers ? &bufusage : NULL),
+ NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ba4508c409..a26c9cdac8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1755,7 +1755,8 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
* node so that the required information will be picked and sent to workers.
*/
void
-SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
+SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts)
{
GatherState *gstate;
DestReceiver *dest;
@@ -1766,24 +1767,57 @@ SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
dest = queryDesc->dest;
/*
- * Parallel insertions are not possible if either the upper node is not
- * Gather or it is a Gather that has some projections to perform.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
- if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
+ if (!IsA(gstate, GatherState))
return;
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
/*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is into
- * clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver, store
- * a reference to it in the Gather state so that it will be used in
- * ExecInitParallelPlan to pick the information.
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+
+ /*
+ * Before returning, ensure that we have not done wrong parallel tuple
+ * cost enforcement in the planner. Main reason for this assertion is
+ * to check if we enforced the planner to ignore the parallel tuple
+ * cost (with the intention of choosing parallel inserts) due to which
+ * the parallel plan may have been chosen, but we do not allow the
+ * parallel inserts now.
+ *
+ * If we have correctly ignored parallel tuple cost in the planner
+ * while creating Gather path, then this assertion failure should not
+ * occur. In case it occurs, that means the planner may have chosen
+ * this parallel plan because of our wrong enforcement. So let's try to
+ * catch that here.
+ */
+ Assert(tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
+ }
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..d4a0fab37b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
#include "access/tsmapi.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeHash.h"
@@ -393,7 +394,22 @@ cost_gather(GatherPath *path, PlannerInfo *root,
/* Parallel setup and communication cost. */
startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * Do not consider the tuple cost in case we intend to perform parallel
+ * inserts by workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))
+ {
+ /* We are ignoring the parallel tuple cost, so mark it. */
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_TUP_COST_IGNORED;
+ }
+ else
+ run_cost += parallel_tuple_cost * path->path.rows;
path->path.startup_cost = startup_cost;
path->path.total_cost = (startup_cost + run_cost);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4e6497ff32..d1b7347de2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -28,6 +28,7 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -7338,6 +7339,36 @@ can_partial_agg(PlannerInfo *root)
return true;
}
+/*
+ * ignore_parallel_tuple_cost
+ *
+ * Gather node will not receive any tuples from the workers in case each worker
+ * inserts them in parallel. So, we turn on a flag to ignore parallel tuple
+ * cost by the Gather path in cost_gather if the SELECT is for commands in
+ * which parallel insertion is possible and we are generating an upper level
+ * Gather path.
+ */
+static void
+ignore_parallel_tuple_cost(PlannerInfo *root)
+{
+ if (root->query_level == 1 &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY))
+ {
+ /*
+ * In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
+ * path will be generated for the upper Gather path(in
+ * grouping_planner), in which case we can not let parallel inserts
+ * happen. So we do not turn on ignore tuple cost flag.
+ */
+ if (HAS_PARENT_PATH_GENERATING_CLAUSE(root))
+ return;
+
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST;
+ }
+}
+
/*
* apply_scanjoin_target_to_paths
*
@@ -7557,7 +7588,16 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
+ {
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for commands in which parallel
+ * insertion is possible and we are generating an upper level Gather
+ * path.
+ */
+ ignore_parallel_tuple_cost(root);
generate_useful_gather_paths(root, rel, false);
+ }
/*
* Reassess which paths are the cheapest, now that we've potentially added
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index e94d9e49cf..1a75c3ced3 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -91,7 +91,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
- const BufferUsage *bufusage);
+ const BufferUsage *bufusage,
+ uint8 *parallel_ins_tuple_cost_opts);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 689f577c08..f76b5c2ffd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -49,6 +49,25 @@ typedef enum ParallelInsertCmdKind
PARALLEL_INSERT_CMD_CREATE_TABLE_AS
} ParallelInsertCmdKind;
+/*
+ * Information sent to planner to account for tuple cost calculations in
+ * cost_gather for parallel insertions in commands such as CTAS.
+ *
+ * We need to let the planner know that there will be no tuples received by
+ * Gather node if workers insert the tuples in parallel.
+ */
+typedef enum ParallelInsertCmdTupleCostOpt
+{
+ PARALLEL_INSERT_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+} ParallelInsertCmdTupleCostOpt;
+
/*
* For each of the command added to ParallelInsertCmdKind, add a corresponding
* structure encompassing the information that's required to be shared across
@@ -85,5 +104,6 @@ extern void *GetParallelInsertCmdInfo(DestReceiver *dest,
extern bool IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd,
void *ins_info);
extern void SetParallelInsertState(ParallelInsertCmdKind ins_cmd,
- QueryDesc *queryDesc);
+ QueryDesc *queryDesc,
+ uint8 *tuple_cost_opts);
#endif /* EXECPARALLEL_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dc2bb40926..70a78b169b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -180,6 +180,8 @@ typedef struct Query
*/
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
+ /* Parallel insertion tuple cost options. */
+ uint8 parallelInsCmdTupleCostOpt;
} Query;
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 9a15de5025..b71d21d334 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -21,6 +21,16 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
+ (root->parse->rowMarks || \
+ limit_needed(root->parse) || \
+ root->parse->sortClause || \
+ root->parse->distinctClause || \
+ root->parse->hasWindowFuncs || \
+ root->parse->groupClause || \
+ root->parse->groupingSets || \
+ root->parse->hasAggs || \
+ root->hasHavingQual)
/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
--
2.25.1
v23-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch (application/octet-stream)
From 23a7b8e8ad4871e503adaec038ba025bde76a0a9 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 16 Jan 2021 12:58:25 +0530
Subject: [PATCH v23 3/4] Tests And Docs For Parallel Inserts in CTAS
---
doc/src/sgml/ref/create_table_as.sgml | 31 +-
src/test/regress/expected/write_parallel.out | 535 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 206 +++++++
3 files changed, 767 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 07558ab56c..35903701ed 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -37,11 +37,13 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
<para>
<command>CREATE TABLE AS</command> creates a table and fills it
- with data computed by a <command>SELECT</command> command.
- The table columns have the
- names and data types associated with the output columns of the
- <command>SELECT</command> (except that you can override the column
- names by giving an explicit list of new column names).
+ with data computed by a <command>SELECT</command> command. When the
+ node at the top of the <command>SELECT</command> plan is
+ <literal>Gather</literal> and there are no projections to be performed
+ by it, then the created table can be filled by the workers in parallel.
+ The table columns have the names and data types associated with the output
+ columns of the <command>SELECT</command> (except that you can override the
+ column names by giving an explicit list of new column names).
</para>
<para>
@@ -297,6 +299,25 @@ PREPARE recentfilms(date) AS
CREATE TEMP TABLE films_recent ON COMMIT DROP AS
EXECUTE recentfilms('2002-01-01');
</programlisting></para>
+
+ <para>
+ Here is an example of a query plan when the created table can be
+ filled by the workers in parallel:
+
+<programlisting>
+EXPLAIN CREATE TABLE bar AS SELECT * FROM foo WHERE i > 5;
+
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;--
+Gather (cost=0.00..23.28 rows=850 width=4)
+ Workers Planned: 2
+ -> Create bar
+ -> Parallel Seq Scan on foo (cost=0.00..23.28 rows=354 width=4)
+ Filter: (i > 5)
+(5 rows)
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..782b78b39e 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -76,4 +76,539 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers, the only way is that
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ ln := regexp_replace(ln, 'Buckets: \d+', 'Buckets: N');
+ ln := regexp_replace(ln, 'Batches: \d+', 'Batches: N');
+ return next ln;
+ end loop;
+end;
+$$;
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+-- check if the parallel insertions have happened within the same xact, if yes,
+-- there should be a single cmin and xmin i.e. below query should output 1
+select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(4 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+ explain_pictas
+-------------------------------------------------
+ LockRows (actual rows=N loops=N)
+ -> Seq Scan on tenk1 (actual rows=N loops=N)
+(2 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_mat_view
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_mat_view;
+ count
+-------
+ 10000
+(1 row)
+
+drop materialized view parallel_mat_view;
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+deallocate parallel_write_prep;
+drop table parallel_write;
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+ explain_pictas
+-------------------------------------------
+ Seq Scan on tenk1 (actual rows=N loops=N)
+(1 row)
+
+select count(*) from parallel_write;
+ count
+-------
+ 10000
+(1 row)
+
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+ explain_pictas
+------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+ Filter: (four = 3)
+ Rows Removed by Filter: N
+ SubPlan 1
+ -> Function Scan on generate_series (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 2500
+(1 row)
+
+drop table parallel_write;
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+ explain_pictas
+----------------------------------------------------------------
+ Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+ explain_pictas
+----------------------------------------------------------------
+ Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+ explain_pictas
+----------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: N Memory Usage: NkB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(7 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is an aggregate and a group by clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+ explain_pictas
+----------------------------------------------------------------------
+ Finalize HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: N Memory Usage: NkB
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Partial HashAggregate (actual rows=N loops=N)
+ Group Key: col1
+ Batches: N Memory Usage: NkB
+ Worker 0: Batches: N Memory Usage: NkB
+ Worker 1: Batches: N Memory Usage: NkB
+ Worker 2: Batches: N Memory Usage: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(13 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+ explain_pictas
+----------------------------------------------------------------------------------
+ Finalize GroupAggregate (actual rows=N loops=N)
+ Group Key: ($1)
+ Filter: (count(temp1.col1) > 0)
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp3 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Partial GroupAggregate (actual rows=N loops=N)
+ Group Key: $1
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: N Batches: N Memory Usage: NkB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 1
+(1 row)
+
+drop table parallel_write;
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+ explain_pictas
+----------------------------------------------------------------------
+ WindowAgg (actual rows=N loops=N)
+ -> Gather Merge (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Sort (actual rows=N loops=N)
+ Sort Key: col1
+ Sort Method: quicksort Memory: NkB
+ Worker 0: Sort Method: quicksort Memory: NkB
+ Worker 1: Sort Method: quicksort Memory: NkB
+ Worker 2: Sort Method: quicksort Memory: NkB
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+(11 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Nested Loop (actual rows=N loops=N)
+ Join Filter: (temp1.col1 = temp2.col2)
+ Rows Removed by Join Filter: 20
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Materialize (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_hashjoin to on;
+set enable_nestloop to off;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+ explain_pictas
+----------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Hash Join (actual rows=N loops=N)
+ Hash Cond: (temp1.col1 = temp2.col2)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Hash (actual rows=N loops=N)
+ Buckets: N Batches: N Memory Usage: NkB
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..8ed8a5049b 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -39,4 +39,210 @@ explain (costs off) create table parallel_write as execute prep_stmt;
create table parallel_write as execute prep_stmt;
drop table parallel_write;
+--
+-- Test parallel inserts in create table as/select into/create materialized
+-- view.
+--
+
+-- Parallel queries won't necessarily get as many workers as the planner
+-- asked for. This affects not only the "Workers Launched:" field of EXPLAIN
+-- results, but also row counts and loop counts for parallel scans, Gathers,
+-- and everything in between. This function filters out the values we can't
+-- rely on to be stable.
+-- This removes enough info that you might wonder why bother with EXPLAIN
+-- ANALYZE at all. The answer is that we need to see whether the parallel
+-- inserts are being done by the workers; the only way to tell is whether
+-- Create <<tbl_name>> appears in the explain output.
+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ ln := regexp_replace(ln, 'Buckets: \d+', 'Buckets: N');
+ ln := regexp_replace(ln, 'Batches: \d+', 'Batches: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+-- parallel inserts must occur as the CTAS creates a normal table
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+-- check if the parallel insertions have happened within the same xact; if so,
+-- there should be a single cmin and xmin, i.e. the query below should output 1
+select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the CTAS creates an unlogged table
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates a normal table
+select explain_pictas(
+'select length(stringu1) into parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'select length(stringu1) into temporary parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the select into creates an unlogged table
+select explain_pictas(
+'select length(stringu1) into unlogged parallel_write from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the for update clause
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1 for update;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must occur as the materialized view is being created here
+select explain_pictas(
+'create materialized view parallel_mat_view as
+ select length(stringu1) from tenk1;');
+select count(*) from parallel_mat_view;
+drop materialized view parallel_mat_view;
+
+-- parallel inserts must occur as the CTAS creates the table using prepared
+-- statement for which parallelism would have been picked
+prepare parallel_write_prep as select length(stringu1) from tenk1;
+select explain_pictas(
+'create table parallel_write as execute parallel_write_prep;');
+select count(*) from parallel_write;
+deallocate parallel_write_prep;
+drop table parallel_write;
+
+-- parallel inserts must not occur as parallelism will not be picked for the
+-- select part because of the parallel unsafe function
+create sequence parallel_write_sequence;
+select explain_pictas(
+E'create table parallel_write as
+ select nextval(\'parallel_write_sequence\'), four from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+drop sequence parallel_write_sequence;
+
+-- parallel inserts must occur, as there is an init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a sub plan that gets executed by
+-- the Gather node in the leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+create table temp1(col1) as select * from generate_series(1,5);
+create table temp2(col2) as select * from temp1;
+create table temp3(col3) as select * from temp1;
+
+-- parallel inserts must not occur, as there is a limit clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 limit 4;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an order by clause
+select explain_pictas(
+'create table parallel_write as select * from temp1 order by 1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a distinct clause
+select explain_pictas(
+'create table parallel_write as select distinct * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is an aggregate and a group by clause
+select explain_pictas(
+'create table parallel_write as select count(*) from temp1 group by col1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there are aggregate, group by and having
+-- clauses
+select explain_pictas(
+'create table parallel_write as
+ select count(col1), (select col3 from
+ (select * from temp3) as tt limit 1) col4 from temp1, temp2
+ where temp1.col1 = temp2.col2 group by col4 having count(col1) > 0;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel inserts must not occur, as there is a window function
+select explain_pictas(
+'create table parallel_write as
+ select avg(col1) OVER (PARTITION BY col1) from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- nested loop join is the top node under which Gather node exists, so parallel
+-- inserts must not occur
+set enable_nestloop to on;
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_hashjoin to on;
+set enable_nestloop to off;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1, temp2 where temp1.col1 = temp2.col2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+reset enable_nestloop;
+reset enable_mergejoin;
+reset enable_hashjoin;
+
+drop table temp1;
+drop table temp2;
+drop table temp3;
+drop function explain_pictas(text);
rollback;
--
2.25.1
v23-0004-Enable-CTAS-Parallel-Inserts-For-Append.patchapplication/octet-stream; name=v23-0004-Enable-CTAS-Parallel-Inserts-For-Append.patchDownload
From 4b50d5f69b56084066108fed7d4ac31ff9f48685 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 16 Jan 2021 13:08:26 +0530
Subject: [PATCH v23 4/4] Enable CTAS Parallel Inserts For Append
This patch allows pushing down the CTAS dest receiver even if there
exists a Gather node under the top Append node. It also adds the code
for influencing the planner to consider the parallel tuple cost as zero
and asserts against wrong enforcement if parallel insertion later turns
out not to be possible. Test cases are also included in this patch.
---
src/backend/executor/execParallel.c | 154 ++--
src/backend/optimizer/path/allpaths.c | 31 +
src/backend/optimizer/plan/planner.c | 12 +-
src/include/executor/execParallel.h | 4 +-
src/test/regress/expected/write_parallel.out | 722 +++++++++++++++++++
src/test/regress/sql/write_parallel.sql | 222 ++++++
6 files changed, 1089 insertions(+), 56 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a26c9cdac8..63fec33e80 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -151,6 +151,9 @@ static void SaveParallelInsCmdFixedInfo(ParallelExecutorInfo *pei,
static void SaveParallelInsCmdInfo(ParallelContext *pcxt,
ParallelInsertCmdKind ins_cmd,
void *ins_info);
+static bool PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd,
+ bool *gather_exists);
/* Helper functions that run in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
@@ -1748,6 +1751,84 @@ IsParallelInsertionAllowed(ParallelInsertCmdKind ins_cmd, void *ins_info)
return false;
}
+/*
+ * Push the dest receiver to the Gather node when it is either at the top of
+ * the plan or under the top Append node, and it does not have any projections
+ * to do. The required information from the pushed dest receiver is sent to
+ * the workers so that they can perform parallel insertions into the target
+ * table.
+ *
+ * If the top node is Append, then this function recursively checks the sub
+ * plans for Gather nodes; when one is found (and it does not have
+ * projections), it sets the dest receiver information there.
+ *
+ * In this function we only care about Append and Gather nodes. This function
+ * returns true if at least one Gather node can allow parallel insertions by
+ * the workers, otherwise it returns false. It also sets gather_exists to true
+ * if at least one Gather node exists.
+ */
+static bool
+PushDownParallelInsertState(DestReceiver *dest, PlanState *ps,
+ ParallelInsertCmdKind ins_cmd, bool *gather_exists)
+{
+ bool parallel = false;
+
+ if (ps == NULL)
+ return parallel;
+
+ if (IsA(ps, AppendState))
+ {
+ AppendState *aps = (AppendState *) ps;
+
+ for (int i = 0; i < aps->as_nplans; i++)
+ {
+ parallel |= PushDownParallelInsertState(dest, aps->appendplans[i],
+ ins_cmd, gather_exists);
+ }
+ }
+ else if (IsA(ps, GatherState))
+ {
+ GatherState *gstate = (GatherState *) ps;
+
+ /*
+ * Set to true if there exists at least one Gather node either at the
+ * top of the plan or as a direct sub node under Append node.
+ */
+ *gather_exists |= true;
+
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ parallel = true;
+
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts in CTAS we must send information such
+ * as the into clause (to build a separate dest receiver) and the
+ * object id (to open the created table) to each worker. Since this
+ * information is available in the CTAS dest receiver, store a
+ * reference to it in the Gather state so that it will be used in
+ * ExecInitParallelPlan to pick the required information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = false;
+
+ gstate->dest = NULL;
+ }
+ }
+
+ return parallel;
+}
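+
+/*
+ * For illustration, with a plan shaped like the following (write_parallel.sql
+ * exercises concrete cases of this form):
+ *
+ *   Append
+ *     -> Gather
+ *          -> Parallel Seq Scan
+ *     -> Seq Scan
+ *     -> Gather
+ *          -> Parallel Seq Scan
+ *
+ * the function above stores the CTAS dest receiver in each Gather node and
+ * returns true, while the plain Seq Scan keeps sending its tuples to Append
+ * and from there to the leader's dest receiver.
+ */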
+
/*
* Set the parallel insert state, if the upper node is Gather and it doesn't
* have any projections. The parallel insert state includes information such as
@@ -1758,66 +1839,35 @@ void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc,
uint8 *tuple_cost_opts)
{
- GatherState *gstate;
- DestReceiver *dest;
+ bool allow = false;
+ bool gather_exists = false;
Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
- gstate = (GatherState *) queryDesc->planstate;
- dest = queryDesc->dest;
+ allow = PushDownParallelInsertState(queryDesc->dest, queryDesc->planstate,
+ ins_cmd, &gather_exists);
/*
- * Parallel insertions are possible only if the upper node is Gather.
+ * If parallel insertion is allowed, or it is not allowed because no Gather
+ * node exists, then return from here without doing the assertion check
+ * below.
*/
- if (!IsA(gstate, GatherState))
+ if (allow || !gather_exists)
return;
/*
- * Parallelize inserts only when the upper Gather node has no projections.
+ * When parallel insertion is not allowed but a Gather node exists, before
+ * returning ensure that we have not done wrong parallel tuple cost
+ * enforcement in the planner. The main reason for this assertion is to
+ * check whether we forced the planner to ignore the parallel tuple cost
+ * (with the intention of choosing parallel inserts), due to which the
+ * parallel plan may have been chosen, even though we do not allow the
+ * parallel inserts now.
+ *
+ * If we have correctly ignored the parallel tuple cost in the planner
+ * while creating the Gather path, then this assertion failure should not
+ * occur. If it does occur, that means the planner may have chosen this
+ * parallel plan because of our wrong enforcement. So let's try to catch
+ * that here.
*/
- if (!gstate->ps.ps_ProjInfo)
- {
- /* Okay to parallelize inserts, so mark it. */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = true;
-
- /*
- * For parallelizing inserts, we must send some information so that the
- * workers can build their own dest receivers. For CTAS, this info is
- * into clause, object id (to open the created table).
- *
- * Since the required information is available in the dest receiver,
- * store a reference to it in the Gather state so that it will be used
- * in ExecInitParallelPlan to pick the information.
- */
- gstate->dest = dest;
- }
- else
- {
- /*
- * Upper Gather node has projections, so parallel insertions are not
- * allowed.
- */
- if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
- ((DR_intorel *) dest)->is_parallel = false;
-
- gstate->dest = NULL;
-
- /*
- * Before returning, ensure that we have not done wrong parallel tuple
- * cost enforcement in the planner. Main reason for this assertion is
- * to check if we enforced the planner to ignore the parallel tuple
- * cost (with the intention of choosing parallel inserts) due to which
- * the parallel plan may have been chosen, but we do not allow the
- * parallel inserts now.
- *
- * If we have correctly ignored parallel tuple cost in the planner
- * while creating Gather path, then this assertion failure should not
- * occur. In case it occurs, that means the planner may have chosen
- * this parallel plan because of our wrong enforcement. So let's try to
- * catch that here.
- */
- Assert(tuple_cost_opts && !(*tuple_cost_opts &
- PARALLEL_INSERT_TUP_COST_IGNORED));
- }
+ Assert(!allow && gather_exists && tuple_cost_opts && !(*tuple_cost_opts &
+ PARALLEL_INSERT_TUP_COST_IGNORED));
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 026a4b0848..96b5ce81c9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,6 +23,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "executor/execParallel.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -1103,6 +1104,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (root->glob->parallelModeOK && rel->consider_parallel)
set_rel_consider_parallel(root, childrel, childRTE);
+ /*
+ * When the subplan is a subquery, it's possible to do a parallel insert if
+ * the top node of the subquery is Gather, so we turn on a flag to ignore
+ * the parallel tuple cost in cost_gather if the SELECT is for CTAS.
+ */
+ if (childrel->rtekind == RTE_SUBQUERY)
+ {
+ /*
+ * When there is no parent path generating clause (such as limit,
+ * sort, distinct ...), we can turn on the flag in two cases:
+ * i) query_level is 1;
+ * ii) query_level > 1 and the flag is already set in the parent_root.
+ * Case ii) handles append under append:
+ * Append
+ * ->Append
+ * ->Gather
+ * ->Other plan
+ */
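+ /*
+ * The flag set below is checked via parent_root in
+ * ignore_parallel_tuple_cost(); see planner.c.
+ */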
+ if (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY &&
+ (root->query_level == 1 ||
+ root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND) &&
+ !(HAS_PARENT_PATH_GENERATING_CLAUSE(root)))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND;
+ }
+ }
+
/*
* Compute the child's size.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d1b7347de2..423619735b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7351,9 +7351,15 @@ can_partial_agg(PlannerInfo *root)
static void
ignore_parallel_tuple_cost(PlannerInfo *root)
{
- if (root->query_level == 1 &&
- (root->parse->parallelInsCmdTupleCostOpt &
- PARALLEL_INSERT_SELECT_QUERY))
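+ /*
+ * If the parent query has allowed ignoring the parallel tuple cost for
+ * its Append sub-plans (set in set_append_rel_size), treat this
+ * subquery level as a parallel-insert SELECT as well, so that the
+ * logic below applies to it.
+ */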
+ if (root->query_level != 1 &&
+ (root->parent_root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND))
+ {
+ root->parse->parallelInsCmdTupleCostOpt |=
+ PARALLEL_INSERT_SELECT_QUERY;
+ }
+
+ if (root->parse->parallelInsCmdTupleCostOpt & PARALLEL_INSERT_SELECT_QUERY)
{
/*
* In each of the HAS_PARENT_PATH_GENERATING_CLAUSE cases, a parent
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f76b5c2ffd..41f116bbf5 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -65,7 +65,9 @@ typedef enum ParallelInsertCmdTupleCostOpt
*/
PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
/* Turn on this after the cost is ignored. */
- PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2
+ PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2,
+ /* Turn on this in case tuple cost needs to be ignored for Append cases. */
+ PARALLEL_INSERT_CAN_IGN_TUP_COST_APPEND = 1 << 3
} ParallelInsertCmdTupleCostOpt;
/*
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 782b78b39e..88401d3609 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -607,6 +607,728 @@ drop table parallel_write;
reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+----------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+ explain_pictas
+--------------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1, $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $1)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ InitPlan 2 (returns $3)
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(21 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+(22 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 7
+(1 row)
+
+drop table parallel_write;
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $1)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 = $2)
+ Rows Removed by Filter: N
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+(26 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+ explain_pictas
+--------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Seq Scan on temp1 (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ SubPlan 1
+ -> Limit (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+(10 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+ explain_pictas
+------------------------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on temp2 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+(8 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 15
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = $4)
+ Rows Removed by Filter: N
+(47 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $2
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ -> Parallel Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $5
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ -> Parallel Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(67 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+ explain_pictas
+----------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Seq Scan on temp2 temp2_1 (actual rows=N loops=N)
+ Filter: (col2 = $3)
+ Rows Removed by Filter: N
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $4)
+ Rows Removed by Filter: N
+(37 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+ explain_pictas
+------------------------------------------------------------------------------------
+ Append (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $0, $1
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 2 (returns $1)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($1 = $0)
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ Filter: (col1 = $0)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($2 = $0)
+ InitPlan 3 (returns $2)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 (never executed)
+ Filter: (col2 = $0)
+ -> Append (actual rows=N loops=N)
+ InitPlan 4 (returns $3)
+ -> Result (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Params Evaluated: $3, $4
+ Workers Launched: N
+ -> Create parallel_write
+ InitPlan 5 (returns $4)
+ -> Result (actual rows=N loops=N)
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($4 = $3)
+ -> Parallel Seq Scan on temp1 temp1_1 (actual rows=N loops=N)
+ Filter: (col1 = $3)
+ Rows Removed by Filter: N
+ -> Result (actual rows=N loops=N)
+ One-Time Filter: ($5 = $3)
+ InitPlan 6 (returns $5)
+ -> Result (actual rows=N loops=N)
+ -> Seq Scan on temp2 temp2_1 (never executed)
+ Filter: (col2 = $3)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on temp1 temp1_2 (actual rows=N loops=N)
+ Filter: (col1 = 5)
+ Rows Removed by Filter: N
+ -> Seq Scan on temp2 temp2_2 (actual rows=N loops=N)
+ Filter: (col2 = 5)
+ Rows Removed by Filter: N
+(53 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 4
+(1 row)
+
+drop table parallel_write;
+alter table temp2 reset (parallel_workers);
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------
+ HashAggregate (actual rows=N loops=N)
+ Group Key: temp1.col1
+ Batches: N Memory Usage: NkB
+ -> Append (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Except All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+ Filter: (col2 < 3)
+ Rows Removed by Filter: N
+(14 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 3
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+ explain_pictas
+----------------------------------------------------------------------------
+ HashSetOp Intersect All (actual rows=N loops=N)
+ -> Append (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 1" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp1 (actual rows=N loops=N)
+ -> Subquery Scan on "*SELECT* 2" (actual rows=N loops=N)
+ -> Gather (actual rows=N loops=N)
+ Workers Planned: 3
+ Workers Launched: N
+ -> Parallel Seq Scan on temp2 (actual rows=N loops=N)
+(12 rows)
+
+select count(*) from parallel_write;
+ count
+-------
+ 5
+(1 row)
+
+drop table parallel_write;
drop table temp1;
drop table temp2;
drop table temp3;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 8ed8a5049b..dac439dcbb 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -241,6 +241,228 @@ reset enable_nestloop;
reset enable_mergejoin;
reset enable_hashjoin;
+-- test cases for performing parallel inserts when Append node is at the top
+-- and Gather node is in one of its direct sub plans.
+
+-- case 1: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 2: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select (select col2 from temp2 limit 1) col2 from temp1 union all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 3: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples to
+-- Append and from there to CTAS dest receiver.
+-- Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+-- ->Parallel Seq Scan
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 where col1 = 5 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1 union all
+ select * from temp1 where col1 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 set (parallel_workers = 0);
+select explain_pictas(
+'create table parallel_write as select * from temp1 where col1 = (select 1) union all
+ select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+alter table temp2 reset (parallel_workers);
+drop table parallel_write;
+
+-- case 4: parallel inserts must not occur as there will be no direct Gather
+-- node under Append node. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Seq Scan / Join / any other non-Gather node
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select (select temp1.col1 from temp2 limit 1) col2 from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 5: parallel inserts must occur at the top Gather node as we can push
+-- the CTAS dest receiver to it.
+-- Gather
+-- ->Parallel Append
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Parallel Seq Scan
+-- ->Parallel Seq Scan
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union all
+ select * from temp2 union all
+ select * from temp1;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 6: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Gather
+-- ->Gather
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp2 where col2 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+-- case 7: parallel inserts must occur at each Gather node as we can push the
+-- CTAS dest receiver. Non-Gather nodes will do inserts by sending tuples
+-- to Append and from there to CTAS dest receiver.
+-- Append
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Append
+-- ->Gather
+-- ->Parallel Seq Scan
+-- ->Seq Scan / Join / any other non-Gather node
+-- ->Gather
+
+alter table temp2 set (parallel_workers = 0);
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2) union all
+ select * from temp1 where col1 = (select 2);');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from (select * from temp1 where col1 = (select 1) union all
+ select * from temp2 where col2 = (select 2)) as tt
+ where col1 = (select 1) union all
+ select * from temp1 where col1 = 5 union all
+ select * from temp2 where col2 = 5;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+alter table temp2 reset (parallel_workers);
+
+-- case 8: parallel inserts must not occur because there is no Gather or Append
+-- node at the top for union, except/except all, intersect/intersect all
+-- cases.
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 union
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 except all
+ select * from temp2 where col2 < 3;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
+select explain_pictas(
+'create table parallel_write as
+ select * from temp1 intersect all
+ select * from temp2;');
+select count(*) from parallel_write;
+drop table parallel_write;
+
drop table temp1;
drop table temp2;
drop table temp3;
--
2.25.1
Hi Bharath,
I'm trying to take some performance measurements on your patch v23.
But when I started, I found an issue with unbalanced tuple distribution among workers (99% of tuples read by one worker) in a specific case, which makes the "parallel select" part give no performance gain.
Then I found it's not introduced by your patch, because it also happens on master (HEAD). But I don't know how to deal with it, so I put it here to see if anybody knows what's going wrong here or has good ideas to deal with this issue.
Here are the conditions to produce the issue:
1. high CPU spec environment (say above 20 processors). On smaller CPUs it also happens, but is not so obvious (40% of tuples on one worker in my tests).
2. query plan is "serial insert + parallel select"; I have reproduced this behavior in CTAS, SELECT INTO, and INSERT INTO SELECT.
3. the select part needs to read a large amount of data (e.g. query 100 million rows out of 200 million).
Based on the above, IMHO, I guess it may be caused by the leader's write rate not keeping up with the workers' read rate, so the tuples of one worker get blocked in the queue and accumulate more and more.
Below is my test info:
1. test spec environment
CentOS 8.2, 128G RAM, 40 processors, disk SAS
2. test data prepare
create table x(a int, b int, c int);
create index on x(a);
insert into x select generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
3. test execute results
*Patched CTAS*: please look at worker 2, 99% tuples read by it.
explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on public.x (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
Output: a, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 19998746
Worker 0: actual time=0.016..19.265 rows=94592 loops=1
Worker 1: actual time=0.027..31.422 rows=94574 loops=1
Worker 2: actual time=0.014..21744.549 rows=99627749 loops=1
Worker 3: actual time=0.015..19.347 rows=94586 loops=1
Planning Time: 0.098 ms
Execution Time: 91054.828 ms
*Non-patched CTAS*: please look at worker 0, also 99% tuples read by it.
explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on public.x (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
Output: a, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 19999370
Worker 0: actual time=0.013..21805.647 rows=99624833 loops=1
Worker 1: actual time=0.016..19.790 rows=94398 loops=1
Worker 2: actual time=0.013..35.340 rows=94423 loops=1
Worker 3: actual time=0.035..19.849 rows=94679 loops=1
Planning Time: 0.083 ms
Execution Time: 91151.097 ms
I'm still working on the performance tests on your patch, if I make some progress, I will post my results here.
Regards,
Tang
On Fri, Jan 22, 2021 at 5:16 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
wrote:
Hi Bharath,
I'm trying to take some performance measurements on you patch v23.
But when I started, I found an issue about the tuples unbalance
distribution among workers(99% tuples read by one worker) under specified
case which lead the "parallel select" part makes no performance gain.
Then I find it's not introduced by your patch, because it's also
happening in master(HEAD). But I don't know how to deal with it , so I put
it here to see if anybody know what's going wrong with this or have good
ideas to deal this issue.
Here are the conditions to produce the issue:
1. high CPU spec environment(say above 20 processors). In smaller CPU, it
also happen but not so obvious(40% tuples on one worker in my tests).
2. query plan is "serial insert + parallel select", I have reproduce this
behavior in (CTAS, Select into, insert into select).
3. select part needs to query large data size(e.g. query 100 million from
200 million).
According to above, IMHO, I guess it may be caused by the leader write
rate can't catch the worker read rate, then the tuples of one worker
blocked in the queue, become more and more.
Below is my test info:
1. test spec environment
CentOS 8.2, 128G RAM, 40 processors, disk SAS
2. test data prepare
create table x(a int, b int, c int);
create index on x(a);
insert into x select
generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
3. test execute results
*Patched CTAS*: please look at worker 2, 99% tuples read by it.
explain analyze verbose create table test(a,b,c) as select
a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1942082.77 rows=1000001 width=16) (actual
time=0.203..24023.686 rows=100006268 loops=1)
Output: a, floor(((random() * '10000'::double precision) + '1'::double
precision)), c
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on public.x (cost=0.00..1831082.66 rows=250000
width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
Output: a, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 19998746
Worker 0: actual time=0.016..19.265 rows=94592 loops=1
Worker 1: actual time=0.027..31.422 rows=94574 loops=1
Worker 2: actual time=0.014..21744.549 rows=99627749 loops=1
Worker 3: actual time=0.015..19.347 rows=94586 loops=1
Planning Time: 0.098 ms
Execution Time: 91054.828 ms
*Non-patched CTAS*: please look at worker 0, also 99% tuples read by it.
explain analyze verbose create table test(a,b,c) as select
a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1942082.77 rows=1000001 width=16) (actual
time=0.283..19216.157 rows=100003148 loops=1)
Output: a, floor(((random() * '10000'::double precision) + '1'::double
precision)), c
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on public.x (cost=0.00..1831082.66 rows=250000
width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
Output: a, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 19999370
Worker 0: actual time=0.013..21805.647 rows=99624833 loops=1
Worker 1: actual time=0.016..19.790 rows=94398 loops=1
Worker 2: actual time=0.013..35.340 rows=94423 loops=1
Worker 3: actual time=0.035..19.849 rows=94679 loops=1
Planning Time: 0.083 ms
Execution Time: 91151.097 ms
I'm still working on the performance tests on your patch, if I make some
progress, I will post my results here.
Thanks a lot for the tests. In your test case, parallel insertions are not
being picked because the Gather node has some projections (floor(((random()
* '10000'::double precision) + '1'::double precision))) to perform. That's
expected. Whenever parallel insertions are chosen for CTAS, we should see
"Create target_table" under the Gather node [1] and also an actual row count
of 0 for the Gather node (but in your test it is rows=100006268) in the explain
analyze output. Coming to your test case, if it's modified to something
like [1], where the Gather node has no projections, then parallel
insertions will be chosen.
[1]: I did this test on my development system, I will run on some
performance system and post my observations.
postgres=# explain (analyze, verbose) create table test(a,b,c) as select
a,b,c from x where b%2=0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..3846.71 rows=1000 width=12) (actual
time=5581.308..5581.379 rows=0 loops=1)
Output: a, b, c
Workers Planned: 1
Workers Launched: 1
* -> Create test*
-> Parallel Seq Scan on public.x (cost=0.00..2846.71 rows=588
width=12) (actual time=0.014..29.512 rows=50023 loops=2)
Output: a, b, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 49977
Worker 0: actual time=0.015..29.751 rows=49419 loops=1
Planning Time: 1574.584 ms
Execution Time: 6437.562 ms
(12 rows)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks a lot for the tests. In your test case, parallel insertions are not being picked because the Gather node has
some projections (floor(((random() * '10000'::double precision) + '1'::double precision))) to perform. That's expected.
Whenever parallel insertions are chosen for CTAS, we should see "Create target_table" under the Gather node [1] and
also the actual row count for the Gather node 0 (but in your test it is rows=100006268) in the explain analyze output.
Coming to your test case, if it's modified to something like [1], where the Gather node has no projections,
then parallel insertions will be chosen.
Thanks for your explanation and test.
Actually, I deliberately made my test case (with a projection) pick serial insert, to make the unbalanced tuple distribution (99% of tuples read by one worker) happen.
This issue will lead to the performance regression.
But it's not introduced by your patch; it's happening in master (HEAD).
Do you have any thoughts about this?
[1] - I did this test on my development system, I will run on some performance system and post my observations.
Thank you, it would be very kind of you to do this.
To reproduce the above issue, you need to use my case (with the projection), because it won't occur with "parallel insert".
Regards,
Tang
Hi Bharath,
I chose 5 cases which pick a parallel insert plan in CTAS to measure the patched performance. Each case was run 30 times.
Most of the test executions become faster with this patch.
However, Test NO 4 (create table xxx as table xxx.) shows a performance degradation. I tested various table sizes (2/10/20 million rows); they all show 6%-10% declines. I think this problem needs some investigation.
Below are my test results. 'Test NO' corresponds to 'Test NO' in the attached test_ctas.sql file.
reg%=(patched-master)/master
Test NO | Test Case |reg% | patched(ms) | master(ms)
--------|--------------------------------|------|--------------|-------------
1 | CTAS select from table | -9% | 16709.50477 | 18370.76660
2 | Append plan | -14% | 16542.97807 | 19305.86600
3 | initial plan under Gather node| -5% | 13374.27187 | 14120.02633
4 | CTAS table | 10% | 20835.48800 | 18986.40350
5 | CTAS select from execute | -6% | 16973.73890 | 18008.59789
About Test NO 4:
In master (HEAD), this test case picks a serial seq scan.
The query plan looks like:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tenk1 (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000 loops=1)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Planning Time: 0.053 ms
Execution Time: 20165.023 ms
With this patch, it chooses a parallel seq scan and parallel insert.
The query plan looks like:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Workers Planned: 4
Workers Launched: 4
-> Create test
-> Parallel Seq Scan on public.tenk1 (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094 rows=2000000 loops=5)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Worker 0: actual time=0.023..390.856 rows=1858407 loops=1
Worker 1: actual time=0.024..468.587 rows=2264494 loops=1
Worker 2: actual time=0.023..473.170 rows=2286580 loops=1
Worker 3: actual time=0.027..373.727 rows=1853216 loops=1
Planning Time: 0.053 ms
Execution Time: 20437.643 ms
test machine spec:
CentOS 8.2, 128G RAM, 40 processors, disk SAS
Regards,
Tang
Attachments:
On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
I choose 5 cases which pick parallel insert plan in CTAS to measure the patched performance. Each case run 30 times.
Most of the tests execution become faster with this patch.
However, Test NO 4(create table xxx as table xxx.) appears performance degradation. I tested various table size(2/10/20 millions), they all have a 6%-10% declines. I think it may need some check at this problem.
Below are my test results. 'Test NO' is corresponded to 'Test NO' in attached test_ctas.sql file.
reg%=(patched-master)/master
Test NO | Test Case |reg% | patched(ms) | master(ms)
--------|--------------------------------|------|--------------|-------------
1 | CTAS select from table | -9% | 16709.50477 | 18370.76660
2 | Append plan | -14% | 16542.97807 | 19305.86600
3 | initial plan under Gather node| -5% | 13374.27187 | 14120.02633
4 | CTAS table | 10% | 20835.48800 | 18986.40350
5 | CTAS select from execute | -6% | 16973.73890 | 18008.59789
About Test NO 4:
In master(HEAD), this test case picks serial seq scan.
query plan likes:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tenk1 (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000 loops=1)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Planning Time: 0.053 ms
Execution Time: 20165.023 ms
With this patch, it will choose parallel seq scan and parallel insert.
query plan likes:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Workers Planned: 4
Workers Launched: 4
-> Create test
-> Parallel Seq Scan on public.tenk1 (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094 rows=2000000 loops=5)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Worker 0: actual time=0.023..390.856 rows=1858407 loops=1
Worker 1: actual time=0.024..468.587 rows=2264494 loops=1
Worker 2: actual time=0.023..473.170 rows=2286580 loops=1
Worker 3: actual time=0.027..373.727 rows=1853216 loops=1
Planning Time: 0.053 ms
Execution Time: 20437.643 ms
test machine spec:
CentOS 8.2, 128G RAM, 40 processors, disk SAS
Thanks a lot for the performance tests and test cases. I will analyze
why the performance is degrading in one case and respond soon.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Seems like the v22 patch was failing in cfbot for one of the unstable test cases.
Attaching the v23 patch set with modifications in the 0003 and 0004 patches. No
changes to the 0001 and 0002 patches. Hopefully cfbot will be happy with v23. Please consider v23 for further review.
Hi,
I was looking into the latest patch, here are some comments:
1)
* (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
* MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
* backend writes into a completely new table. In the future, we can
Since we will support parallel insert with CTAS, these existing comments need to be changed.
2)
In parallel.sgml
The query writes any data or locks any database rows. If a query
contains a data-modifying operation either at the top level or within
a CTE, no parallel plans for that query will be generated. As an
exception, the commands <literal>CREATE TABLE ... AS</literal>,
Same as 1), we'd better mention that we now support parallel insert with CTAS.
3)
ExecInitParallelPlan(PlanState *planstate, EState *estate,
Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed,
+ ParallelInsertCmdKind parallel_ins_cmd,
+ void *parallel_ins_info)
Is it better to place ParallelInsertCmdKind in struct ParallelInsertCTASInfo?
We could then pass fewer parameters in some places.
Like:
typedef struct ParallelInsertCTASInfo
{
+ ParallelInsertCmdKind parallel_ins_cmd;
IntoClause *intoclause;
Oid objectid;
} ParallelInsertCTASInfo;
Best regards,
houzj
Seems like v22 patch was failing in cfbot for one of the unstable test cases.
Attaching v23 patch set with modification in 0003 and 0004 patches. No
changes to 0001 and 0002 patches. Hopefully cfbot will be happy with v23. Please consider v23 for further review.
Hi,
I was looking into the latest patch, here are some comments:
Few comments.
1)
Executing the following SQL will cause assertion failure.
-----------sql---------------
create table data(a int);
insert into data select 1 from generate_series(1,1000000,1) t;
explain (verbose) create table tt as select a,2 from data;
--------------------------
The stack message:
-----------stack---------------
TRAP: FailedAssertion("!allow && gather_exists && tuple_cost_opts && !(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED)", File: "execParallel.c", Line: 1872, PID: 1618247)
postgres: houzj postgres [local] EXPLAIN(ExceptionalCondition+0x8b)[0x940f0b]
postgres: houzj postgres [local] EXPLAIN[0x67ba1c]
postgres: houzj postgres [local] EXPLAIN(ExplainOnePlan+0x1c2)[0x605997]
postgres: houzj postgres [local] EXPLAIN[0x605d11]
postgres: houzj postgres [local] EXPLAIN(ExplainOneUtility+0x162)[0x605eb0]
--------------------------
In this case, the Gather node has a projection, in which case parallel CTAS is not supported,
but we still ignore the tuple cost in the planner.
If we plan to detect the projection, we may need to check tlist_same_exprs.
+ if (tlist_same_exprs)
+ {
ignore_parallel_tuple_cost(root);
+ }
generate_useful_gather_paths(root, rel, false);
2)
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)
IMO, it's better to add some comments about why we do not support projections for now.
Because not all projections are parallel unsafe (such as the case in 1)), it would be desirable to support these later.
3)
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
...
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
...
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
Currently, the patch calls IsParallelInsertionAllowed() both before and after pg_plan_query().
This might lead to a misunderstanding that parallel_ins_info gets changed during pg_plan_query().
Since parallel_ins_info will not get changed in pg_plan_query(), is it possible to add a bool flag (allowed)
in parallel_ins_info to avoid the second call of IsParallelInsertionAllowed()?
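A minimal sketch of what I mean, just to illustrate (the field name "allowed" and the exact placement are only suggestions, not code from the patch):
typedef struct ParallelInsertCTASInfo
{
	IntoClause *intoclause;
	Oid			objectid;
	bool		allowed;	/* cached result of the pre-planning check */
} ParallelInsertCTASInfo;

/* before planning */
parallel_ins_info.allowed =
	IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
							   &parallel_ins_info);

plan = pg_plan_query(query, pstate->p_sourcetext, CURSOR_OPT_PARALLEL_OK, params);

/* after planning: just test the flag instead of calling the check again */
if (parallel_ins_info.allowed)
{
	/* set up the parallel-insert state as the patch does today */
}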
Best regards,
houzj
On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:I choose 5 cases which pick parallel insert plan in CTAS to measure the patched performance. Each case run 30 times.
Most of the tests execution become faster with this patch.
However, Test NO 4(create table xxx as table xxx.) appears performance degradation. I tested various table size(2/10/20 millions), they all have a 6%-10% declines. I think it may need some check at this problem.
Below are my test results. 'Test NO' is corresponded to 'Test NO' in attached test_ctas.sql file.
reg%=(patched-master)/master
Test NO | Test Case |reg% | patched(ms) | master(ms)
--------|--------------------------------|------|--------------|-------------
1 | CTAS select from table | -9% | 16709.50477 | 18370.76660
2 | Append plan | -14% | 16542.97807 | 19305.86600
3 | initial plan under Gather node| -5% | 13374.27187 | 14120.02633
4 | CTAS table | 10% | 20835.48800 | 18986.40350
5 | CTAS select from execute | -6% | 16973.73890 | 18008.59789
About Test NO 4:
In master(HEAD), this test case picks serial seq scan.
query plan likes:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tenk1 (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000 loops=1)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Planning Time: 0.053 ms
Execution Time: 20165.023 ms
With this patch, it will choose parallel seq scan and parallel insert.
query plan likes:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Workers Planned: 4
Workers Launched: 4
-> Create test
-> Parallel Seq Scan on public.tenk1 (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094 rows=2000000 loops=5)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
Worker 0: actual time=0.023..390.856 rows=1858407 loops=1
Worker 1: actual time=0.024..468.587 rows=2264494 loops=1
Worker 2: actual time=0.023..473.170 rows=2286580 loops=1
Worker 3: actual time=0.027..373.727 rows=1853216 loops=1
Planning Time: 0.053 ms
Execution Time: 20437.643 ms
test machine spec:
CentOS 8.2, 128G RAM, 40 processors, disk SAS
Thanks a lot for the performance tests and test cases. I will analyze
why the performance is degrading one case and respond soon.
I analyzed the performance of parallel inserts in CTAS for different cases
with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064 bytes. We
gain if the tuple sizes are smaller. But if the tuple size is larger,
i.e. 1064 bytes, there's a regression with parallel inserts. Upon
further analysis, it turned out that the parallel workers require
frequent addition of extra blocks while concurrently extending
the relation (in RelationAddExtraBlocks), and the majority of the time
is spent flushing those new empty pages/blocks onto the
disk. I saw no regression when I increased (for testing purposes) the
rate at which the extra blocks are added in RelationAddExtraBlocks to
extraBlocks = Min(1024, lockWaiters * 512); (currently it is
extraBlocks = Min(512, lockWaiters * 20);). Increasing the extra
block addition rate is not a practical solution to this problem,
though.
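For reference, the logic being tweaked sits in RelationAddExtraBlocks() in src/backend/access/heap/hio.c and looks roughly like this on the branch I tested (slightly abridged):
	/* Use the length of the lock wait queue to judge how much to extend. */
	lockWaiters = RelationExtensionLockWaiterCount(relation);
	if (lockWaiters <= 0)
		return;

	/*
	 * It might seem like multiplying the number of lock waiters by as much
	 * as 20 is too aggressive, but benchmarking revealed that smaller
	 * numbers were insufficient.  512 is just an arbitrary cap to prevent
	 * pathological results.
	 */
	extraBlocks = Min(512, lockWaiters * 20);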
In an offlist discussion with Robert and Dilip, it came up that using
fallocate to extend the relation may help to extend the relation faster. In
this regard, it looks like the AIO/DIO patch set of Andres [1], which
involves using fallocate() to extend files, will surely be helpful.
Until then, we honestly feel that the parallel inserts in CTAS patch
set should be put on hold and revived later.
[1]: /messages/by-id/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
I analyzed performance of parallel inserts in CTAS for different cases
with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
gain if the tuple sizes are lower. But if the tuple size is larger
i..e 1064bytes, there's a regression with parallel inserts.
Thanks for the update.
BTW, maybe you have some more test cases that can reproduce this regression easily.
Can you please share some of the test cases (with big tuple sizes) with me?
Regards,
Tang
On Fri, Mar 19, 2021 at 12:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
I analyzed performance of parallel inserts in CTAS for different cases
with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
gain if the tuple sizes are lower. But if the tuple size is larger
i..e 1064bytes, there's a regression with parallel inserts.
Thanks for the update.
BTW, May be you have some more testcases that can reproduce this regression easily.
Can you please share some of the testcase (with big tuple size) with me.
They are pretty simple though [1]. I think someone can also check if the
same regression exists for parallel inserts in the "INSERT INTO SELECT"
patch set as well for larger tuple sizes.
[1]:
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int);
INSERT INTO tenk1 values(generate_series(1,100000000),
generate_series(1,100000000));
explain analyze verbose create table test as select * from tenk1;
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int, c3 varchar(8), c4
varchar(8), c5 varchar(8));
INSERT INTO tenk1 values(generate_series(1,100000000),
generate_series(1,100000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 varchar(8));
INSERT INTO tenk1 values(generate_series(1,100000000),
generate_series(1,100000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,10000000),
generate_series(1,10000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create unlogged table test as select * from tenk1;
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
They are pretty simple though. I think someone can also check if the same
regression exists for parallel inserts in "INSERT INTO SELECT"
patch set as well for larger tuple sizes.
Thanks for the reminder.
I did some performance tests for parallel inserts in "INSERT INTO SELECT" with the test cases you provided;
the regression does not seem to exist in "INSERT INTO SELECT".
I will try to test with larger tuple sizes later.
Best regards,
houzj
They are pretty simple though. I think someone can also check if the
same regression exists for parallel inserts in "INSERT INTO SELECT"
patch set as well for larger tuple sizes.
Thanks for reminding.
I did some performance tests for parallel inserts in " INSERT INTO SELECT " with
the testcase you provided, the regression seems does not exists in "INSERT
INTO SELECT".
I forgot to share the test results for parallel CTAS.
I tested with the SQL: explain analyze verbose create table test as select * from tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int);
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int, c3 varchar(8), c4 varchar(8), c5 varchar(8));
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5 name, c6 varchar(8));
I did not see a regression in these cases (small tuple sizes).
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5 name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12 name, c13 name, c14 name,
c15 name, c16 name, c17 name, c18 name);
I can see the degradation in this case.
The average test results of CTAS are:
Serial CTAS -----Execution Time: 80892.240 ms
Parallel CTAS -----Execution Time: 85725.591 ms
About 6% degradation.
I also tested with the Parallel INSERT patch in this case.
(Note: to keep it consistent, I created a new target table (test) before inserting.)
The average test results of Parallel INSERT are:
Serial Parallel INSERT ------ Execution Time: 90075.501 ms
Parallel Parallel INSERT----- Execution Time: 85812.202 ms
No degradation.
Best regards,
houzj
On Fri, Mar 19, 2021 at 4:33 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
In an offlist discussion with Robert and Dilip, using fallocate to
extend the relation may help to extend the relation faster. In regards
to this, it looks like the AIO/DIO patch set of Andres [1] which
involves using fallocate() to extend files will surely be helpful.
Until then, we honestly feel that the parallel inserts in CTAS patch
set be put on hold and revive it later.
Hi,
I had partially reviewed some of the patches (first scan) when I was
alerted to your post and intention to put the patch on hold.
I thought I'd just post the comments I have so far, and you can look
at them at a later time when/if you revive the patch.
Patch 0001
1) Patch comment
Leader inserts its share of tuples if instructed to do, and so are workers
should be:
Leader inserts its share of tuples if instructed to, and so do the workers.
2)
void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
{
GatherState *gstate;
DestReceiver *dest;
Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
gstate = (GatherState *) queryDesc->planstate;
dest = queryDesc->dest;
/*
* Parallel insertions are not possible either if the upper node is not
* Gather or it's a Gather but it have some projections to perform.
*/
if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
return;
I think it would look better for code to be:
dest = queryDesc->dest;
/*
* Parallel insertions are not possible either if the upper node is not
* Gather or it's a Gather but it have some projections to perform.
*/
if (!IsA(queryDesc->planstate, GatherState) ||
queryDesc->planstate->ps_ProjInfo)
return;
gstate = (GatherState *) queryDesc->planstate;
3) src/backend/executor/execParallel.c
+ pg_atomic_uint64 processed;
I am wondering, when there is contention from multiple workers in
writing back their processed count, how well does this work? Any
performance issues?
For the Parallel INSERT patch (which has not yet been committed) it
currently uses an array of processed counts for the workers (since #
of workers is capped) so there is never any contention related to
this.
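Just to illustrate the two schemes (this sketch is not code from either patch; the names and the per-tuple granularity are made up, and how much contention there is in practice depends on how often the counter is actually updated, e.g. per tuple versus once per worker at shutdown):
#include "access/parallel.h"	/* ParallelWorkerNumber */
#include "port/atomics.h"

/* (a) one shared counter, as in this patch: every worker does an atomic
 * add on the same variable, i.e. the same cache line. */
static inline void
count_tuple_shared(pg_atomic_uint64 *processed)
{
	pg_atomic_fetch_add_u64(processed, 1);
}

/* (b) per-worker slots, as in the parallel INSERT patch: each worker
 * writes only its own slot and the leader sums the array at the end, so
 * no atomic read-modify-write is needed. */
static inline void
count_tuple_per_worker(uint64 *processed_array)
{
	processed_array[ParallelWorkerNumber] += 1;
}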
4) src/backend/executor/execParallel.c
You shouldn't use intermingled declarations and code.
https://www.postgresql.org/docs/13/source-conventions.html
Best to move the uninitialized variable declaration to the top of the block:
ParallelInsertCTASInfo *info = NULL;
char *intoclause_str = NULL;
int intoclause_len;
char *intoclause_space = NULL;
should be:
int intoclause_len;
ParallelInsertCTASInfo *info = NULL;
char *intoclause_str = NULL;
char *intoclause_space = NULL;
5) ExecParallelGetInsReceiver
Would look better to have:
DR_intorel *receiver;
receiver = (DR_intorel *)CreateIntoRelDestReceiver(intoclause);
receiver->is_parallel_worker = true;
receiver->object_id = fpes->objectid;
6) GetParallelInsertCmdType
I think the following would be better:
ParallelInsertCmdKind
GetParallelInsertCmdType(DestReceiver *dest)
{
if (dest &&
dest->mydest == DestIntoRel &&
((DR_intorel *) dest)->is_parallel)
return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;
return PARALLEL_INSERT_CMD_UNDEF;
}
7) IsParallelInsertAllowed
In the following code:
/* Below check may hit in case this function is called from explain.c. */
if (!(into && IsA(into, IntoClause)))
return false;
If "into" is non-NULL, isn't it guaranteed to point at an IntoClause?
I think the code can just be:
/* Below check may hit in case this function is called from explain.c. */
if (!into)
return false;
8) ExecGather
The comments and variable name are likely to cause confusion when the
parallel INSERT statement is implemented. Suggest minor change:
change:
bool perform_parallel_ins = false;
to:
bool perform_parallel_ins_no_readers = false;
change:
/*
* Do not create tuple queue readers for commands with parallel
* insertion. Because the gather node will not receive any
* tuples, the workers will insert the tuples into the target
* relation.
*/
to:
/*
* Do not create tuple queue readers for commands with parallel
* insertion that don't additionally return tuples. In this case,
* the workers will only insert the tuples into the target
* relation and the gather node will not receive any tuples.
*/
I think some changes in other areas are needed for the same reasons.
Patch 0002
1) I noticed that "rows" is not zero (and so is not displayed as 0 in
the EXPLAIN output for Gather) for the Gather node when parallel
inserts will be used. This doesn't seem to be right. I think that if
PARALLEL_INSERT_CAN_IGN_TUP_COST is set, path->rows should be set to
0, and just let existing "run_cost" be evaluated as normal (which will
be 0 as path->rows is 0).
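For context, the per-tuple transfer cost comes from this fragment of cost_gather() in src/backend/optimizer/path/costsize.c (as on master at the time of writing; abridged):
	/* Parallel setup and communication cost. */
	startup_cost += parallel_setup_cost;
	run_cost += parallel_tuple_cost * path->path.rows;
So if path->path.rows is set to 0 for a parallel-insert Gather, the tuple-transfer term naturally evaluates to 0 without needing a separate "ignore tuple cost" mechanism.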
2) Is PARALLEL_INSERT_TUP_COST_IGNORED actually needed? Couldn't only
PARALLEL_INSERT_CAN_IGN_TUP_COST be used for the purpose of ignoring
parallel tuple cost?
Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I analyzed performance of parallel inserts in CTAS for different cases
with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
gain if the tuple sizes are lower. But if the tuple size is larger
i..e 1064bytes, there's a regression with parallel inserts. Upon
further analysis, it turned out that the parallel workers are
requiring frequent extra blocks addition while concurrently extending
the relation(in RelationAddExtraBlocks) and the majority of the time
spent is going into flushing those new empty pages/blocks onto the
disk.
How have you ensured that the cost is due to the flushing of pages?
AFAICS, we don't flush the pages but rather just write them and then
register them to be flushed by the checkpointer. Now, it is possible that
the checkpointer sync queue gets full and the backend has to write by
itself, but have we checked that? I think we can check via wait events:
if it is due to flushing, then we should see a lot of file sync
(WAIT_EVENT_DATA_FILE_SYNC) wait events. The other possibility could
be that the free pages added to the FSM by one worker are not being used
by another worker for some reason. Can we debug and check if the
pages added by one worker are being used by another worker?
--
With Regards,
Amit Kapila.
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I analyzed performance of parallel inserts in CTAS for different cases
with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
gain if the tuple sizes are lower. But if the tuple size is larger
i..e 1064bytes, there's a regression with parallel inserts. Upon
further analysis, it turned out that the parallel workers are
requiring frequent extra blocks addition while concurrently extending
the relation(in RelationAddExtraBlocks) and the majority of the time
spent is going into flushing those new empty pages/blocks onto the
disk.How you have ensured that the cost is due to the flushing of pages?
AFAICS, we don't flush the pages rather just write them and then
register those to be flushed by checkpointer, now it is possible that
the checkpointer sync queue gets full and the backend has to write by
itself but have we checked that? I think we can check via wait events,
if it is due to flush then we should see a lot of file sync
(WAIT_EVENT_DATA_FILE_SYNC) wait events. The other possibility could
be that the free pages added to FSM by one worker are not being used
by another worker due to some reason. Can we debug and check if the
pages added by one worker are being used by another worker?
Thanks! I will work on the above points sometime later.
BTW, I forgot to mention one point earlier: we see a benefit even
without parallelism if only multi inserts are used for CTAS instead of
single inserts. See [2] for more testing results. I used the "New Table
Access Methods for Multi and Single Inserts" patches from [1] for this
testing. I think it's a good idea to revisit that work.
[1]: /messages/by-id/CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg@mail.gmail.com
[2]:
case 1 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
on master - 130sec
on master with multi inserts - 105sec, gain - 1.23X
on parallel CTAS patch without multi inserts - (2 workers, 82sec,
1.58X), (4 workers, 83sec, 1.56X)
on parallel CTAS patch with multi inserts - (2 workers, 45sec, 2.33X,
overall gain if seen from master 2.88X), (4 workers, 33sec, 3.18X,
overall gain if seen from master 3.9X)
case 2 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
59 bytes, 100mn tuples
on master - 185sec
on master with multi inserts - 121sec, gain - 1.52X
on parallel CTAS patch without multi inserts - (2 workers, 120sec,
1.54X), (4 workers, 123sec, 1.5X)
on parallel CTAS patch with multi inserts - (2 workers, 68sec, 1.77X,
overall gain if seen from master 2.72X), (4 workers, 61sec, 1.98X,
overall gain if seen from master 3.03X)
The above two cases, with tuple sizes of only a few bytes, are the best
cases, where parallel CTAS + multi inserts give up to 3.9X and 3.03X
benefits.
case 3 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
on master - 367sec
on master with multi inserts - 291sec, gain - 1.26X
on parallel CTAS patch without multi inserts - (2 workers, 334sec,
1.09X), (4 workers, 336sec, 1.09X)
on parallel CTAS patch with multi inserts - (2 workers, 284sec,
1.02X, overall gain if seen from master 1.29X), (4 workers, 278sec,
1.04X, overall gain if seen from master 1.32X)
In the above case, where the tuple size is 241 bytes, we don't gain much.
case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
columns, tuple size 1064 bytes, 10mn tuples
on master - 120sec
on master with multi inserts - 115sec, gain - 1.04X
on parallel CTAS patch without multi inserts - (2 workers, 140sec,
0.85X), (4 workers, 142sec, 0.84X)
on parallel CTAS patch with multi inserts - (2 workers, 133sec,
0.86X, overall loss if seen from master 0.9X), (4 workers, 134sec,
0.85X, overall loss if seen from master 0.89X)
In the above case, where the tuple size is 1064 bytes, we gain very little with
multi inserts, and with parallel inserts we cause a regression.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi Bharath-san,
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Friday, May 21, 2021 6:49 PM
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I analyzed performance of parallel inserts in CTAS for different
cases with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We
could gain if the tuple sizes are lower. But if the tuple size is
larger i..e 1064bytes, there's a regression with parallel inserts.
Upon further analysis, it turned out that the parallel workers are
requiring frequent extra blocks addition while concurrently
extending the relation(in RelationAddExtraBlocks) and the majority
of the time spent is going into flushing those new empty
pages/blocks onto the disk.How you have ensured that the cost is due to the flushing of pages?
AFAICS, we don't flush the pages rather just write them and then
register those to be flushed by checkpointer, now it is possible that
the checkpointer sync queue gets full and the backend has to write by
itself but have we checked that? I think we can check via wait events,
if it is due to flush then we should see a lot of file sync
(WAIT_EVENT_DATA_FILE_SYNC) wait events. The other possibility could
be that the free pages added to FSM by one worker are not being used
by another worker due to some reason. Can we debug and check if the
pages added by one worker are being used by another worker?Thanks! I will work on the above points sometime later.
I noticed one place which could be one of the reasons for the performance degradation.
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
I am not quite sure that disabling the "SKIP FSM" option in the parallel workers will bring a performance gain.
In my test environment, if I change this code to use the option "TABLE_INSERT_SKIP_FSM", then there
seems to be no performance degradation. Could you please give it a try?
(I tested with the SQL you provided earlier [1].)
[1]: /messages/by-id/CALj2ACWFvNm4d_uqT2iECPqaXZjEd-O+y8xbghvqXeMLj0pxGw@mail.gmail.com
Best regards,
houzj
Bharath-san, all,
Hmm, I didn't experience performance degradation on my poor-man's Linux VM (4 CPU, 4 GB RAM, HDD)...
[benchmark preparation]
autovacuum = off
shared_buffers = 1GB
checkpoint_timeout = 1h
max_wal_size = 8GB
min_wal_size = 8GB
(other settings to enable parallelism)
CREATE UNLOGGED TABLE a (c char(1100));
INSERT INTO a SELECT i FROM generate_series(1, 300000) i;
(the table size is 335 MB)
[benchmark]
CREATE TABLE b AS SELECT * FROM a;
DROP TABLE a;
CHECKPOINT;
(measure only CTAS)
[results]
parallel_leader_participation = off
workers time(ms)
0 3921
2 3290
4 3132
parallel_leader_participation = on
workers time(ms)
2 3266
4 3247
Although this may be a controversial and perhaps crazy idea, the following change brought a 4-11% speedup. I tried it because I thought parallel workers might contend for WAL flushes as a result of using the limited ring buffer and flushing dirty buffers when the ring buffer is filled. Can we take advantage of this?
[GetBulkInsertState]
/* bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
bistate->strategy = NULL;
[results]
parallel_leader_participation = off
workers time(ms)
0 3695 (5% reduction)
2 3135 (4% reduction)
4 2767 (11% reduction)
Regards
Takayuki Tsunakawa
From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
I am not quite sure that disabling the "SKIP FSM" in parallel worker will bring
performance gain.
In my test environment, if I change this code to use option
"TABLE_INSERT_SKIP_FSM", then there
seems no performance degradation.
+1, probably.
Does the code comment represent the situation like this?
1. Worker 1 is inserting into page 1.
2. Worker 2 tries to insert into page 1, but cannot acquire the buffer content lock of page 1 because worker 1 holds it.
3. Worker 2 looks up FSM to find a page with enough free space.
But isn't FSM still empty during CTAS?
Regards
Takayuki Tsunakawa
On Tue, May 25, 2021 at 12:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
I noticed one place which could be one of the reasons that cause the performance degradation.
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
I am not quite sure that disabling the "SKIP FSM" in parallel worker will bring performance gain.
In my test environment, if I change this code to use option " TABLE_INSERT_SKIP_FSM ", then there
seems no performance degradation . Could you please have a try on it ?
(I test with the SQL you provided earlier[1])
Thanks for trying that out.
Please see the code around the use_fsm flag in
RelationGetBufferForTuple for more understanding of the points below.
What happens if FSM is skipped i.e. myState->ti_options =
TABLE_INSERT_SKIP_FSM;?
1) The flag use_fsm will be false in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it
until it is full. When the block is full, the worker doesn't look in
FSM GetPageWithFreeSpace as use_fsm is false. It directly goes for
relation extension and tries to acquire relation extension lock with
LockRelationForExtension. Note that the bulk extension of blocks with
RelationAddExtraBlocks is not reached as use_fsm is false.
3) After acquiring the relation extension lock, it adds an extra new
block with ReadBufferBI(relation, P_NEW, ...), see the comment "In
addition to whatever extension we performed above, we always add at
least one block to satisfy our own request." The tuple is inserted
into this new block.
Basically, the workers can't look for empty pages among the pages
added by other workers; they keep doing the above steps in silos.
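The gate itself is in RelationGetBufferForTuple() (hio.c) and looks roughly like this, slightly abridged (variable names differ a little across branches):
	if (bistate && bistate->current_buf != InvalidBuffer)
		targetBlock = BufferGetBlockNumber(bistate->current_buf);
	else
		targetBlock = RelationGetTargetBlock(relation);

	if (targetBlock == InvalidBlockNumber && use_fsm)
	{
		/*
		 * We have no cached target page, so ask the FSM for an initial
		 * target.
		 */
		targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
	}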
What happens if FSM is not skipped i.e. myState->ti_options = 0;?
1) The flag use_fsm will be true in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it
until it is full. When the block is full, the worker looks for the
page with free space in FSM GetPageWithFreeSpace as use_fsm is true.
If it can't find any page with the required amount of free space, it
goes for bulk relation extension(RelationAddExtraBlocks) after
acquiring relation extension lock with
ConditionalLockRelationForExtension. Then the worker adds extraBlocks
= Min(512, lockWaiters * 20); new blocks in RelationAddExtraBlocks and
immediately updates the bottom level of FSM for each block (see the
comment around RecordPageWithFreeSpace for why only the bottom level,
not the entire FSM tree). After all the blocks are added, then it
updates the entire FSM tree FreeSpaceMapVacuumRange.
4) After the bulk extension, the worker adds another block (see
the comment "In addition to whatever extension we performed above, we
always add at least one block to satisfy our own request.") and inserts
the tuple into this new block.
Basically, the workers can benefit from the bulk extension of the
relation, and they can always look for empty pages among the pages
added by other workers. There are high chances that blocks will be
available after bulk extension. Having said that, if the added extra
blocks are consumed by the workers very fast, i.e. if the tuple sizes are
big and very few tuples fit per page, then the bulk extension too can't
help much and there will be more contention on the relation extension
lock. Well, one might think to add more blocks at a time, say
Min(1024, lockWaiters * 128/256/512) instead of the current extraBlocks =
Min(512, lockWaiters * 20);. This will work (i.e. we don't see any
regression with parallel inserts in the CTAS patches), but it can't be a
practical solution, because the relation will end up with more pages,
many of them with a lot of free space. Furthermore, future
sequential scans on that relation might take a lot of time.
If myState->ti_options = TABLE_INSERT_SKIP_FSM; is set in only that one
place (within if (myState->is_parallel)), then it will be effective only for the
leader, i.e. the leader will not look in the FSM, but all the workers still will,
because within if (myState->is_parallel_worker) in intorel_startup,
myState->ti_options = 0; is set for the workers.
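In other words, with the patch the options end up being set roughly like this in intorel_startup() (a sketch based on the description above; the exact structure in the patch may differ):
	if (myState->is_parallel)
		myState->ti_options = 0;			/* leader: consult the FSM */
	else if (myState->is_parallel_worker)
		myState->ti_options = 0;			/* workers: always consult the FSM */
	else
		myState->ti_options = TABLE_INSERT_SKIP_FSM;	/* serial CTAS, as on master */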
I ran tests with the configuration shown at [1] for case 4 (2
bigint (8 bytes each) columns, 16 name (64 bytes each) columns,
tuple size 1064 bytes, 10mn tuples) with leader participation, where
I'm seeing the regression (execution times in ms):
1) when myState->ti_options = TABLE_INSERT_SKIP_FSM; for both leader
and workers, then my results are as follows:
0 workers - 116934.137, 2 workers - 209802.060, 4 workers - 248580.275
2) when myState->ti_options = 0; for both leader and workers, then my
results are as follows:
0 workers - 1116184.718, 2 workers - 139798.055, 4 workers - 143022.409
I hope the above explanation and the test results should clarify the
fact that skipping FSM doesn't solve the problem. Let me know if
anything is not clear or I'm missing something.
[1]: postgresql.conf parameters used:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
port = 5440
System Configuration:
RAM: 528GB
Disk Type: SSD
Disk Size: 1.5TB
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
CPU max MHz: 2129.0000
CPU min MHz: 1064.0000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, May 25, 2021 at 1:10 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
Although this may be a controversial and perhaps crazy idea, the following change brought a 4-11% speedup. This is because I thought parallel workers might contend for WAL flush as a result of them using the limited ring buffer and flushing dirty buffers when the ring buffer is filled. Can we take advantage of this?
[GetBulkInsertState]
/* bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
bistate->strategy = NULL;
You are right. If the ring buffer (16MB) is not used and shared
buffers (1GB) are used instead, then in your case, since the table size
is 335MB and it fits in shared buffers, there will be little or no
dirty buffer flushing, so there will be some more speedup.
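As a rough way to check that, here is a sketch (assuming the
pg_buffercache extension is available and the CTAS target is named
test, as in the test case used in this thread):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- how many 8kB buffers of the target table sit in shared buffers, and how many are dirty
SELECT count(*) AS buffers,
       count(*) FILTER (WHERE isdirty) AS dirty_buffers
FROM   pg_buffercache
WHERE  relfilenode = pg_relation_filenode('test'::regclass)
AND    reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database());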
Otherwise, a similar speedup can be observed when the BAS_BULKWRITE
ring buffer is increased a bit from the current 16MB to some other
reasonable value. I tried these experiments earlier.
Otherwise, as I said in [1], we can also increase the number of extra
blocks added at a time, say Min(1024, lockWaiters * 128/256/512)
instead of the current extraBlocks = Min(512, lockWaiters * 20);. This
will also give some speedup, and we don't see any regression with
parallel inserts in the CTAS patches.
But I'm not so sure that the hackers will agree to any of the above as
a practical solution to the "relation extension" problem.
[1]: /messages/by-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, May 25, 2021 at 1:50 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
+ /*
+  * We don't need to skip contacting FSM while inserting tuples for
+  * parallel mode, while extending the relations, workers instead of
+  * blocking on a page while another worker is inserting, can check the
+  * FSM for another page that can accommodate the tuples. This results
+  * in major benefit for parallel inserts.
+  */
+ myState->ti_options = 0;
I am not quite sure that disabling the "SKIP FSM" in a parallel worker will bring
a performance gain.
In my test environment, if I change this code to use the option
"TABLE_INSERT_SKIP_FSM", then there seems to be no performance degradation.
+1, probably.
I tried to explain it at [1]. Please have a look.
Does the code comment represent the situation like this?
1. Worker 1 is inserting into page 1.
2. Worker 2 tries to insert into page 1, but cannot acquire the buffer content lock of page 1 because worker 1 holds it.
3. Worker 2 looks up FSM to find a page with enough free space.
I tried to explain it at [1]. Please have a look.
But isn't FSM still empty during CTAS?
No, the FSM will be built on the fly if we don't skip the FSM, i.e.
myState->ti_options = 0; see RelationGetBufferForTuple with use_fsm =
true -> GetPageWithFreeSpace -> fsm_search -> fsm_set_and_search ->
fsm_readbuf with extend = true.
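If it helps to verify this from SQL rather than by reading the code, a
rough check with the pg_freespacemap extension (assuming the CTAS
target table is named test) would be:

CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
-- with ti_options = 0 this should report many blocks;
-- with TABLE_INSERT_SKIP_FSM it should stay near zero
SELECT count(*) AS blocks_known_to_fsm
FROM   pg_freespace('test')
WHERE  avail > 0;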
[1]: /messages/by-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I analyzed performance of parallel inserts in CTAS for different cases
with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064 bytes. We
could gain if the tuple sizes are lower. But if the tuple size is
larger, i.e. 1064 bytes, there's a regression with parallel inserts.
Upon further analysis, it turned out that the parallel workers are
requiring frequent extra block additions while concurrently extending
the relation (in RelationAddExtraBlocks) and the majority of the time
spent is going into flushing those new empty pages/blocks onto the
disk.
How have you ensured that the cost is due to the flushing of pages?
I think I'm wrong to just say the problem is with the flushing of
empty pages when bulk extending the relation. I should have said the
problem is with the "relation extension lock", but I will hold off on
that for a moment until I capture the relation extension lock wait
events for the regression-causing cases. I will share the information
soon.
AFAICS, we don't flush the pages, rather we just write them and then
register them to be flushed by the checkpointer. Now, it is possible
that the checkpointer sync queue gets full and the backend has to write
by itself, but have we checked that? I think we can check via wait
events; if it is due to flush, then we should see a lot of file sync
(WAIT_EVENT_DATA_FILE_SYNC) wait events.
I will also capture the data file sync events along with relation
extension lock wait events.
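For anyone who wants to reproduce this, one simple way to sample those
wait events from another session while the CTAS is running (repeat it,
e.g. with psql's \watch 1) could be something like:

-- relation extension lock waits show up as Lock/extend, file syncs as IO/DataFileSync
SELECT wait_event_type, wait_event, count(*)
FROM   pg_stat_activity
WHERE  wait_event IS NOT NULL
GROUP  BY 1, 2
ORDER  BY 3 DESC;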
The other possibility could
be that the free pages added to FSM by one worker are not being used
by another worker due to some reason. Can we debug and check if the
pages added by one worker are being used by another worker?
I tried to explain it at [1]. Please have a look. It looks like the
burden is more on the "relation extension lock" and the way the extra
new blocks are getting added.
[1]: /messages/by-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, May 26, 2021 at 5:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
The other possibility could
be that the free pages added to FSM by one worker are not being used
by another worker due to some reason. Can we debug and check if the
pages added by one worker are being used by another worker?
I tried to explain it at [1]. Please have a look.
I have read it, but I think we should try to verify practically what is
happening, because it is possible that the first time the worker checked
the FSM without taking the relation extension lock, it didn't find any
free page, and then when it tried to acquire the conditional lock, it
got it and just extended the relation by one block. So, in such a case
it won't be able to use the pages newly added by another worker. I am
not sure any such thing is happening here, but I think it is better to
verify it in some way. Also, I am not sure if just getting the info
about the relation extension lock is sufficient?
--
With Regards,
Amit Kapila.
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Wednesday, May 26, 2021 7:22 PM
Thanks for the explanation.
I followed your above test steps and the below configuration, but my test results are a little different from yours.
I am not sure of the exact reason, maybe because of the hardware.
Test: INSERT 10000000 rows (2 bigint (8 bytes each) and 16 name (64 bytes each) columns):
SERIAL: 22023.631 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 21824.934 ms [SKIP FSM]: 19381.474 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 20481.117 ms [SKIP FSM]: 18381.305 ms
I am afraid that using the FSM does not seem to give a stable performance gain (at least on my machine).
I will take a deeper look into this to figure out the difference. A naive idea is that the benefit that bulk extension
brings is not much greater than the cost of the FSM lookups.
Do you have any ideas on it?
My test machine:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2901.005
CPU max MHz: 3200.0000
CPU min MHz: 1000.0000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 14080K
Best regards,
houzj
[1] postgresql.conf parameters used:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
port = 5440
System Configuration:
RAM: 528GB
Disk Type: SSD
Disk Size: 1.5TB
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
CPU max MHz: 2129.0000
CPU min MHz: 1064.0000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
Thank you for the detailed analysis, I'll look into it too. (The times have changed...)
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Well, one might think to add more blocks at a time, say
Min(1024, lockWaiters * 128/256/512) than currently extraBlocks =
Min(512, lockWaiters * 20);. This will work (i.e. we don't see any
regression with parallel inserts in CTAS patches), but it can't be a
practical solution. Because the total pages for the relation will be
more with many pages having more free space. Furthermore, the future
sequential scans on that relation might take a lot of time.
Otherwise, the similar speed up can be observed when the BAS_BULKWRITE
is increased a bit from the current 16MB to some other reasonable
value. I earlier tried these experiments.Otherwise, as I said in [1], we can also increase the number of extra
blocks added at a time, say Min(1024, lockWaiters * 128/256/512) than
currently extraBlocks = Min(512, lockWaiters * 20);. This will also
give some speedup and we don't see any regression with parallel
inserts in CTAS patches.But, I'm not so sure that the hackers will agree any of the above as a
practical solution to the "relation extension" problem.
I think I understand your concern about resource consumption and the impact on other concurrently running jobs (OLTP, data analysis).
OTOH, what is the situation like when the user wants to run CTAS and, further, wants to speed it up by using parallelism? Isn't it okay to let the (parallel) CTAS use as much as it wants? At least, I think we can provide another mode for it, like Oracle provides conventional path mode and direct path mode for INSERT and data loading.
What do we want to do to maximize parallel CTAS speedup if we were a bit unshackled from the current constraints (alignment with existing code, impact on other concurrent workloads)?
* Use as many shared buffers as possible to decrease WAL flushes.
Otherwise, INSERT SELECT may be faster?
* Minimize relation extensions (= increase the block count per extension).
posix_fallocate() would help too.
* Allocate the added pages among the parallel workers, and have each worker fill its pages to their full capacity.
The worker that extended the relation stores the page numbers of the added pages in shared memory for parallel execution. Each worker gets a page from there after waiting for the relation extension lock, instead of using the FSM.
The last pages that the workers used will be filled only halfway, but the amount of unused space should be low compared to the total table size.
Regards
Takayuki Tsunakawa
On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
I followed your above test steps and the below configuration, but my test results are a little different from yours.
I am not sure the exact reason, maybe because of the hardware..Test INSERT 10000000 rows((2 bigint(of 8 bytes) 16 name(of 64 bytes each) columns):
SERIAL: 22023.631 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 21824.934 ms [SKIP FSM]: 19381.474 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 20481.117 ms [SKIP FSM]: 18381.305 ms
I'm not sure why there's a huge difference in the execution time: on
your system it just takes ~20 sec whereas on my system (with SSD) it
takes ~115 sec. I hope you didn't try creating the table as unlogged in
CTAS, right? Just for reference, the exact use case I tried is at [1].
The configure command I used to build the postgres source code is at [2].
[1]: case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes
each) columns, tuple size 1064 bytes, 10mn tuples
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,10000000),
generate_series(1,10000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;
[2]: ./configure --with-zlib --prefix=$PWD/inst/ --with-openssl
--with-readline --with-libxml > war.log && make -j 8 install >
war.log 2>&1 &
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, May 26, 2021 at 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, May 26, 2021 at 5:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:The other possibility could
be that the free pages added to FSM by one worker are not being used
by another worker due to some reason. Can we debug and check if the
pages added by one worker are being used by another worker?I tried to explain it at [1]. Please have a look.
I have read it but I think we should try to ensure practically what is
happening because it is possible that first time worker checked in FSM
without taking relation extension lock, it didn't find any free page,
and then when it tried to acquire the conditional lock, it got the
same and just extended the relation by one block. So, in such a case
it won't be able to use the newly added pages by another worker. I am
not sure any such thing is happening here but I think it is better to
verify it in some way. Also, I am not sure if just getting the info
about the relation extension lock is sufficient?
One idea to find this out could be that we have three counters for
each worker which counts the number of times each worker extended the
relation in bulk, the number of times each worker extended the
relation by one block, the number of times each worker gets the page
from FSM. It might be possible that with this we will be able to
figure out why there is a difference between your and Hou-San's
results.
--
With Regards,
Amit Kapila.
On Thu, May 27, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have read it but I think we should try to ensure practically what is
happening because it is possible that first time worker checked in FSM
without taking relation extension lock, it didn't find any free page,
and then when it tried to acquire the conditional lock, it got the
same and just extended the relation by one block. So, in such a case
it won't be able to use the newly added pages by another worker. I am
not sure any such thing is happening here but I think it is better to
verify it in some way. Also, I am not sure if just getting the info
about the relation extension lock is sufficient?One idea to find this out could be that we have three counters for
each worker which counts the number of times each worker extended the
relation in bulk, the number of times each worker extended the
relation by one block, the number of times each worker gets the page
from FSM. It might be possible that with this we will be able to
figure out why there is a difference between your and Hou-San's
results.
Yeah, that helps. And also, the time spent in
LockRelationForExtension, ConditionalLockRelationForExtension,
GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
insight.
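Another quick way to watch the same thing from the outside while the
CTAS is running is to look at pg_locks for the extension lock on the
target table (just a sketch; 'test' is the target table name from the
test case):

SELECT l.pid, l.granted, a.wait_event_type, a.wait_event
FROM   pg_locks l
JOIN   pg_stat_activity a USING (pid)
WHERE  l.locktype = 'extend'
AND    l.relation = 'test'::regclass;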
My plan is to have a patch with the above info added (which I will
share here so that others can test and see the results too) and run
"case 4" where the regression is seen on my system.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
I am afraid that the using the FSM seems not get a stable performance gain(at least on my machine),
I will take a deep look into this to figure out the difference. A naive idea it that the benefit that bulk extension
bring is not much greater than the cost in FSM.
Do you have some ideas on it ?
I think, if we try what Amit and I said in [1], we should get some
insights on whether the bulk relation extension is taking more time or
the FSM lookup. I plan to share the testing patch adding the timings
and the counters so that you can also test from your end. I hope
that's fine with you.
[1]: /messages/by-id/CALj2ACXskhY58=Fh8TioKLL1DXYkKdyEyWFYykf-6aLJgJ2qmQ@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 27, 2021 at 10:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:I am afraid that the using the FSM seems not get a stable performance gain(at least on my machine),
I will take a deep look into this to figure out the difference. A naive idea it that the benefit that bulk extension
bring is not much greater than the cost in FSM.
Do you have some ideas on it ?I think, if we try what Amit and I said in [1], we should get some
insights on whether the bulk relation extension is taking more time or
the FSM lookup. I plan to share the testing patch adding the timings
and the counters so that you can also test from your end. I hope
that's fine with you.
I think some other causes of contention on relation extension locks are:
1. CTAS is using a buffer strategy, and due to that it might need to
evict buffers frequently to get new blocks in. Maybe
we can identify this by turning off the buffer strategy for CTAS and
increasing shared buffers so that the data fits in memory.
2. I think the parallel workers are scanning and producing a lot of
tuples in a short time, so the demand for new blocks is very high
compared to what RelationAddExtraBlocks is able to produce, so maybe
you can try adding more blocks by increasing the multiplier and see
what the impact is.
3. Also try cases where the underlying select query has some complex
condition and selects fewer records, say 50%, 40%, ..., 10%, and see
what the numbers are (see the sketch below).
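For instance, reusing the tenk1 table from the earlier test case (the
target table names and the predicates below are only illustrative):

-- roughly 50% and 10% of the rows
EXPLAIN ANALYZE VERBOSE CREATE TABLE test_half AS SELECT * FROM tenk1 WHERE c1 % 2 = 0;
EXPLAIN ANALYZE VERBOSE CREATE TABLE test_tenth AS SELECT * FROM tenk1 WHERE c1 % 10 = 0;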
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From: Dilip Kumar <dilipbalaut@gmail.com>
I think some other cause of contention on relation extension locks are
1. CTAS is using a buffer strategy and due to that, it might need to
evict out the buffer frequently for getting the new block in. Maybe
we can identify by turning off the buffer strategy for CTAS and
increasing the shared buffer so that data fits in memory.
Yes, both Bharath-san (on a rich-man's machine) and I (on a poor-man's VM) saw that it's effective. I think we should remove this shackle from CTAS.
The question is why CTAS chose to use the BULKWRITE strategy in the past. We need to know that to make a better decision. I can understand why VACUUM uses a ring buffer, because it should want to act humbly as a background maintenance task so as not to cause trouble to frontend tasks. But why does CTAS have to be humble? If CTAS needs to be modest, why doesn't it use the BULKREAD strategy for its SELECT?
Regards
Takayuki Tsunakawa
On Thu, 27 May 2021 at 11:32 AM, tsunakawa.takay@fujitsu.com <
tsunakawa.takay@fujitsu.com> wrote:
From: Dilip Kumar <dilipbalaut@gmail.com>
I think some other cause of contention on relation extension locks are
1. CTAS is using a buffer strategy and due to that, it might need to
evict out the buffer frequently for getting the new block in. Maybe
we can identify by turning off the buffer strategy for CTAS and
increasing the shared buffer so that data fits in memory.Yes, both Bhrath-san (on a rich-man's machine) and I (on a poor-man's VM)
saw that it's effective. I think we should remove this shackle from CTAS.The question is why CTAS chose to use BULKWRITE strategy in the past. We
need to know that to make a better decision.
Basically, you are creating a new table and loading data into it, which
means you are less likely to access that data soon, so spoiling the
buffer cache for such an operation may not be a good idea. I was just
suggesting it as an experiment for identifying the root cause.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 12:46 PM
On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:I am afraid that the using the FSM seems not get a stable performance
gain(at least on my machine), I will take a deep look into this to
figure out the difference. A naive idea it that the benefit that bulk extensionbring is not much greater than the cost in FSM.
Do you have some ideas on it ?
I think, if we try what Amit and I said in [1], we should get some insights on
whether the bulk relation extension is taking more time or the FSM lookup. I
plan to share the testing patch adding the timings and the counters so that you
can also test from your end. I hope that's fine with you.
Sure, it would be nice if we can measure the exact time. Thanks in advance.
BTW, I checked my test results; I was testing INSERT INTO an unlogged table.
I re-tested INSERT into a normal (logged) table again, and it seems [SKIP FSM] still looks slightly better.
Although, the 4 workers case still has performance degradation compared to the serial case.
SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms [SKIP FSM]: 58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms [SKIP FSM]: 66960.305 ms
Best regards,
houzj
On Thu, May 27, 2021 at 12:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
BTW, I checked my test results, I was testing INSERT INTO unlogged table.
What do you mean by "testing INSERT INTO"? Is it that you are testing
the timings for parallel inserts in INSERT INTO ... SELECT command? If
so, why should we test parallel inserts in the INSERT INTO ... SELECT
command here?
The way I test parallel inserts in CTAS is: Apply the latest v23 patch
set available at [1]. Run the data preparation sqls from [2]. Enable
timing and run the CTAS query from [3]. Run with 0, 2 and 4 workers
with leader participation on.
[1]: /messages/by-id/CALj2ACXVWr1o+FZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA@mail.gmail.com
[2]:
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,100000),
generate_series(1,10000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
[3]: EXPLAIN ANALYZE VERBOSE CREATE TABLE test AS SELECT * FROM tenk1;
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 2:59 PM
On Thu, May 27, 2021 at 12:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:BTW, I checked my test results, I was testing INSERT INTO unlogged table.
What do you mean by "testing INSERT INTO"? Is it that you are testing the
timings for parallel inserts in INSERT INTO ... SELECT command? If so, why
should we test parallel inserts in the INSERT INTO ... SELECT command here?
Oops, sorry, it's a typo, I actually meant CREATE TABLE AS SELECT.
Best regards,
houzj
From: Dilip Kumar <dilipbalaut@gmail.com>
Basically you are creating a new table and loading data to it and that means you will be less likely to access those data soon so for such thing spoiling buffer cache may not be a good idea.
--------------------------------------------------
Some people, including me, would say that the table will be accessed soon and that's why the data is loaded quickly during minimal maintenance hours.
--------------------------------------------------
I was just suggesting only for experiments for identifying the root cause.
--------------------------------------------------
I thought this is a good chance to possibly change things better (^^).
I guess the user would simply think like this: "I just want to finish CTAS as quickly as possible, so I configured to take advantage of parallelism. I want CTAS to make most use of our resources. Why doesn't Postgres try to limit resource usage (by using the ring buffer) against my will?"
Regards
Takayuki Tsunakawa
On Thu, May 27, 2021 at 12:46 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
From: Dilip Kumar <dilipbalaut@gmail.com>
Basically you are creating a new table and loading data to it and that means you will be less likely to access those data soon so for such thing spoiling buffer cache may not be a good idea.
--------------------------------------------------Some people, including me, would say that the table will be accessed soon and that's why the data is loaded quickly during minimal maintenance hours.
--------------------------------------------------
I was just suggesting only for experiments for identifying the root cause.
--------------------------------------------------I thought this is a good chance to possibly change things better (^^).
I guess the user would simply think like this: "I just want to finish CTAS as quickly as possible, so I configured to take advantage of parallelism. I want CTAS to make most use of our resources. Why doesn't Postgres try to limit resource usage (by using the ring buffer) against my will?"
If the idea is to give the user control over whether or not to use the
separate ring buffer for bulk inserts/writes, then how about exposing
it as a rel option? Currently BAS_BULKWRITE (GetBulkInsertState) is
being used by CTAS, REFRESH MATERIALIZED VIEW, table rewrites
(ATRewriteTable) and COPY. Furthermore, we could make the rel option an
integer and allow users to provide the size of the ring buffer they
want to use for a particular bulk insert operation (of course with a
max limit not exceeding the shared buffers or some reasonable amount
not exceeding the RAM of the system).
I think we can discuss this in a separate thread and see what other
hackers think.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
Although, the 4 workers case still has performance degradation compared to
serial case.SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms [SKIP FSM]:
58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms [SKIP FSM]:
66,960.305 ms
Can you see any difference in table sizes?
Regards
Takayuki Tsunakawa
On Thu, May 27, 2021 at 1:03 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
Although, the 4 workers case still has performance degradation compared to
serial case.SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms [SKIP FSM]:
58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms [SKIP FSM]:
66,960.305 msCan you see any difference in table sizes?
Also, the number of pages the table occupies in each case along with
table size would give more insights.
I do as follows to get the number of pages a relation occupies:
CREATE EXTENSION pgstattuple;
SELECT pg_relpages('test');
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
I think we can discuss this in a separate thread and see what other
hackers think.
OK, as long as we don't get stuck in the current direction. (Our goal is not merely to avoid degrading performance but to outperform serial execution, isn't it?)
If the idea is to give the user control of whether or not to use the
separate RING BUFFER for bulk inserts/writes, then how about giving it
as a rel option? Currently BAS_BULKWRITE (GetBulkInsertState), is
being used by CTAS, Refresh Mat View, Table Rewrites (ATRewriteTable)
and COPY. Furthermore, we could make the rel option an integer and
allow users to provide the size of the ring buffer they want to choose
for a particular bulk insert operation (of course with a max limit
which is not exceeding the shared buffers or some reasonable amount
not exceeding the RAM of the system).
I think it's not a table property but an execution property. So, it'd be appropriate to control it with the SET command, just like the DBA sets work_mem and maintenance_work_mem for specific maintenance operations.
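For comparison, the existing pattern being referred to, where a
resource knob is raised only for the duration of one operation, looks
like this (the index is just an illustrative example on the test table
from this thread):

BEGIN;
SET LOCAL maintenance_work_mem = '1GB';
CREATE INDEX ON test (c1);
COMMIT;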
I'll stop on this here...
Regards
Takayuki Tsunakawa
On Thu, May 27, 2021 at 10:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, May 27, 2021 at 10:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:I am afraid that the using the FSM seems not get a stable performance gain(at least on my machine),
I will take a deep look into this to figure out the difference. A naive idea it that the benefit that bulk extension
bring is not much greater than the cost in FSM.
Do you have some ideas on it ?I think, if we try what Amit and I said in [1], we should get some
insights on whether the bulk relation extension is taking more time or
the FSM lookup. I plan to share the testing patch adding the timings
and the counters so that you can also test from your end. I hope
that's fine with you.I think some other cause of contention on relation extension locks are
1. CTAS is using a buffer strategy and due to that, it might need to
evict out the buffer frequently for getting the new block in. Maybe
we can identify by turning off the buffer strategy for CTAS and
increasing the shared buffer so that data fits in memory.
One more thing to check is whether all the workers are using the same
access strategy.
--
With Regards,
Amit Kapila.
On Thu, May 27, 2021 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think some other cause of contention on relation extension locks are
1. CTAS is using a buffer strategy and due to that, it might need to
evict out the buffer frequently for getting the new block in. Maybe
we can identify by turning off the buffer strategy for CTAS and
increasing the shared buffer so that data fits in memory.One more thing to ensure is whether all the workers are using the same
access strategy?
In the Parallel Inserts in CTAS patches, the leader and each worker
use their own ring buffer of 16MB, i.e. each does myState->bistate =
GetBulkInsertState(); separately.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 3:41 PM
On Thu, May 27, 2021 at 1:03 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
Although, the 4 workers case still has performance degradation
compared to serial case.SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms [SKIP FSM]:
58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms [SKIP FSM]:
66,960.305 msCan you see any difference in table sizes?
Also, the number of pages the table occupies in each case along with table size
would give more insights.I do as follows to get the number of pages a relation occupies:
CREATE EXTENSION pgstattuple;
SELECT pg_relpages('test');
It seems the difference between SKIP FSM and NOT SKIP FSM is not big.
I tried several times and the average result is almost the same.
pg_relpages
-------------
1428575
pg_relation_size
-------------
11702976512(11G)
Best regards,
houzj
On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
One idea to find this out could be that we have three counters for
each worker which counts the number of times each worker extended the
relation in bulk, the number of times each worker extended the
relation by one block, the number of times each worker gets the page
from FSM. It might be possible that with this we will be able to
figure out why there is a difference between your and Hou-San's
results.Yeah, that helps. And also, the time spent in
LockRelationForExtension, ConditionalLockRelationForExtension,
GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
insight.My plan is to have a patch with above info added in (which I will
share it here so that others can test and see the results too) and run
the "case 4" where there's a regression seen on my system.
I captured below information with the attached patch
0001-test-times-and-block-counts.patch applied on top of CTAS v23
patch set. Testing details are attached in the file named "test".
Total time spent in LockRelationForExtension
Total time spent in GetPageWithFreeSpace
Total time spent in RelationAddExtraBlocks
Total number of times extended the relation in bulk
Total number of times extended the relation by one block
Total number of blocks added in bulk extension
Total number of times getting the page from FSM
Here is a summary of what I observed:
1) The execution time with 2 workers, without TABLE_INSERT_SKIP_FSM
(140 sec) is more than with 0 workers (112 sec)
2) The execution time with 2 workers, with TABLE_INSERT_SKIP_FSM (225
sec) is more than with 2 workers, without TABLE_INSERT_SKIP_FSM (140
sec)
3) Majority of the time is going into waiting for relation extension
lock in LockRelationForExtension. With 2 workers, without
TABLE_INSERT_SKIP_FSM, out of total execution time 140 sec, the time
spent in LockRelationForExtension is ~40 sec and the time spent in
RelationAddExtraBlocks is ~20 sec. So, ~60 sec are being spent in
these two functions. With 2 workers, with TABLE_INSERT_SKIP_FSM, out
of total execution time 225 sec, the time spent in
LockRelationForExtension is ~135 sec and the time spent in
RelationAddExtraBlocks is 0 sec (because we skip FSM, no bulk extend
logic applies). So, most of the time is being spent in
LockRelationForExtension.
I'm still not sure why the execution time with 0 workers (or serial
execution or no parallelism involved) on my testing system is 112 sec
compared to 58 sec on Hou-San's system for the same use case. Maybe
the testing system I'm using is not of the latest configuration
compared to others.
Having said that, I request others to try and see if the same
observations (as above) are made on their testing systems for the same
use case. If others don't see the regression (with just 2 workers), or
they observe not much difference with and without
TABLE_INSERT_SKIP_FSM, then I'm open to changing the parallel inserts
in CTAS code to use TABLE_INSERT_SKIP_FSM. In any case, if the
observation is that a good amount of time is being spent in
LockRelationForExtension, I'm not sure (at this point) whether we can
do something here or just have to live with it.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-test-times-and-block-counts.patch (application/x-patch)
From 105495d5aa6b25f61c812dd5cb9ff692abe38373 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 27 May 2021 13:37:32 +0530
Subject: [PATCH] test times and block counts
---
src/backend/access/heap/hio.c | 21 ++++++++++++
src/backend/commands/createas.c | 41 +++++++++++++++++++++++
src/backend/storage/freespace/freespace.c | 22 +++++++++++-
src/backend/storage/lmgr/lmgr.c | 13 +++++++
src/include/commands/createas.h | 17 ++++++++++
5 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index d34edb4190..d737c2d763 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -19,6 +19,8 @@
#include "access/hio.h"
#include "access/htup_details.h"
#include "access/visibilitymap.h"
+#include "commands/createas.h"
+#include "portability/instr_time.h"
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
@@ -198,12 +200,17 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
firstBlock = InvalidBlockNumber;
int extraBlocks;
int lockWaiters;
+ instr_time start;
+ instr_time end;
/* Use the length of the lock wait queue to judge how much to extend. */
lockWaiters = RelationExtensionLockWaiterCount(relation);
if (lockWaiters <= 0)
return;
+ if (is_ctas)
+ INSTR_TIME_SET_CURRENT(start);
+
/*
* It might seem like multiplying the number of lock waiters by as much as
* 20 is too aggressive, but benchmarking revealed that smaller numbers
@@ -212,6 +219,9 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
*/
extraBlocks = Min(512, lockWaiters * 20);
+ if (is_ctas)
+ bulk_rel_extension_blocks_count += extraBlocks;
+
do
{
Buffer buffer;
@@ -267,6 +277,14 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
* just inserted.
*/
FreeSpaceMapVacuumRange(relation, firstBlock, blockNum + 1);
+
+ if (is_ctas)
+ {
+ INSTR_TIME_SET_CURRENT(end);
+ INSTR_TIME_SUBTRACT(end, start);
+ rel_add_extra_blocks_time += INSTR_TIME_GET_MICROSEC(end);
+ bulk_rel_extension_count++;
+ }
}
/*
@@ -630,6 +648,9 @@ loop:
*/
page = BufferGetPage(buffer);
+ if (is_ctas)
+ single_block_rel_extension_count++;
+
if (!PageIsNew(page))
elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
BufferGetBlockNumber(buffer),
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 53ca3010c6..df23a72335 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -62,6 +62,14 @@ static bool intorel_receive(TupleTableSlot *slot, DestReceiver *self);
static void intorel_shutdown(DestReceiver *self);
static void intorel_destroy(DestReceiver *self);
+bool is_ctas = false;
+uint64 lock_rel_extension_time = 0;
+uint64 get_page_from_fsm_time = 0;
+uint64 rel_add_extra_blocks_time = 0;
+uint64 bulk_rel_extension_count = 0;
+uint64 single_block_rel_extension_count = 0;
+uint64 fsm_hit_count = 0;
+uint64 bulk_rel_extension_blocks_count = 0;
/*
* create_ctas_internal
@@ -482,6 +490,14 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
ListCell *lc;
int attnum;
+ lock_rel_extension_time = 0;
+ get_page_from_fsm_time = 0;
+ rel_add_extra_blocks_time = 0;
+ bulk_rel_extension_count = 0;
+ single_block_rel_extension_count = 0;
+ fsm_hit_count = 0;
+ bulk_rel_extension_blocks_count = 0;
+
/*
* All the necessary work such as table creation, sanity checks etc. would
* have been done by the leader. So, parallel workers just need to open the
@@ -652,6 +668,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
/* Nothing to insert if WITH NO DATA is specified. */
if (!myState->into->skipData)
{
+ is_ctas = true;
/*
* Note that the input slot might not be of the type of the target
* relation. That's supported by table_tuple_insert(), but slightly
@@ -665,6 +682,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
myState->output_cid,
myState->ti_options,
myState->bistate);
+ is_ctas = false;
}
/* We know this is a newly created relation, so there are no indexes */
@@ -690,6 +708,29 @@ intorel_shutdown(DestReceiver *self)
/* close rel, but keep lock until commit */
table_close(myState->rel, NoLock);
myState->rel = NULL;
+
+ ereport(LOG, (errmsg("Total time spent in LockRelationForExtension is %f msec",
+ (((double) lock_rel_extension_time) / 1000.0)), errhidestmt(true)));
+ ereport(LOG, (errmsg("Total time spent in GetPageWithFreeSpace is %f msec",
+ (((double) get_page_from_fsm_time) / 1000.0)), errhidestmt(true)));
+ ereport(LOG, (errmsg("Total time spent in RelationAddExtraBlocks is %f msec",
+ (((double) rel_add_extra_blocks_time) / 1000.0)), errhidestmt(true)));
+ ereport(LOG, (errmsg("Total number of times extended the relation in bulk is %lu",
+ bulk_rel_extension_count), errhidestmt(true)));
+ ereport(LOG, (errmsg("Total number of blocks added in bulk extension is %lu",
+ bulk_rel_extension_blocks_count), errhidestmt(true)));
+ ereport(LOG, (errmsg("Total number of times extended the relation by one block is %lu",
+ single_block_rel_extension_count), errhidestmt(true)));
+ ereport(LOG, (errmsg("Total number of times getting the page from FSM is %lu",
+ fsm_hit_count), errhidestmt(true)));
+
+ lock_rel_extension_time = 0;
+ get_page_from_fsm_time = 0;
+ rel_add_extra_blocks_time = 0;
+ bulk_rel_extension_count = 0;
+ single_block_rel_extension_count = 0;
+ fsm_hit_count = 0;
+ bulk_rel_extension_blocks_count = 0;
}
/*
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 8c12dda238..518170a790 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -25,7 +25,9 @@
#include "access/htup_details.h"
#include "access/xlogutils.h"
+#include "commands/createas.h"
#include "miscadmin.h"
+#include "portability/instr_time.h"
#include "storage/freespace.h"
#include "storage/fsm_internals.h"
#include "storage/lmgr.h"
@@ -132,8 +134,26 @@ BlockNumber
GetPageWithFreeSpace(Relation rel, Size spaceNeeded)
{
uint8 min_cat = fsm_space_needed_to_cat(spaceNeeded);
+ BlockNumber blk_no;
+ instr_time start;
+ instr_time end;
- return fsm_search(rel, min_cat);
+ if (is_ctas)
+ INSTR_TIME_SET_CURRENT(start);
+
+ blk_no = fsm_search(rel, min_cat);
+
+ if (is_ctas)
+ {
+ INSTR_TIME_SET_CURRENT(end);
+ INSTR_TIME_SUBTRACT(end, start);
+ get_page_from_fsm_time += INSTR_TIME_GET_MICROSEC(end);
+
+ if (blk_no != InvalidBlockNumber)
+ fsm_hit_count++;
+ }
+
+ return blk_no;
}
/*
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index cdf2266d6d..bed87898ae 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -19,6 +19,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/catalog.h"
+#include "commands/createas.h"
#include "commands/progress.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -403,12 +404,24 @@ void
LockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
LOCKTAG tag;
+ instr_time start;
+ instr_time end;
+
+ if (is_ctas)
+ INSTR_TIME_SET_CURRENT(start);
SET_LOCKTAG_RELATION_EXTEND(tag,
relation->rd_lockInfo.lockRelId.dbId,
relation->rd_lockInfo.lockRelId.relId);
(void) LockAcquire(&tag, lockmode, false, false);
+
+ if (is_ctas)
+ {
+ INSTR_TIME_SET_CURRENT(end);
+ INSTR_TIME_SUBTRACT(end, start);
+ lock_rel_extension_time += INSTR_TIME_GET_MICROSEC(end);
+ }
}
/*
diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h
index 74022aab41..6f998c6146 100644
--- a/src/include/commands/createas.h
+++ b/src/include/commands/createas.h
@@ -21,6 +21,23 @@
#include "tcop/dest.h"
#include "utils/queryenvironment.h"
+extern bool is_ctas;
+/* time spent in LockRelationForExtension */
+extern uint64 lock_rel_extension_time;
+/* time spent in GetPageWithFreeSpace */
+extern uint64 get_page_from_fsm_time;
+ /* time spent in RelationAddExtraBlocks */
+extern uint64 rel_add_extra_blocks_time;
+/* number of times each worker extended the relation in bulk */
+extern uint64 bulk_rel_extension_count;
+/* number of times each worker extended the relation by one block */
+extern uint64 single_block_rel_extension_count;
+/* number of times each worker gets the page from FSM */
+extern uint64 fsm_hit_count;
+/* number of blocks added in bulk extension */
+extern uint64 bulk_rel_extension_blocks_count;
+
+
typedef struct
{
DestReceiver pub; /* publicly-known function pointers */
--
2.25.1
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
I'm still not sure why the execution time with 0 workers (or serial execution or
no parallelism involved) on my testing system is 112 sec compared to 58 sec on
Hou-San's system for the same use case. Maybe the testing system I'm using
is not of the latest configuration compared to others.
What's the setting of wal_level on both of your systems? I thought it could be that you set it to > minimal, while Hou-san set it to minimal. (I forgot the results of 2 and 4 workers, though.)
Regards
Takayuki Tsunakawa
From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Sent: Friday, May 28, 2021 8:55 AM
To: 'Bharath Rupireddy' <bharath.rupireddyforpostgres@gmail.com>; Hou,
Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>; Tang, Haiying/唐 海英
<tanghy.fnst@fujitsu.com>; PostgreSQL-development
<pgsql-hackers@postgresql.org>; Zhihong Yu <zyu@yugabyte.com>; Luc
Vlaming <luc@swarm64.com>; Dilip Kumar <dilipbalaut@gmail.com>;
vignesh C <vignesh21@gmail.com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
I'm still not sure why the execution time with 0 workers (or serial
execution or no parallelism involved) on my testing system is 112 sec
compared to 58 sec on Hou-San's system for the same use case. Maybe
the testing system I'm using is not of the latest configuration compared to others.
What's the setting of wal_level on both of your systems? I thought it could be
that you set it to > minimal, while Hou-san set it to minimal. (I forgot the
results of 2 and 4 workers, though.)
I think I followed the configuration that Bharath-san mentioned.
It could be a hardware difference, because I am not using an SSD.
I will try to test on an SSD to see if there is some difference.
I only changed the following configuration:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
Best regards,
houzj
On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
One idea to find this out could be that we have three counters for
each worker which counts the number of times each worker extended the
relation in bulk, the number of times each worker extended the
relation by one block, the number of times each worker gets the page
from FSM. It might be possible that with this we will be able to
figure out why there is a difference between your and Hou-San's
results.
Yeah, that helps. And also, the time spent in
LockRelationForExtension, ConditionalLockRelationForExtension,
GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
insight.
My plan is to have a patch with above info added in (which I will
share it here so that others can test and see the results too) and run
the "case 4" where there's a regression seen on my system.I captured below information with the attached patch
0001-test-times-and-block-counts.patch applied on top of CTAS v23
patch set. Testing details are attached in the file named "test".
Total time spent in LockRelationForExtension
Total time spent in GetPageWithFreeSpace
Total time spent in RelationAddExtraBlocks
Total number of times extended the relation in bulk
Total number of times extended the relation by one block
Total number of blocks added in bulk extension
Total number of times getting the page from FSM
In your results, the number of pages each process is getting from FSM
does not match the number of blocks added. I think we need to
increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well
because that is also called and the process can get a free page via
the same. The other thing to check via a debugger is whether, when one
worker adds the blocks in bulk, another parallel worker gets all those
blocks. You can achieve that by allowing one worker (say worker-1) to
extend the relation in bulk, then let it wait and allow another
worker (say worker-2) to proceed, and see if it gets all the pages
added by worker-1 from the FSM. You need to keep the leader waiting as
well, or make it not perform any operation.
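Something like the below (an untested sketch, not part of the attached
0001-test-times-and-block-counts.patch; is_ctas and fsm_hit_count are the
globals that patch already declares in createas.h, and the rest mirrors the
existing RecordAndGetPageWithFreeSpace() in freespace.c) should be enough
for the extra counter:

BlockNumber
RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
                              Size oldSpaceAvail, Size spaceNeeded)
{
    int         old_cat = fsm_space_avail_to_cat(oldSpaceAvail);
    int         search_cat = fsm_space_needed_to_cat(spaceNeeded);
    FSMAddress  addr;
    uint16      slot;
    int         search_slot;
    BlockNumber blk_no;

    /* Get the location of the FSM byte representing the heap block. */
    addr = fsm_get_location(oldPage, &slot);

    search_slot = fsm_set_and_search(rel, addr, slot, old_cat, search_cat);

    /* Either the block found during the update-and-search, or a fresh lookup. */
    if (search_slot != -1)
        blk_no = fsm_get_heap_blk(addr, search_slot);
    else
        blk_no = fsm_search(rel, search_cat);

    /* Count pages handed out via this path as FSM hits as well. */
    if (is_ctas && blk_no != InvalidBlockNumber)
        fsm_hit_count++;

    return blk_no;
}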
--
With Regards,
Amit Kapila.
On Fri, May 28, 2021 at 6:24 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
I'm still not sure why the execution time with 0 workers (or serial execution or
no parallelism involved) on my testing system is 112 sec compared to 58 sec on
Hou-San's system for the same use case. Maybe the testing system I'm using
is not of the latest configuration compared to others.
What's the setting of wal_level on both of your systems? I thought it could be that you set it to > minimal, while Hou-san set it to minimal. (I forgot the results of 2 and 4 workers, though.)
Thanks. I was earlier running with default wal_level = replica.
Results on my system, with wal_level = minimal, PSA file
"test_results2" for more details:
Without TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 61875.255 ms (01:01.875)
2 workers - Time: 89227.379 ms (01:29.227)
4 workers - Time: 81484.876 ms (01:21.485)
With TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 61279.764 ms (01:01.280)
2 workers - Time: 208620.453 ms (03:28.620)
4 workers - Time: 223737.081 ms (03:43.737)
Results on my system, with wal_level = replica, PSA file
"test_results1" for more details:
Without TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 112175.273 ms (01:52.175)
2 workers - Time: 140441.158 ms (02:20.441)
4 workers - Time: 141750.577 ms (02:21.751)
With TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 112637.906 ms (01:52.638)
2 workers - Time: 225358.287 ms (03:45.358)
4 workers - Time: 242172.600 ms (04:02.173)
Results on Hou-san's system:
SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms [SKIP FSM]: 58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms [SKIP FSM]: 66960.305 ms
The majority of the time is being spent in LockRelationForExtension and
RelationAddExtraBlocks without TABLE_INSERT_SKIP_FSM, and in
LockRelationForExtension with TABLE_INSERT_SKIP_FSM. The observations
made at [1] still hold true with wal_level = minimal.
I request Hou-san to capture the same info with the add-on patch
shared earlier. This would help us to be on the same page. We can
further think on:
1) Why so much time is being spent in LockRelationForExtension?
2) Whether to use TABLE_INSERT_SKIP_FSM or not, in other words,
whether to take advantage of bulk relation extension or not (see the
sketch below for what this flag controls in the CTAS dest receiver).
3) If bulk relation extension is to be used i.e. without
TABLE_INSERT_SKIP_FSM flag, then whether the blocks being added by one
worker are immediately visible to other workers or not after it
finishes adding all the blocks.
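To make 2) concrete, this is roughly the knob being toggled in the tests
above; a sketch of the relevant lines of intorel_startup() in createas.c
(the parallel CTAS patch presumably sets up the equivalent in each worker's
dest receiver), not an excerpt from the v23 patch:

/* intorel_startup() -- relevant part only (sketch) */
myState->rel = intoRelationDesc;
myState->reladdr = intoRelationAddr;
myState->output_cid = GetCurrentCommandId(true);

/*
 * With TABLE_INSERT_SKIP_FSM, every backend always extends the relation
 * itself and never asks the FSM for a free page.  Without it, a worker
 * can reuse pages that another backend bulk-extended and recorded in
 * the FSM via RelationAddExtraBlocks().
 */
myState->ti_options = TABLE_INSERT_SKIP_FSM;
myState->bistate = GetBulkInsertState();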
[1]: /messages/by-id/CALj2ACV-VToW65BE6ndDEB7S_3qhzQ_BUWtw2q6V88iwTwwPSg@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 10:07 PM
On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
One idea to find this out could be that we have three counters for
each worker which counts the number of times each worker extended
the relation in bulk, the number of times each worker extended the
relation by one block, the number of times each worker gets the page
from FSM. It might be possible that with this we will be able to
figure out why there is a difference between your and Hou-San's
results.
Yeah, that helps. And also, the time spent in
LockRelationForExtension, ConditionalLockRelationForExtension,
GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
insight.
My plan is to have a patch with above info added in (which I will
share it here so that others can test and see the results too) and run
the "case 4" where there's a regression seen on my system.I captured below information with the attached patch
0001-test-times-and-block-counts.patch applied on top of CTAS v23 patch set.
Testing details are attached in the file named "test".
Total time spent in LockRelationForExtension
Total time spent in GetPageWithFreeSpace
Total time spent in RelationAddExtraBlocks
Total number of times extended the relation in bulk
Total number of times extended the relation by one block
Total number of blocks added in bulk extension
Total number of times getting the page from FSM
Here is a summary of what I observed:
1) The execution time with 2 workers, without TABLE_INSERT_SKIP_FSM
(140 sec) is more than with 0 workers (112 sec)
2) The execution time with 2 workers, with TABLE_INSERT_SKIP_FSM (225
sec) is more than with 2 workers, without TABLE_INSERT_SKIP_FSM (140
sec)
3) Majority of the time is going into waiting for relation extension lock in
LockRelationForExtension. With 2 workers, without TABLE_INSERT_SKIP_FSM,
out of total execution time 140 sec, the time spent in LockRelationForExtension
is ~40 sec and the time spent in RelationAddExtraBlocks is ~20 sec. So, ~60 sec
are being spent in these two functions. With 2 workers, with
TABLE_INSERT_SKIP_FSM, out of total execution time 225 sec, the time spent
in LockRelationForExtension is ~135 sec and the time spent in
RelationAddExtraBlocks is 0 sec (because we skip FSM, no bulk extend logic
applies). So, most of the time is being spent in LockRelationForExtension.
I'm still not sure why the execution time with 0 workers (or serial execution or
no parallelism involved) on my testing system is 112 sec compared to 58 sec on
Hou-San's system for the same use case. Maybe the testing system I'm using is
not of the latest configuration compared to others.
Having said that, I request others to try and see if the same observations (as
above) are made on their testing systems for the same use case, and whether
they see a regression (with just 2 workers) or not much difference with
and without TABLE_INSERT_SKIP_FSM.
Thanks for the patch !
I attached my test results. Note I did not change the wal_level to minimal.
I only changed the following configuration:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
Best regards,
houzj
On Fri, May 28, 2021 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I captured below information with the attached patch
0001-test-times-and-block-counts.patch applied on top of CTAS v23
patch set. Testing details are attached in the file named "test".
Total time spent in LockRelationForExtension
Total time spent in GetPageWithFreeSpace
Total time spent in RelationAddExtraBlocks
Total number of times extended the relation in bulk
Total number of times extended the relation by one block
Total number of blocks added in bulk extension
Total number of times getting the page from FSM
In your results, the number of pages each process is getting from FSM
does not match the number of blocks added. I think we need to
increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well
because that is also called and the process can get a free page via
the same. The other thing to check via a debugger is whether, when one
worker adds the blocks in bulk, another parallel worker gets all those
blocks. You can achieve that by allowing one worker (say worker-1) to
extend the relation in bulk, then let it wait and allow another
worker (say worker-2) to proceed, and see if it gets all the pages
added by worker-1 from the FSM. You need to keep the leader waiting as
well, or make it not perform any operation.
While looking at the results, I have observed one more thing: we are
trying to parallelize I/O, due to which we might not be seeing a benefit
in such cases. I think even for non-write queries there won't be any
(much) benefit if we can't parallelize CPU usage. Basically, the test
you are doing is for statement: explain analyze verbose create table
test as select * from tenk1;. Now, in this statement, there is no
qualification and still, the Gather node is generated for it, this
won't be the case if we check "select * from tenk1". Is it due to the
reason that the patch completely ignores the parallel_tuple_cost? But
still, it should prefer a serial plan due to parallel_setup_cost; why is
that not happening? Anyway, I think we should not parallelize such
queries where we can't parallelize CPU usage. Have you tried the cases
without changing any of the costings for parallelism?
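For reference, the Gather costing I'm referring to is just this (a
simplified excerpt of cost_gather() in costsize.c, not something the
patch changes in these lines):

/* cost_gather() -- relevant lines only */
startup_cost += parallel_setup_cost;                /* 1000 by default */
run_cost += parallel_tuple_cost * path->path.rows;  /* 0.1 per tuple by default */

If the patch plans the Gather with rows = 0, the second term contributes
nothing, so only parallel_setup_cost is left to push the planner towards
a serial plan.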
--
With Regards,
Amit Kapila.
On Sat, May 29, 2021 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
While looking at the results, I have observed one more thing: we are
trying to parallelize I/O, due to which we might not be seeing a benefit
in such cases. I think even for non-write queries there won't be any
(much) benefit if we can't parallelize CPU usage. Basically, the test
you are doing is for statement: explain analyze verbose create table
test as select * from tenk1;. Now, in this statement, there is no
qualification and still, the Gather node is generated for it, this
won't be the case if we check "select * from tenk1". Is it due to the
reason that the patch completely ignores the parallel_tuple_cost? But
still, it should prefer a serial plan due to parallel_setup_cost; why is
that not happening? Anyway, I think we should not parallelize such
queries where we can't parallelize CPU usage. Have you tried the cases
without changing any of the costings for parallelism?
Hi,
I measured the execution timings for parallel inserts in CTAS in cases
where the planner chooses parallelism for selects naturally. This
means, I have used only the 0001 patch from the v23 patch set at [1]. I have
not used the 0002 patch that makes parallel_tuple_cost 0.
Query used for all these tests is below. Also, attached table creation
sqls in the file "test_cases".
EXPLAIN (ANALYZE, VERBOSE) create table test1 as select * from tenk1
t1, tenk2 t2 where t1.c1 = t2.d2;
All the results are of the form (number of workers, exec time in milliseconds).
Test case 1: both tenk1 and tenk2 are tables with 1 integer (4-byte)
column, tuple size 28 bytes, 100mn tuples
master: (0, 277886.951 ms), (2, 171183.221 ms), (4, 159703.496 ms)
with parallel inserts CTAS patch: (0, 264709.186 ms), (2, 128354.448
ms), (4, 111533.731 ms)
Test case 2: both tenk1 and tenk2 are tables with 2 integer (4-byte)
columns and 3 varchar(8) columns, tuple size 59 bytes, 100mn tuples
master: (0, 453505.228 ms), (2, 236762.759 ms), (4, 219038.126 ms)
with parallel inserts CTAS patch: (0, 470483.818 ms), (2, 219374.198
ms), (4, 203543.681 ms)
Test case 3: both tenk1 and tenk2 are tables with 2 bigint (8-byte)
columns, 3 name (64-byte) columns and 1 varchar(8) column, tuple size
241 bytes, 100mn tuples
master: (0, 1052725.928 ms), (2, 592968.486 ms), (4, 562137.891 ms)
with parallel inserts CTAS patch: (0, 1019086.805 ms), (2, 634448.322
ms), (4, 680793.305 ms)
Test case 4: both tenk1 and tenk2 are tables with 2 bigint (8-byte)
columns and 16 name (64-byte) columns, tuple size 1064 bytes, 10mn tuples
master: (0, 371931.497 ms), (2, 247206.841 ms), (4, 241959.839 ms)
with parallel inserts CTAS patch: (0, 396342.329 ms), (2, 333860.472
ms), (4, 317895.558 ms)
Observation: parallel insert + parallel select gives a good benefit with
smaller tuple sizes (cases 1 and 2). If the tuple size is bigger, serial
insert + parallel select fares better (cases 3 and 4).
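In case it helps to see where the quoted tuple sizes come from, here is a
rough breakdown (a sketch assuming no nulls and fully filled varchar(8)
values):

/*
 * HeapTupleHeaderData is 23 bytes, padded to t_hoff = 24 on 64-bit.
 * case 1: 24 + 4 (int4)                          =   28 bytes
 * case 2: 24 + 2*4 (int4) + 3*9 (varchar(8))     =   59 bytes
 * case 3: 24 + 2*8 (int8) + 3*64 (name) + 9      =  241 bytes
 * case 4: 24 + 2*8 (int8) + 16*64 (name)         = 1064 bytes
 */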
In the coming days, I will try to work on more performance analysis
and clarify some of the points raised upthread.
[1]: /messages/by-id/CALj2ACXVWr1o+FZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA@mail.gmail.com
[2]: postgresql.conf changes I made:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = on
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
wal_level = replica
With Regards,
Bharath Rupireddy.