pgsql: Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Parallel SELECT can't be utilized for INSERT in the following cases:
- INSERT statement uses the ON CONFLICT DO UPDATE clause
- Target table has a parallel-unsafe: trigger, index expression or
predicate, column default expression or check constraint
- Target table has a parallel-unsafe domain constraint on any column
- Target table is a partitioned table with a parallel-unsafe partition key
expression or support function
The planner is updated to perform additional parallel-safety checks for
the cases listed above, for determining whether it is safe to run INSERT
in parallel-mode with an underlying parallel SELECT. The planner will
consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided
nothing unsafe is found from the additional parallel-safety checks, or
from the existing parallel-safety checks for SELECT.
While checking parallel-safety, we need to check it for all the partitions
on the table which can be costly especially when we decide not to use a
parallel plan. So, in a separate patch, we will introduce a GUC and or a
reloption to enable/disable parallelism for Insert statements.
Prior to entering parallel-mode for the execution of INSERT with parallel
SELECT, a TransactionId is acquired and assigned to the current
transaction state. This is necessary to prevent the INSERT from attempting
to assign the TransactionId whilst in parallel-mode, which is not allowed.
This approach has a disadvantage in that if the underlying SELECT does not
return any rows, then the TransactionId is not used, however that
shouldn't happen in practice in many cases.
Author: Greg Nancarrow, Amit Langote, Amit Kapila
Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila
Tested-by: Tang, Haiying
Discussion: /messages/by-id/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Discussion: /messages/by-id/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/05c8482f7f69a954fd65fce85f896e848fc48197
Modified Files
--------------
doc/src/sgml/parallel.sgml | 4 +-
src/backend/access/transam/xact.c | 26 ++
src/backend/executor/execMain.c | 3 +
src/backend/nodes/copyfuncs.c | 1 +
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/plan/planner.c | 37 +-
src/backend/optimizer/util/clauses.c | 550 +++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 33 +-
src/include/access/xact.h | 15 +
src/include/nodes/pathnodes.h | 2 +
src/include/nodes/plannodes.h | 2 +
src/include/optimizer/clauses.h | 3 +-
src/test/regress/expected/insert_parallel.out | 536 +++++++++++++++++++++++++
src/test/regress/parallel_schedule | 1 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/insert_parallel.sql | 335 ++++++++++++++++
17 files changed, 1531 insertions(+), 21 deletions(-)
Amit Kapila <akapila@postgresql.org> writes:
Enable parallel SELECT for "INSERT INTO ... SELECT ...".
skink (valgrind) is unhappy:
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ==4085668== VALGRINDERROR-BEGIN
==4085668== Conditional jump or move depends on uninitialised value(s)
==4085668== at 0x4AEB77: max_parallel_hazard_walker (clauses.c:700)
==4085668== by 0x445287: expression_tree_walker (nodeFuncs.c:2188)
==4085668== by 0x4AEBB8: max_parallel_hazard_walker (clauses.c:860)
==4085668== by 0x4B045E: is_parallel_safe (clauses.c:637)
==4085668== by 0x4985D0: grouping_planner (planner.c:2070)
==4085668== by 0x49AE4F: subquery_planner (planner.c:1024)
==4085668== by 0x49B4F5: standard_planner (planner.c:404)
==4085668== by 0x49BAD2: planner (planner.c:273)
==4085668== by 0x5818BE: pg_plan_query (postgres.c:809)
==4085668== by 0x581977: pg_plan_queries (postgres.c:900)
==4085668== by 0x581E70: exec_simple_query (postgres.c:1092)
==4085668== by 0x583F7A: PostgresMain (postgres.c:4327)
==4085668== Uninitialised value was created by a stack allocation
==4085668== at 0x4B0363: is_parallel_safe (clauses.c:599)
==4085668==
==4085668== VALGRINDERROR-END
There are a few other commits that skink hasn't seen before, but given
the apparent connection to parallel planning, none of the others look
like plausible candidates to explain this.
regards, tom lane
On Wed, Mar 10, 2021 at 9:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <akapila@postgresql.org> writes:
Enable parallel SELECT for "INSERT INTO ... SELECT ...".
skink (valgrind) is unhappy:
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ==4085668== VALGRINDERROR-BEGIN
==4085668== Conditional jump or move depends on uninitialised value(s)
==4085668== at 0x4AEB77: max_parallel_hazard_walker (clauses.c:700)
==4085668== by 0x445287: expression_tree_walker (nodeFuncs.c:2188)
==4085668== by 0x4AEBB8: max_parallel_hazard_walker (clauses.c:860)
==4085668== by 0x4B045E: is_parallel_safe (clauses.c:637)
==4085668== by 0x4985D0: grouping_planner (planner.c:2070)
==4085668== by 0x49AE4F: subquery_planner (planner.c:1024)
==4085668== by 0x49B4F5: standard_planner (planner.c:404)
==4085668== by 0x49BAD2: planner (planner.c:273)
==4085668== by 0x5818BE: pg_plan_query (postgres.c:809)
==4085668== by 0x581977: pg_plan_queries (postgres.c:900)
==4085668== by 0x581E70: exec_simple_query (postgres.c:1092)
==4085668== by 0x583F7A: PostgresMain (postgres.c:4327)
==4085668== Uninitialised value was created by a stack allocation
==4085668== at 0x4B0363: is_parallel_safe (clauses.c:599)
==4085668==
==4085668== VALGRINDERROR-ENDThere are a few other commits that skink hasn't seen before, but given
the apparent connection to parallel planning, none of the others look
like plausible candidates to explain this.
Right, the patch forgot to initialize a new variable in
max_parallel_hazard_context via is_parallel_safe. I think we need to
initialize all the new variables as NULL because is_parallel_safe is
used to check parallel-safety of expressions. These new variables are
only required for checking parallel-safety of target relation which is
already done at the time of initial checks in standard_planner.
--
With Regards,
Amit Kapila.