Parallel INSERT SELECT take 2

Started by tsunakawa.takay@fujitsu.com about 5 years ago · 41 messages · pgsql-hackers
#1 tsunakawa.takay@fujitsu.com

This is another try of [1] "Parallel INSERT (INTO ... SELECT ...)".

BACKGROUND
========================================

We want to realize parallel INSERT SELECT in the following steps:
1) INSERT + parallel SELECT
2) Parallel INSERT + parallel SELECT

Below are example use cases. We don't expect high concurrency or an empty data source.
* Data loading (ETL or ELT) into an analytics database, typically a data warehouse.
* Batch processing in an OLTP database.

PROBLEMS
========================================

(1) The overhead of checking parallel-safety could be large
We have to check the target table and its child partitions for parallel safety. That is, we make sure those relations don't have parallel-unsafe domains, constraints, indexes, or triggers.

What we should check are the relations into which the statement actually inserts data. However, the planner does not know which relations will actually be inserted into, so it has to check all descendant partitions of the target table. When the target table has many partitions, this overhead could be unacceptable compared to the benefit gained from parallelism.

(2) There's no mechanism for parallel workers to assign an XID
Parallel workers need the XID of the current (sub)transaction when actually inserting a tuple (i.e., calling heap_insert()). When the leader has not acquired the XID yet, the worker may have to assign a new XID and communicate it to the leader and other workers so that all parallel processes use the same XID.

SOLUTION TO (1)
========================================

The candidate ideas are:

1) Caching the result of parallel-safety check
The planner stores the result of checking parallel safety for each relation in relcache, or some purpose-built hash table in shared memory.

The problems are:

* Even if the target relation turns out to be parallel safe by looking at those data structures, we cannot assume it remains so until the SQL statement finishes. For instance, other sessions might add a parallel-unsafe index to one of its descendant relations. Another example: when the user changes the parallel safety of indexes or triggers by running ALTER FUNCTION on the underlying index AM function or trigger function, the relcache entry of the table or index is not invalidated, so the cache no longer reflects the correct parallel safety.
In that case, when the executor encounters a parallel-unsafe object, it can mark the cached state parallel-unsafe and error out.

* Fast access can't be ensured. With the relcache, the first access in each session has to bear the overhead of the parallel-safety check. With a hash table in shared memory, the number of relations stored there would be limited, so the first access after database startup, or after a hash table entry is evicted, similarly experiences slowness.

* With a new hash table, an LWLock for concurrent access must be added, which can have an adverse effect on performance.

2) Enabling users to declare that the table allows parallel data modification
Add a table property that represents parallel safety of the table for DML statement execution. Users specify it as follows:

CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.

The planner assumes that the table, its descendant partitions, and their ancillary objects all have the specified parallel safety or a safer one. The user is responsible for its correctness. If the parallel processes find an object that is less safe than the assumed parallel safety during statement execution, they throw an ERROR and abort the statement execution.

The objects that relate to the parallel safety of a DML target table are as follows:

* Column default expression
* DOMAIN type CHECK expression
* CHECK constraints on column
* Partition key
* Partition key support function
* Index expression
* Index predicate
* Index AM function
* Operator function
* Trigger function

When the parallel safety of some of these objects is changed, it's costly to reflect that in the parallel safety of the tables that depend on them, so we don't do it. Instead, we provide a utility function pg_get_parallel_safety('table_name') that returns records of (objid, classid, parallel_safety) representing the parallel safety of the objects that determine the parallel safety of the specified table. The function only outputs objects that are not parallel safe; otherwise, it would consume excessive memory while accumulating the output. The user can use this function to identify problematic objects when a parallel DML statement fails or is not parallelized in the expected manner.

How does the executor detect parallel unsafe objects? There are two ways:

1) At loading time
When the executor loads the definition of objects (tables, constraints, indexes, triggers, etc.) during the first access to them after session start, or after their eviction by a sinval message, it checks their parallel safety.

This is a legitimate way, but may require a lot of code. Also, necessary code changes might be overlooked without careful inspection.

2) At function execution time
All related objects come down to some function execution, so add a parallel safety check there. If the current process is a parallel worker and the function is parallel unsafe, error out with ereport(ERROR). This approach eliminates oversights in the parallel safety check, with the additional bonus of a tiny code change!

The place would be FunctionCallInvoke(). It's a macro in fmgr.h now. Perhaps we should make it a function in fmgr.c, so that fmgr.h does not have to include header files for parallelism-related definitions.

We have to evaluate the performance effect of converting FunctionCallInvoke() into a function and adding an if statement there, because it's a relatively low-level function.

SOLUTION TO (2)
========================================

1) Make it possible for workers to assign an XID and share it among the parallel processes
The problems are:

* Tuple visibility
If the worker that acquires the XID writes some row, and another worker reads that row before it gets to see the XID information, the latter worker won't treat the row as written by its own transaction.

For instance, the worker (w-1) that acquires the XID (501) deletes the tuple (CTID: 0, 2). Now, when another worker (w-2) reads that tuple (CTID: 0, 2), it would consider the tuple still visible to its snapshot; but if w-2 knew that 501 is its own XID, it would consider the tuple deleted (not visible). I think this can happen when multiple updates to the same row happen and new row versions get added to a new page.

* The implementation seems complex
When the DML is run inside a deeply nested subtransaction and the parent transactions have not allocated their XIDs yet, the worker needs to allocate the XIDs for its parents. That indeterminate number of XIDs must be stored in shared memory, and the stack of TransactionState structures must also be passed.

Also, TransactionIdIsCurrentTransactionId() uses an array, ParallelCurrentXids, in which parallel workers receive sub-committed XIDs from the leader. This needs to be reconsidered.

2) The leader assigns an XID before entering parallel mode and passes it to workers
This is what was done in [1] "Parallel INSERT (INTO ... SELECT ...)".

The problem is that the XID would not be used if the data source (the SELECT query) returns no rows. This wastes an XID.

However, the data source should rarely be empty when this feature is used. As the following Oracle manual says, parallel DML is used in data analytics and OLTP batch jobs. There should be plenty of source data in those scenarios.

When to Use Parallel DML
https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-18B2AF09-C548-48DE-A794-86224111549F
--------------------------------------------------
Several scenarios where parallel DML is used include:

Refreshing Tables in a Data Warehouse System

Creating Intermediate Summary Tables

Using Scoring Tables

Updating Historical Tables

Running Batch Jobs
--------------------------------------------------

CONCLUSION
========================================

(1) The overhead of checking parallel-safety could be large
We're inclined to go with solution 2, because it doesn't have a big problem. However, we'd like to present some more analysis of solution 1 in this thread.

Regarding how to check parallel safety in the executor, I prefer the simpler way of adding a check at function execution time. If that turns out to have an intolerable performance problem, we can choose the other approach.

(2) There's no mechanism for parallel workers to assign an XID
We'd like to adopt solution 2 because it will not be a real issue in the assumed use cases. The implementation is very easy and does not look strange.

Of course, any better-looking idea would be much appreciated. (But a simple one, or at least one that is not unnecessarily complex, is desired.)

[1]: Parallel INSERT (INTO ... SELECT ...)
/messages/by-id/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com

Regards
Takayuki Tsunakawa

#2 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: tsunakawa.takay@fujitsu.com (#1)
RE: Parallel INSERT SELECT take 2

BACKGROUND
========================================

We want to realize parallel INSERT SELECT in the following steps:
1) INSERT + parallel SELECT
2) Parallel INSERT + parallel SELECT

Below are example use cases. We don't expect high concurrency or an empty
data source.
* Data loading (ETL or ELT) into an analytics database, typically a data ware
house.
* Batch processing in an OLTP database.
2) Enabling users to declare that the table allows parallel data modification Add
a table property that represents parallel safety of the table for DML statement
execution. Users specify it as follows:

CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just
like pg_proc's proparallel. The default is UNSAFE.

The planner assumes that all of the table, its descendant partitions, and their
ancillary objects have the specified parallel safety or safer one. The user is
responsible for its correctness. If the parallel processes find an object that is
less safer than the assumed parallel safety during statement execution, it
throws an ERROR and abort the statement execution.

When the parallel safety of some of these objects is changed, it's costly to
reflect it on the parallel safety of tables that depend on them. So, we don't do
it. Instead, we provide a utility function pg_get_parallel_safety('table_name')
that returns records of (objid, classid, parallel_safety) that represent the
parallel safety of objects that determine the parallel safety of the specified
table. The function only outputs objects that are not parallel safe. Otherwise,
it will consume excessive memory while accumulating the output. The user
can use this function to identify problematic objects when a parallel DML fails
or is not parallelized in an expected manner.

How does the executor detect parallel unsafe objects? There are two ways:

1) At loading time
...
2) At function execution time
All related objects come down to some function execution. So, add a parallel
safety check there when in a parallel worker. If the current process is a parallel
worker and the function is parallel unsafe, error out with ereport(ERROR). This
approach eliminates the oversight of parallel safety check with the additional
bonus of tiny code change!

The place would be FunctionCallInvoke(). It's a macro in fmgr.h now. Perhaps
we should make it a function in fmgr.c, so that fmgr.h does not have to include
header files for parallelism-related definitions.

We have to evaluate the performance effect of converting FunctionCallInvoke()
into a function and adding an if statement there, because it's a relatively
low-level function.

Based on the above, we plan to move forward with approach 2) (the declarative idea).

Attaching the POC patch set, which includes the following:

0001: provide a utility function pg_get_parallel_safety('table_name').

The function returns records of (objid, classid, parallel_safety) that represent
the parallel safety of objects that determine the parallel safety of the specified table.
Note: The function only outputs objects that are not parallel safe.
(Thanks a lot to Greg for his previous work; most of the safety check code here is based on it.)

0002: allow users to use "ALTER TABLE PARALLEL SAFE/UNSAFE/RESTRICTED".

Add a proparallel column to pg_class and allow users to change it.

0003: detect parallel unsafe objects in executor.

Currently, we choose to check a function's parallel safety at function execution time.
We add the safety check in FunctionCallInvoke(), but it may be better to check in fmgr_info_cxt_security; we are still discussing this in another thread[1].

TODO: we currently skip checking built-in functions' parallel safety, because we lack the information about it: we cannot access pg_proc.proparallel at such a low level, because that could result in infinite recursion. Adding a parallel property to FmgrBuiltin would enlarge the frequently accessed fmgr_builtins array and lock down the value of the parallel-safety flag. The solution is still under discussion. Suggestions and comments are welcome.

0004: fix some mislabeled functions in the test cases.

Since we check the parallel safety of functions at a low level, we found some functions marked as parallel unsafe being executed in parallel mode in the regression tests when setting force_parallel_mode=regress. After checking, these functions are parallel safe, so we plan to fix these functions' parallel labels.
Note: we plan to make 0004 a separate patch, see[2]; I post 0004 here just to prevent some test case failures.

The above are the POC patches; they may be imperfect for now, and I am still working on improving them.
Suggestions and comments about the design or code are very welcome and appreciated.

Best regards,
houzj

Attachments:

v1-POC-0002-ALTER-TABLE-PARALLEL.patch (+132 -13)
v1-POC-0003-check-parallel-safety-in-fmgr.patch (+952 -25)
v1-POC-0004-fix-testcase-with-wrong-parallel-safety-flag.patch (+14 -13)
v1-POC-0001-get-parallel-safety-function.patch (+729 -7)
#3 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#2)
RE: Parallel INSERT SELECT take 2

BACKGROUND
========================================

We want to realize parallel INSERT SELECT in the following steps:
1) INSERT + parallel SELECT
2) Parallel INSERT + parallel SELECT

Below are example use cases. We don't expect high concurrency or an
empty data source.
* Data loading (ETL or ELT) into an analytics database, typically a
data ware house.
* Batch processing in an OLTP database.
2) Enabling users to declare that the table allows parallel data
modification Add a table property that represents parallel safety of
the table for DML statement execution. Users specify it as follows:

CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u',
'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.

The planner assumes that all of the table, its descendant partitions,
and their ancillary objects have the specified parallel safety or
safer one. The user is responsible for its correctness. If the
parallel processes find an object that is less safer than the assumed
parallel safety during statement execution, it throws an ERROR and abort the

statement execution.

When the parallel safety of some of these objects is changed, it's
costly to reflect it on the parallel safety of tables that depend on
them. So, we don't do it. Instead, we provide a utility function
pg_get_parallel_safety('table_name')
that returns records of (objid, classid, parallel_safety) that
represent the parallel safety of objects that determine the parallel
safety of the specified table. The function only outputs objects that
are not parallel safe. Otherwise, it will consume excessive memory
while accumulating the output. The user can use this function to
identify problematic objects when a parallel DML fails or is not parallelized in

an expected manner.

How does the executor detect parallel unsafe objects? There are two ways:

1) At loading time
...
2) At function execution time
All related objects come down to some function execution. So, add a
parallel safety check there when in a parallel worker. If the current
process is a parallel worker and the function is parallel unsafe,
error out with ereport(ERROR). This approach eliminates the oversight
of parallel safety check with the additional bonus of tiny code change!

The place would be FunctionCallInvoke(). It's a macro in fmgr.h now.
Perhaps we should make it a function in fmgr.c, so that fmgr.h does
not have to include header files for parallelism-related definitions.

We have to evaluate the performance effect of converting
FunctionCallInvoke() into a function and adding an if statement there,
because it's a relatively low-level function.

Based on above, we plan to move forward with the apporache 2) (declarative
idea).

Attatching the POC patchset which including the following:

0001: provide a utility function pg_get_parallel_safety('table_name').

The function returns records of (objid, classid, parallel_safety) that represent
the parallel safety of objects that determine the parallel safety of the
specified table.
Note: The function only outputs objects that are not parallel safe.
(Thanks a lot for greg's previous work, most of the safety check code here is
based on it)

0002: allow user use "ALTER TABLE PARALLEL SAFE/UNSAFE/RESTRICTED".

Add proparallel column in pg_class and allow use to change its.

0003: detect parallel unsafe objects in executor.

Currently we choose to check function's parallel safety at function execution
time.
We add safety check at FunctionCallInvoke(), but it may be better to check in
fmgr_info_cxt_security.
we are still discussing it in another thread[1].

TODO: we currently skip checking built-in function's parallel safety, because
we lack the information about built-in
function's parallel safety, we cannot access pg_proc.proparallel in a low level
because it could result in infinite recursion.
Adding parallel property in fmgrBuiltin will enlarge the frequently accessed
fmgr_builtins and lock down the value of the
parallel-safety flag. The solution is still under discussion. Suggestions and
comments are welcome.

0004: fix some mislabeled function in testcase

Since we check parallel safety of function at a low level, we found some
functions marked as parallel unsafe will be
executed in parallel mode in regression test when setting
force_parallel_mode=regress. After checking, these functions
are parallel safe, So , we plan to fix these function's parallel label.
Note: we plan to take 0004 as a separate patch , see[2], I post 0004 here just
to prevent some testcase failures.

The above are the POC patches, it could be imperfect for now and I am still
working on improving it.
Suggestions and comments about the design or code are very welcome and
appreciated.

Sorry, I forgot to attach the discussion links for [1] and [2].

[1]: /messages/by-id/756027.1619012086@sss.pgh.pa.us
[2]: /messages/by-id/OS0PR01MB571637085C0D3AFC3AB3600194479@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best regards,
houzj

#4 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#2)
Re: Parallel INSERT SELECT take 2

On Thu, Apr 22, 2021 at 4:51 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

BACKGROUND
========================================

We want to realize parallel INSERT SELECT in the following steps:
1) INSERT + parallel SELECT
2) Parallel INSERT + parallel SELECT

Below are example use cases. We don't expect high concurrency or an empty
data source.
* Data loading (ETL or ELT) into an analytics database, typically a data ware
house.
* Batch processing in an OLTP database.
2) Enabling users to declare that the table allows parallel data modification Add
a table property that represents parallel safety of the table for DML statement
execution. Users specify it as follows:

CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just
like pg_proc's proparallel. The default is UNSAFE.

The planner assumes that all of the table, its descendant partitions, and their
ancillary objects have the specified parallel safety or safer one. The user is
responsible for its correctness. If the parallel processes find an object that is
less safer than the assumed parallel safety during statement execution, it
throws an ERROR and abort the statement execution.

When the parallel safety of some of these objects is changed, it's costly to
reflect it on the parallel safety of tables that depend on them. So, we don't do
it. Instead, we provide a utility function pg_get_parallel_safety('table_name')
that returns records of (objid, classid, parallel_safety) that represent the
parallel safety of objects that determine the parallel safety of the specified
table. The function only outputs objects that are not parallel safe. Otherwise,
it will consume excessive memory while accumulating the output. The user
can use this function to identify problematic objects when a parallel DML fails
or is not parallelized in an expected manner.

How does the executor detect parallel unsafe objects? There are two ways:

1) At loading time
...
2) At function execution time
All related objects come down to some function execution. So, add a parallel
safety check there when in a parallel worker. If the current process is a parallel
worker and the function is parallel unsafe, error out with ereport(ERROR). This
approach eliminates the oversight of parallel safety check with the additional
bonus of tiny code change!

The place would be FunctionCallInvoke(). It's a macro in fmgr.h now. Perhaps
we should make it a function in fmgr.c, so that fmgr.h does not have to include
header files for parallelism-related definitions.

We have to evaluate the performance effect of converting FunctionCallInvoke()
into a function and adding an if statement there, because it's a relatively
low-level function.

Based on above, we plan to move forward with the apporache 2) (declarative idea).

IIUC, the declarative behaviour idea attributes parallel safe/unsafe/restricted tags to each table, with the default being unsafe. Does it mean that for a parallel unsafe table, no parallel selects or inserts (maybe updates) will be picked? Or is it only the parallel inserts? If both parallel inserts and selects will be picked, would existing tables then need to be adjusted to set the parallel safety tags while migrating?

Another point: what does it mean for a table to be parallel restricted? What should happen if it is present in a query with other parallel safe tables?

I may be wrong here: IIUC, the main problem we are trying to solve
with the declarative approach is to let the user decide parallel
safety for partition tables as it may be costlier for postgres to
determine it. And for the normal tables we can perform parallel safety
checks without incurring much cost. So, I think we should restrict the
declarative approach to only partitioned tables?

While reading the design, I came across this "erroring out during execution of a query when a parallel unsafe function is detected". If this is correct, doesn't it require users to run pg_get_parallel_safety to know the parallel unsafe objects, set parallel safety on all of them if possible, and otherwise disable parallelism to run the query? Isn't this burdensome? Instead, how about postgres retrying the query upon detecting the error that came from a parallel unsafe function during execution, disabling parallelism, and rerunning the query? I think this kind of query retry feature could be built outside of core postgres, but IMO it would be good to have inside (of course configurable). IIRC, the Teradata database has a Query Retry feature.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#5 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#4)
RE: Parallel INSERT SELECT take 2

Based on above, we plan to move forward with the apporache 2) (declarative

idea).

IIUC, the declarative behaviour idea attributes parallel safe/unsafe/restricted
tags to each table with default being the unsafe. Does it mean for a parallel
unsafe table, no parallel selects, inserts (may be updates) will be picked up? Or
is it only the parallel inserts? If both parallel inserts, selects will be picked, then
the existing tables need to be adjusted to set the parallel safety tags while
migrating?

Thanks for looking into this.

The parallel attribute on a table means the parallel safety when the user performs data-modification operations on it.
So, it only limits the use of parallel plans for INSERT/UPDATE/DELETE.

Another point, what does it mean a table being parallel restricted?
What should happen if it is present in a query of other parallel safe tables?

If a table is parallel restricted, it means the table contains some parallel restricted objects (such as parallel restricted functions in index expressions).
In the planner, it means a parallel insert plan will not be chosen, but parallel select (with serial insert) can still be used.

I may be wrong here: IIUC, the main problem we are trying to solve with the
declarative approach is to let the user decide parallel safety for partition tables
as it may be costlier for postgres to determine it. And for the normal tables we
can perform parallel safety checks without incurring much cost. So, I think we
should restrict the declarative approach to only partitioned tables?

Yes, we are trying to avoid the overhead of checking parallel safety.
The cost of checking all the partitions' parallel safety is the biggest one.
Another is the safety check of index expressions.
Currently, for INSERT, the planner does not open the target table's IndexInfo and does not parse the index expressions. We would need to parse the expressions in the planner if we want to do a parallel safety check for them, which brings some overhead (the executor will open the index and do the parsing again).
So, we plan to skip all of the extra checks and let the user take responsibility for the safety.

Of course, maybe we can try to pass the IndexInfo to the executor, but that needs some further refactoring, and I will look into it.

While reading the design, I came across this "erroring out during execution of a
query when a parallel unsafe function is detected". If this is correct, isn't it
warranting users to run pg_get_parallel_safety to know the parallel unsafe
objects, set parallel safety to all of them if possible, otherwise disable
parallelism to run the query? Isn't this burdensome?

How about this: if the executor detects parallel unsafe objects, alter the table to parallel unsafe internally, so the user does not need to alter it manually?

Instead, how about
postgres retries the query upon detecting the error that came from a parallel
unsafe function during execution, disable parallelism and run the query? I think
this kind of retry query feature can be built outside of the core postgres, but
IMO it will be good to have inside (of course configurable). IIRC, the Teradata
database has a Query Retry feature.

Thanks for the suggestion.
The retry query feature sounds like a good idea to me.
OTOH, it sounds more like an independent feature from which parallel select can also benefit.
I think maybe we can try to achieve it after we commit parallel insert?

Best regards,
houzj

#6 Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#5)
Re: Parallel INSERT SELECT take 2

On Mon, Apr 26, 2021 at 7:00 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Instead, how about
postgres retries the query upon detecting the error that came from a parallel
unsafe function during execution, disable parallelism and run the query? I think
this kind of retry query feature can be built outside of the core postgres, but
IMO it will be good to have inside (of course configurable). IIRC, the Teradata
database has a Query Retry feature.

Thanks for the suggestion.
The retry query feature sounds like a good idea to me.
OTOH, it sounds more like an independent feature which parallel select can also benefit from it.

+1. I also think retrying a query on an error is not related to this
feature and should be built separately if required.

--
With Regards,
Amit Kapila.

#7 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#5)
Re: Parallel INSERT SELECT take 2

On Mon, Apr 26, 2021 at 7:00 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Based on above, we plan to move forward with the apporache 2) (declarative

idea).

IIUC, the declarative behaviour idea attributes parallel safe/unsafe/restricted
tags to each table with default being the unsafe. Does it mean for a parallel
unsafe table, no parallel selects, inserts (may be updates) will be picked up? Or
is it only the parallel inserts? If both parallel inserts, selects will be picked, then
the existing tables need to be adjusted to set the parallel safety tags while
migrating?

Thanks for looking into this.

Thanks for the responses.

The parallel attributes in table means the parallel safety when user does some data-modification operations on it.
So, It only limit the use of parallel plan when using INSERT/UPDATE/DELETE.

In that case, isn't it better to use the terminology "PARALLEL DML
SAFE/UNSAFE/RESTRICTED" in the code and docs? This way, it will be
clear that these tags don't affect parallel selects.

Another point, what does it mean a table being parallel restricted?
What should happen if it is present in a query of other parallel safe tables?

If a table is parallel restricted, it means the table contains some parallel restricted objects(such as: parallel restricted functions in index expressions).
And in planner, it means parallel insert plan will not be chosen, but it can use parallel select(with serial insert).

Makes sense. I assume that when there is a parallel restricted
function associated with a table, the current design doesn't force
the planner to choose parallel select; it is left up to the planner.

I may be wrong here: IIUC, the main problem we are trying to solve with the
declarative approach is to let the user decide parallel safety for partitioned tables,
as it may be costlier for postgres to determine it. And for normal tables we
can perform parallel safety checks without incurring much cost. So, I think we
should restrict the declarative approach to only partitioned tables?

Yes, we are trying to avoid overhead when checking parallel safety.
The cost of checking every partition's parallel safety is the biggest one.
Another is the safety check of index expressions.
Currently, for INSERT, the planner does not open the target table's IndexInfo and does not
parse the expressions of its indexes. We would need to parse the expressions in the planner if we want
to do a parallel safety check on them, which can bring some overhead (the executor will open the index and do the parse again).
So, we plan to skip all of these extra checks and let the user take responsibility for the safety.
Of course, maybe we can try to pass the IndexInfo to the executor, but that needs some further refactoring, and I will take a look into it.

Will the planner parse and check the parallel safety of index (WHERE
clause) expressions in the case of SELECTs? I'm not sure of this. But if
it does, maybe we could do the same thing for parallel DML as well for
normal tables? What is the overhead of parsing index expressions? If
the cost of checking index expressions' parallel safety is heavy in
the case of normal tables, then the current design, i.e., attaching a
parallel safety tag to all tables, makes sense.

I was actually thinking that we will have the declarative approach
only for partitioned tables as it is the main problem we are trying to
solve with this design. Something like: users will run
pg_get_parallel_safety to see the parallel unsafe objects associated
with a partitioned table by looking at all of its partitions and be
able to set a parallel dml safety tag to only partitioned tables.

While reading the design, I came across this: "erroring out during execution of a
query when a parallel unsafe function is detected". If this is correct, doesn't it
force users to run pg_get_parallel_safety to know the parallel unsafe
objects, set parallel safety on all of them if possible, and otherwise disable
parallelism to run the query? Isn't this burdensome?

How about:
If the executor detects parallel unsafe objects, then alter the table to parallel unsafe internally,
so the user does not need to alter it manually.

I don't think this is a good idea, because, if there are multiple
tables involved in the query, do you alter all the tables? Usually, we
error out on finding the first such unsafe object.

Instead, how about
postgres retrying the query upon detecting an error that came from a parallel
unsafe function during execution, disabling parallelism and rerunning the query? I think
this kind of query retry feature could be built outside of core postgres, but
IMO it would be good to have inside (configurable, of course). IIRC, the Teradata
database has a Query Retry feature.

Thanks for the suggestion.
The query retry feature sounds like a good idea to me.
OTOH, it sounds more like an independent feature which parallel select can also benefit from.
I think maybe we can try to achieve it after we commit the parallel insert work?

Yeah, it will be a separate thing altogether.

0001: provide a utility function pg_get_parallel_safety('table_name').

The function returns records of (objid, classid, parallel_safety) that represent
the parallel safety of objects that determine the parallel safety of the specified table.
Note: The function only outputs objects that are not parallel safe.

If it returns only parallel "unsafe" objects and not "safe" or
"restricted" objects, how about naming it to
pg_get_table_parallel_unsafe_objects("table_name")? This way we could
get rid of parallel_safety in the output record? If at all users want
to see parallel restricted or safe objects, we can also have the
counterparts pg_get_table_parallel_safe_objects and
pg_get_table_parallel_restricted_objects. Of course, we can caution
the user that execution of these functions might take longer and
"might consume excessive memory while accumulating the output".

Otherwise, we can have a single function
pg_get_parallel_safety("table_name" IN, "parallel_safety" IN, "objid"
OUT, "classid" OUT)? If required, we could name it
pg_get_parallel_safety_of_table_objects.

Thoughts?
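To make the two naming options concrete, here is a sketch of how each might be invoked. All function names, parameters, and result columns below are the proposals under discussion in this thread, not existing PostgreSQL functions:

```sql
-- Option A: one function per safety class (names as proposed above)
SELECT objid, classid
FROM pg_get_table_parallel_unsafe_objects('my_table');

SELECT objid, classid
FROM pg_get_table_parallel_restricted_objects('my_table');

-- Option B: a single function taking the safety class as an input parameter
SELECT objid, classid
FROM pg_get_parallel_safety('my_table', 'unsafe');
```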

Although I have not looked at the patches, a few questions on the
pg_get_parallel_safety function:
1) Will it parse all the expressions for the objects that are listed
under "The objects that relate to the parallel safety of a DML target
table are as follows:" in the upthread?
2) How will it behave if a partitioned table is passed to it? Will it
recurse into all the partitions?
3) How will it behave if a foreign table is passed to it? Will it error out?

In general:
1) Is ALTER SET PARALLEL SAFETY on a partitioned table allowed? If
yes, will it be set based on all the partitions' parallel safety?
2) How will users have to decide on the parallel safety of a foreign table
or a partitioned table with foreign partitions? Or is it that we set
these tables parallel unsafe and don't do parallel inserts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#8Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#7)
RE: Parallel INSERT SELECT take 2

The parallel attribute on a table means its parallel safety when the user does some
data-modification operations on it.
So, it only limits the use of a parallel plan when using INSERT/UPDATE/DELETE.

In that case, isn't it better to use the terminology "PARALLEL DML
SAFE/UNSAFE/RESTRICTED" in the code and docs? This way, it will be clear that
these tags don't affect parallel selects.

Makes sense; I recall having heard a similar suggestion before.
If there are no other objections, I plan to change the command to
PARALLEL DML in my next version of the patches.

I may be wrong here: IIUC, the main problem we are trying to solve
with the declarative approach is to let the user decide parallel
safety for partitioned tables, as it may be costlier for postgres to
determine it. And for normal tables we can perform parallel
safety checks without incurring much cost. So, I think we should restrict the
declarative approach to only partitioned tables?

Yes, we are trying to avoid overhead when checking parallel safety.
The cost of checking every partition's parallel safety is the biggest one.
Another is the safety check of index expressions.
Currently, for INSERT, the planner does not open the target table's
IndexInfo and does not parse the expressions of its indexes. We would need to
parse the expressions in the planner if we want to do a parallel safety check on them,
which can bring some overhead (the executor will open the index and do the
parse again).
So, we plan to skip all of these extra checks and let the user take
responsibility for the safety.
Of course, maybe we can try to pass the IndexInfo to the executor, but that
needs some further refactoring, and I will take a look into it.

Will the planner parse and check the parallel safety of index (WHERE
clause) expressions in the case of SELECTs? I'm not sure of this. But if it does, maybe
we could do the same thing for parallel DML as well for normal tables?

The planner does not check the index expressions, because the expressions will not be used in a SELECT.
I think the expressions are only used when a tuple is inserted into the index.

What is the overhead of parsing index expressions? If the cost of checking
index expressions' parallel safety is heavy in the case of normal tables, then the current
design, i.e., attaching a parallel safety tag to all tables, makes sense.

Currently, index expressions and predicates are stored in text format.
We need to use stringToNode(expression/predicate) to parse them.
Some committers think doing this twice does not look good,
unless we find some way to pass the parsed info to the executor to avoid the second parse.

I was actually thinking that we will have the declarative approach only for
partitioned tables as it is the main problem we are trying to solve with this
design. Something like: users will run pg_get_parallel_safety to see the parallel
unsafe objects associated with a partitioned table by looking at all of its
partitions and be able to set a parallel dml safety tag to only partitioned tables.

We originally wanted to take the declarative approach for both normal and partitioned tables.
That way, it brings no overhead to the planner and looks consistent.
But do you think we should put some really cheap safety check into the planner?

0001: provide a utility function pg_get_parallel_safety('table_name').

The function returns records of (objid, classid, parallel_safety) that
represent the parallel safety of objects that determine the parallel safety of

the specified table.

Note: The function only outputs objects that are not parallel safe.

If it returns only parallel "unsafe" objects and not "safe" or "restricted" objects,
how about naming it to pg_get_table_parallel_unsafe_objects("table_name")?

Currently, the function returns both unsafe and restricted objects (I considered restricted to also be not safe),
because we thought users only care about the objects that affect the use of a parallel plan.

This way we could get rid of parallel_safety in the output record? If at all users
want to see parallel restricted or safe objects, we can also have the
counterparts pg_get_table_parallel_safe_objects and
pg_get_table_parallel_restricted_objects. Of course, we can caution the user
that execution of these functions might take longer and "might consume
excessive memory while accumulating the output".

Otherwise, we can have a single function pg_get_parallel_safety("table_name"
IN, "parallel_safety" IN, "objid"
OUT, "classid" OUT)? If required, we could name it
pg_get_parallel_safety_of_table_objects.

Thoughts?

I am not sure users will want to get the safe objects; do you have some use cases?
If users do not care about the safe objects, I think they can use
"SELECT * FROM pg_get_parallel_safety() WHERE parallel_safety = 'specified safety'" to get the specified objects.

Although I have not looked at the patches, a few questions on the
pg_get_parallel_safety function:
1) Will it parse all the expressions for the objects that are listed under "The
objects that relate to the parallel safety of a DML target table are as follows:" in
the upthread?

Yes.
But some parsed expressions (such as a domain type's expression) can be found in the typcache;
we just check the safety of those already-parsed expressions.

2) How will it behave if a partitioned table is passed to it? Will it recurse for all
the partitions?

Yes, because both the parent table and the child tables will be inserted into, and the parallel-safety-related
objects on them will be executed. If users want to make sure the parallel insert succeeds,
they need to check all the objects.

3) How will it behave if a foreign table is passed to it? Will it error out?

It currently does not error out.
It will also check the objects on the foreign table and return the not-safe objects.
Note: I consider the foreign table itself a parallel-restricted object, because it does not support a parallel-insert FDW API for now.

In general:
1) Is ALTER SET PARALLEL SAFETY on a partitioned table allowed?

Yes.

If yes, will it be
set based on all the partitions' parallel safety?

Yes,
but the ALTER PARALLEL command itself does not check all the partitions' safety flags.
The function pg_get_parallel_safety can return all the partitions' not-safe objects,
and the user should set the parallel safety based on the function's result.
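Putting the two pieces together, the intended workflow might look like the following sketch. Both the pg_get_parallel_safety function and the ALTER TABLE ... SET PARALLEL syntax are the proposals under discussion, not released PostgreSQL features, and the table names are hypothetical:

```sql
-- 1. List the not-parallel-safe objects across the table and all its partitions
SELECT * FROM pg_get_parallel_safety('orders_part');

-- 2. If the result is empty (or after fixing the reported objects),
--    declare the table safe for parallel DML
ALTER TABLE orders_part SET PARALLEL TO 'safe';

-- 3. The planner may now consider a parallel INSERT ... SELECT
INSERT INTO orders_part SELECT * FROM staging_orders;
```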

2) How will users have to decide on parallel safety of a foreign table or a
partitioned table with foreign partitions? Or is it that we set these tables
parallel unsafe and don't do parallel inserts?

A foreign table itself is considered parallel restricted,
because we do not support a parallel-insert FDW API for now.

Best regards,
houzj

#9Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#8)
Re: Parallel INSERT SELECT take 2

On Mon, Apr 26, 2021 at 4:56 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

The parallel attribute on a table means its parallel safety when the user does some
data-modification operations on it.
So, it only limits the use of a parallel plan when using INSERT/UPDATE/DELETE.

In that case, isn't it better to use the terminology "PARALLEL DML
SAFE/UNSAFE/RESTRICTED" in the code and docs? This way, it will be clear that
these tags don't affect parallel selects.

Makes sense; I recall having heard a similar suggestion before.
If there are no other objections, I plan to change the command to
PARALLEL DML in my next version of the patches.

+1 from me. Let's hear from others.

Will the planner parse and check the parallel safety of index (WHERE
clause) expressions in the case of SELECTs? I'm not sure of this. But if it does, maybe
we could do the same thing for parallel DML as well for normal tables?

The planner does not check the index expressions, because the expressions will not be used in a SELECT.
I think the expressions are only used when a tuple is inserted into the index.

Oh.

What is the overhead of parsing index expressions? If the cost of checking
index expressions' parallel safety is heavy in the case of normal tables, then the current
design, i.e., attaching a parallel safety tag to all tables, makes sense.

Currently, index expressions and predicates are stored in text format.
We need to use stringToNode(expression/predicate) to parse them.
Some committers think doing this twice does not look good,
unless we find some way to pass the parsed info to the executor to avoid the second parse.

How much extra cost is added if we do stringToNode twice?
Maybe we can check a few sample test cases by just having
stringToNode(expression/predicate) in the planner and seeing if it adds
much to the total execution time of the query.
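A rough way to measure this, assuming a test build with the extra stringToNode() call added to the INSERT planning path, is to compare the reported planning time on a table with expression and partial indexes (table and index names here are hypothetical):

```sql
-- Benchmark setup: a target table with an expression index and a
-- partial index, whose stored expression text would need the extra parse
CREATE TABLE t (a int, b text);
CREATE INDEX t_lower_idx ON t (lower(b));
CREATE INDEX t_pred_idx ON t (a) WHERE a > 0;

-- Compare the reported "Planning Time" with and without the extra
-- stringToNode() call in the planner
EXPLAIN (ANALYZE)
INSERT INTO t SELECT g, g::text FROM generate_series(1, 10000) g;
```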

I was actually thinking that we will have the declarative approach only for
partitioned tables as it is the main problem we are trying to solve with this
design. Something like: users will run pg_get_parallel_safety to see the parallel
unsafe objects associated with a partitioned table by looking at all of its
partitions and be able to set a parallel dml safety tag to only partitioned tables.

We originally wanted to take the declarative approach for both normal and partitioned tables.
That way, it brings no overhead to the planner and looks consistent.
But do you think we should put some really cheap safety check into the planner?

I still feel that we should limit the declarative approach to
only partitioned tables. And for normal tables, possibly with a
minimal cost(??), the server can do the safety checking. I know this
feels a little inconsistent. In the planner we will have different
paths like: if (partitioned_table) { check the parallel safety tag
associated with the table } else { perform the parallel safety checks
of the associated objects }.

Others may have better thoughts on this.

0001: provide a utility function pg_get_parallel_safety('table_name').

The function returns records of (objid, classid, parallel_safety) that
represent the parallel safety of objects that determine the parallel safety of

the specified table.

Note: The function only outputs objects that are not parallel safe.

If it returns only parallel "unsafe" objects and not "safe" or "restricted" objects,
how about naming it to pg_get_table_parallel_unsafe_objects("table_name")?

Currently, the function returns both unsafe and restricted objects (I considered restricted to also be not safe),
because we thought users only care about the objects that affect the use of a parallel plan.

Hm.

This way we could get rid of parallel_safety in the output record? If at all users
want to see parallel restricted or safe objects, we can also have the
counterparts pg_get_table_parallel_safe_objects and
pg_get_table_parallel_restricted_objects. Of course, we can caution the user
that execution of these functions might take longer and "might consume
excessive memory while accumulating the output".

Otherwise, we can have a single function pg_get_parallel_safety("table_name"
IN, "parallel_safety" IN, "objid"
OUT, "classid" OUT)? If required, we could name it
pg_get_parallel_safety_of_table_objects.

Thoughts?

I am not sure users will want to get the safe objects; do you have some use cases?
If users do not care about the safe objects, I think they can use
"SELECT * FROM pg_get_parallel_safety() WHERE parallel_safety = 'specified safety'" to get the specified objects.

I don't know any practical scenarios, but If I'm a user, at least I
will be tempted to see the parallel safe objects associated with a
particular table along with unsafe and restricted ones. Others may
have better thoughts on this.

Although I have not looked at the patches, a few questions on the
pg_get_parallel_safety function:
1) Will it parse all the expressions for the objects that are listed under "The
objects that relate to the parallel safety of a DML target table are as follows:" in
the upthread?

Yes.
But some parsed expressions (such as a domain type's expression) can be found in the typcache;
we just check the safety of those already-parsed expressions.

Oh.

2) How will it behave if a partitioned table is passed to it? Will it recurse for all
the partitions?

Yes, because both the parent table and the child tables will be inserted into, and the parallel-safety-related
objects on them will be executed. If users want to make sure the parallel insert succeeds,
they need to check all the objects.

Then, running pg_get_parallel_safety will have some overhead if
there are many partitions associated with a table. And this is the
overhead the planner would have had to incur without the declarative
approach, which we are trying to avoid with this design.

3) How will it behave if a foreign table is passed to it? Will it error out?

It currently does not error out.
It will also check the objects on the foreign table and return the not-safe objects.
Note: I consider the foreign table itself a parallel-restricted object, because it does not support a parallel-insert FDW API for now.

2) How will users have to decide on parallel safety of a foreign table or a
partitioned table with foreign partitions? Or is it that we set these tables
parallel unsafe and don't do parallel inserts?

A foreign table itself is considered parallel restricted,
because we do not support a parallel-insert FDW API for now.

Maybe ALTER TABLE ... SET PARALLEL on a foreign table should
always default to parallel restricted and emit a warning stating the
reason?

In general:
1) Is ALTER SET PARALLEL SAFETY on a partitioned table allowed?

Yes.

If yes, will it be
set based on all the partitions' parallel safety?

Yes,
but the ALTER PARALLEL command itself does not check all the partitions' safety flags.
The function pg_get_parallel_safety can return all the partitions' not-safe objects,
and the user should set the parallel safety based on the function's result.

I'm thinking that when users say ALTER TABLE partitioned_table SET
PARALLEL TO 'safe';, we check all the partitions' and their associated
objects' parallel safety? Only if all are parallel safe do we set
partitioned_table as parallel safe. What should happen if the parallel
safety of any of the associated objects/partitions changes after
setting the partitioned_table safety?

My understanding was that the command ALTER TABLE ... SET PARALLEL TO
'safe' would check the parallel safety of all the objects
associated with the table. If the objects are all parallel safe, then
the table will be set to safe. If at least one object is parallel
unsafe or restricted, then the command will fail. I was also thinking
about how the design will cope with situations such as the parallel
safety of any of the associated objects changing after the table is
set to parallel safe. The planner would have relied on the outdated
parallel safety of the table and chosen parallel inserts, and the
executor would catch such situations. Looks like my understanding was
wrong.

So, the ALTER TABLE ... SET PARALLEL TO command just sets the target
table's safety; it doesn't bother with what the associated objects'
safety is. It just believes the user. If there are any parallel unsafe
objects, they will be caught by the executor. Just like when setting
the parallel safety of functions/aggregates, the docs caution users
about accidentally/intentionally tagging parallel unsafe
functions/aggregates as parallel safe.
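For comparison, this is the existing contract for functions, where PostgreSQL already just trusts the declared label. The table-level ALTER in the comment is the syntax proposed in this thread, not released PostgreSQL, and the names are hypothetical:

```sql
-- Existing behavior: the user asserts the function's parallel safety
-- and the system believes the declaration without verifying it
CREATE FUNCTION add_one(i int) RETURNS int
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    AS 'SELECT i + 1';

-- The proposed table-level command would follow the same contract
-- (syntax under discussion):
-- ALTER TABLE my_table SET PARALLEL TO 'safe';
```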

Note: I meant the table objects are the ones that are listed under
"The objects that relate to the parallel safety of a DML target table
are as follows:" in the upthread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#10Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#9)
RE: Parallel INSERT SELECT take 2

Currently, index expressions and predicates are stored in text format.
We need to use stringToNode(expression/predicate) to parse them.
Some committers think doing this twice does not look good, unless we
find some way to pass the parsed info to the executor to avoid the second
parse.

How much extra cost is added if we do stringToNode twice?
Maybe we can check a few sample test cases by just having
stringToNode(expression/predicate) in the planner and seeing if it adds much to
the total execution time of the query.

OK, I will do some tests on it.

Yes, because both the parent table and the child tables will be inserted into,
and the parallel-safety-related objects on them will be executed. If users want
to make sure the parallel insert succeeds, they need to check all the objects.

Then, running pg_get_parallel_safety will have some overhead if there are
many partitions associated with a table. And this is the overhead the planner
would have had to incur without the declarative approach, which we are trying
to avoid with this design.

Yes, I think putting such overhead in a separate function is better than in a common path (the planner).

A foreign table itself is considered parallel restricted, because we
do not support a parallel-insert FDW API for now.

Maybe ALTER TABLE ... SET PARALLEL on a foreign table should always default to
parallel restricted and emit a warning stating the reason?

Thanks for the comment; I agree.
I will change this in the next version of the patches.

But the ALTER PARALLEL command itself does not check all the partitions'
safety flags.
The function pg_get_parallel_safety can return all the partitions' not-safe
objects, and the user should set the parallel safety based on the function's result.

I'm thinking that when users say ALTER TABLE partitioned_table SET PARALLEL
TO 'safe';, we check all the partitions' and their associated objects' parallel
safety? Only if all are parallel safe do we set partitioned_table as parallel safe.
What should happen if the parallel safety of any of the associated
objects/partitions changes after setting the partitioned_table safety?

Currently, nothing happens if any of the associated objects/partitions change after the partitioned table's safety is set,
because we do not have a really cheap way to catch the change. The existing relcache mechanism does not work because ALTER FUNCTION
does not invalidate the relcache of the tables the function belongs to. And it would bring some other overhead (locking, systable scans, ...)
to find the tables the objects belong to.

My understanding was that the command ALTER TABLE ... SET PARALLEL TO
'safe' would check the parallel safety of all the objects associated with the
table. If the objects are all parallel safe, then the table will be set to safe. If at
least one object is parallel unsafe or restricted, then the command will fail.

I think this idea makes sense; some details of the design can be improved.
I agree with you that we can try to check all the partitions' and their associated objects' parallel safety in ALTER PARALLEL.
Because it's a separate new command, adding some overhead to it seems not too bad.
If there are no other objections, I plan to add a safety check to the ALTER PARALLEL command.

I was also thinking about how the design will cope with situations such as the parallel
safety of any of the associated objects changing after the table is set to
parallel safe. The planner would have relied on the outdated parallel safety of
the table and chosen parallel inserts, and the executor would catch such situations.
Looks like my understanding was wrong.

Currently, we assume the user is responsible for its correctness,
because, from our research, when the parallel safety of some of these objects is changed,
it is costly to reflect that in the parallel safety of the tables that depend on them.
(We would need to scan pg_depend, pg_inherits, pg_index, ... to find the target tables.)

So, the ALTER TABLE ... SET PARALLEL TO command just sets the target table's
safety; it doesn't bother with what the associated objects' safety is.
It just believes the user. If there are any parallel unsafe objects, they will be
caught by the executor. Just like when setting the parallel safety of
functions/aggregates, the docs caution users about accidentally/intentionally
tagging parallel unsafe functions/aggregates as parallel safe.

Yes, thanks for looking into this.

Best regards,
houzj

#11Greg Nancarrow
gregn4422@gmail.com
In reply to: Bharath Rupireddy (#9)
Re: Parallel INSERT SELECT take 2

On Tue, Apr 27, 2021 at 10:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I still feel that we should limit the declarative approach to
only partitioned tables. And for normal tables, possibly with a
minimal cost(??), the server can do the safety checking. I know this
feels a little inconsistent. In the planner we will have different
paths like: if (partitioned_table) { check the parallel safety tag
associated with the table } else { perform the parallel safety checks
of the associated objects }.

Personally, I think the simplest and best approach is just to do it
consistently, using the declarative approach across all table types.

Then, running pg_get_parallel_safety will have some overhead if
there are many partitions associated with a table. And this is the
overhead the planner would have had to incur without the declarative
approach, which we are trying to avoid with this design.

The big difference is that pg_get_parallel_safety() is intended to be
used during development, not as part of runtime parallel-safety checks
(which are avoided using the declarative approach).
So there is no runtime overhead associated with pg_get_parallel_safety().

I'm thinking that when users say ALTER TABLE partitioned_table SET
PARALLEL TO 'safe';, we check all the partitions' and their associated
objects' parallel safety? Only if all are parallel safe do we set
partitioned_table as parallel safe. What should happen if the parallel
safety of any of the associated objects/partitions changes after
setting the partitioned_table safety?

With the declarative approach, there is no parallel-safety checking at
either CREATE or ALTER time when the parallel safety is declared/set.
It's up to the user to get it right. If it's actually wrong, it will
be detected at runtime.

Regards,
Greg Nancarrow
Fujitsu Australia

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#11)
Re: Parallel INSERT SELECT take 2

On Tue, Apr 27, 2021 at 7:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Apr 27, 2021 at 10:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I still feel that we should limit the declarative approach to
only partitioned tables. And for normal tables, possibly with a
minimal cost(??), the server can do the safety checking. I know this
feels a little inconsistent. In the planner we will have different
paths like: if (partitioned_table) { check the parallel safety tag
associated with the table } else { perform the parallel safety checks
of the associated objects }.

Personally, I think the simplest and best approach is just to do it
consistently, using the declarative approach across all table types.

Yeah, if we decide to go with a declarative approach then that sounds
like a better approach.

Then, running pg_get_parallel_safety will have some overhead if
there are many partitions associated with a table. And this is the
overhead the planner would have had to incur without the declarative
approach, which we are trying to avoid with this design.

The big difference is that pg_get_parallel_safety() is intended to be
used during development, not as part of runtime parallel-safety checks
(which are avoided using the declarative approach).
So there is no runtime overhead associated with pg_get_parallel_safety().

I'm thinking that when users say ALTER TABLE partitioned_table SET
PARALLEL TO 'safe';, we check all the partitions' and their associated
objects' parallel safety? Only if all are parallel safe do we set
partitioned_table as parallel safe. What should happen if the parallel
safety of any of the associated objects/partitions changes after
setting the partitioned_table safety?

With the declarative approach, there is no parallel-safety checking at
either CREATE or ALTER time when the parallel safety is declared/set.
It's up to the user to get it right. If it's actually wrong, it will
be detected at runtime.

OTOH, even if we want to verify at DDL time, we won't be able to
maintain it at a later point in time, say if the user changes the
parallel safety of some function used in a check constraint. I think the
function pg_get_parallel_safety() will help the user decide whether
they can declare a table parallel-safe. Now, it is quite possible that the
user later changes the parallel-safe property of some function; that
should then be caught at runtime.

--
With Regards,
Amit Kapila.

#13Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#10)
Re: Parallel INSERT SELECT take 2

On Tue, Apr 27, 2021 at 7:39 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

I'm thinking that when users say ALTER TABLE partitioned_table SET PARALLEL
TO 'safe';, we check all the partitions' and their associated objects' parallel
safety? Only if all are parallel safe do we set partitioned_table as parallel safe.
What should happen if the parallel safety of any of the associated
objects/partitions changes after setting the partitioned_table safety?

Currently, nothing happens if any of the associated objects/partitions change after the partitioned table's safety is set,
because we do not have a really cheap way to catch the change. The existing relcache mechanism does not work because ALTER FUNCTION
does not invalidate the relcache of the tables the function belongs to. And it would bring some other overhead (locking, systable scans, ...)
to find the tables the objects belong to.

Makes sense. Anyway, the user is responsible for such changes;
otherwise the executor can catch them at run time, and if not, users
will see unintended consequences.

My understanding was that the command ALTER TABLE ... SET PARALLEL TO
'safe' would check the parallel safety of all the objects associated with the
table. If the objects are all parallel safe, then the table will be set to safe. If at
least one object is parallel unsafe or restricted, then the command will fail.

I think this idea makes sense; some details of the design can be improved.
I agree with you that we can try to check all the partitions' and their associated objects' parallel safety in ALTER PARALLEL.
Because it's a separate new command, adding some overhead to it seems not too bad.
If there are no other objections, I plan to add a safety check to the ALTER PARALLEL command.

Maybe we can make the parallel safety check of the associated
objects/partitions optional for CREATE/ALTER DDLs, with the default
being no checks performed. Both Greg and Amit agree that we don't have
to perform any parallel safety checks during CREATE/ALTER DDLs.

I was also thinking about how the design will cope with situations such as the parallel
safety of any of the associated objects changing after the table is set to
parallel safe. The planner would have relied on the outdated parallel safety of
the table and chosen parallel inserts, and the executor would catch such situations.
Looks like my understanding was wrong.

Currently, we assume the user is responsible for its correctness,
because, from our research, when the parallel safety of some of these objects is changed,
it's costly to reflect that in the parallel safety of the tables that depend on them
(we would need to scan pg_depend, pg_inherits, pg_index, ... to find the target table).

Agree.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#14Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Greg Nancarrow (#11)
Re: Parallel INSERT SELECT take 2

On Tue, Apr 27, 2021 at 7:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Apr 27, 2021 at 10:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I still wonder why we shouldn't limit the declarative approach to
partitioned tables only. For normal tables, possibly with a
minimal cost(??), the server can do the safety checking. I know this
feels a little inconsistent. In the planner we will have different
paths like: if (partitioned_table) { check the parallel safety tag
associated with the table } else { check the parallel safety of the
associated objects }.

Personally I think the simplest and best approach is just do it
consistently, using the declarative approach across all table types.

Agree.

Then, running pg_get_parallel_safety will have some overhead if
there are many partitions associated with a table. And this is the
overhead the planner would have had to incur without the declarative
approach, which we are trying to avoid with this design.

The big difference is that pg_get_parallel_safety() is intended to be
used during development, not as part of runtime parallel-safety checks
(which are avoided using the declarative approach).
So there is no runtime overhead associated with pg_get_parallel_safety().

Yes, while we avoid the runtime overhead, we run the risk that the
parallel safety of any of the underlying objects/functions/partitions changes.
This risk is anyway unavoidable with the declarative approach.

I'm thinking that when users say ALTER TABLE partitioned_table SET
PARALLEL TO 'safe';, should we check all the partitions' and their
associated objects' parallel safety, and set partitioned_table as
parallel safe only if all of them are parallel safe? What should happen
if the parallel safety of any of the associated objects/partitions
changes after setting the partitioned_table safety?

With the declarative approach, there is no parallel-safety checking on
either the CREATE/ALTER when the parallel-safety is declared/set.
It's up to the user to get it right. If it's actually wrong, it will
be detected at runtime.

As I said upthread, we can provide the parallel safety check of the
associated objects/partitions as an option, with the default being false. I'm
not sure if this is a good thing to do at all. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#15Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#12)
Re: Parallel INSERT SELECT take 2

On Tue, Apr 27, 2021 at 8:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 7:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Apr 27, 2021 at 10:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I still wonder why we shouldn't limit the declarative approach to
partitioned tables only. For normal tables, possibly with a
minimal cost(??), the server can do the safety checking. I know this
feels a little inconsistent. In the planner we will have different
paths like: if (partitioned_table) { check the parallel safety tag
associated with the table } else { check the parallel safety of the
associated objects }.

Personally I think the simplest and best approach is just do it
consistently, using the declarative approach across all table types.

Yeah, if we decide to go with a declarative approach then that sounds
like a better approach.

Agree.

Then, running pg_get_parallel_safety will have some overhead if
there are many partitions associated with a table. And this is the
overhead the planner would have had to incur without the declarative
approach, which we are trying to avoid with this design.

The big difference is that pg_get_parallel_safety() is intended to be
used during development, not as part of runtime parallel-safety checks
(which are avoided using the declarative approach).
So there is no runtime overhead associated with pg_get_parallel_safety().

I'm thinking that when users say ALTER TABLE partitioned_table SET
PARALLEL TO 'safe';, should we check all the partitions' and their
associated objects' parallel safety, and set partitioned_table as
parallel safe only if all of them are parallel safe? What should happen
if the parallel safety of any of the associated objects/partitions
changes after setting the partitioned_table safety?

With the declarative approach, there is no parallel-safety checking on
either the CREATE/ALTER when the parallel-safety is declared/set.
It's up to the user to get it right. If it's actually wrong, it will
be detected at runtime.

OTOH, even if we want to verify at DDL time, we won't be able to
maintain it at a later point of time, say if the user changed the
parallel-safety of some function used in a check constraint. I think the
function pg_get_parallel_safety() will help the user decide whether
it can declare the table parallel-safe. Now, it is quite possible that the
user later changes the parallel-safe property of some function; that
should be caught at runtime.

Yeah, this is an unavoidable problem with the declarative approach.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#16Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#14)
RE: Parallel INSERT SELECT take 2

Hi,

Attaching new version patches with the following change:

0002:
1): Change "ALTER/CREATE TABLE PARALLEL SAFE" to "ALTER/CREATE TABLE PARALLEL DML SAFE".
2): Disallow temp/foreign tables to be marked parallel safe.

0003:
1) Temporarily, add the parallel-safety check of built-in functions by adding a member for proparallel in FmgrBuiltin.
To avoid enlarging the FmgrBuiltin struct, change the existing bool members strict and retset into
one member of type char, and represent the original values with some bit flags.

Note: this will lock down the parallel property of built-in functions, but I think the parallel safety of a built-in function
is tied to its C code; users should not change the property unless they change the code. So, I think it might be
better to disallow changing parallel safety for built-in functions. Thoughts?

I have not added the parallel safety check to the ALTER/CREATE TABLE PARALLEL DML SAFE command.
I think it seems better to add it after some more discussion.

Best regards,
houzj

Attachments:

v2-POC-0004-fix-testcase-with-wrong-parallel-safety-flag.patch (+27 -18)
v2-POC-0001-get-parallel-safety-function.patch (+729 -7)
v2-POC-0002-CREATE-ALTER-TABLE-PARALLEL-DML.patch (+238 -34)
v2-POC-0003-check-parallel-safety-in-fmgr_info.patch (+967 -27)
#17Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#16)
Re: Parallel INSERT SELECT take 2

On Wed, Apr 28, 2021 at 12:44 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

0003:
1) Temporarily, add the parallel-safety check of built-in functions by adding a member for proparallel in FmgrBuiltin.
To avoid enlarging the FmgrBuiltin struct, change the existing bool members strict and retset into
one member of type char, and represent the original values with some bit flags.

I was thinking that it would be better to replace the two bool members
with one "unsigned char" member for the bitflags for strict and
retset, and another "char" member for parallel.
The struct would still remain the same size as it originally was, and
you wouldn't need to convert between bit settings and char
('u'/'r'/'s') each time a built-in function was checked for
parallel-safety in fmgr_info().

Note: this will lock down the parallel property of built-in functions, but I think the parallel safety of a built-in function
is tied to its C code; users should not change the property unless they change the code. So, I think it might be
better to disallow changing parallel safety for built-in functions. Thoughts?

I'd vote for disallowing it (unless someone can justify why it
currently is allowed).

I have not added the parallel safety check in ALTER/CREATE table PARALLEL DML SAFE command.
I think it seems better to add it after some more discussion.

I'd vote for not adding such a check (as this is a declaration).

Some additional comments:

1) In patch 0002 comment, it says:
This property is recorded in pg_class's relparallel column as 'u', 'r', or 's',
just like pg_proc's proparallel. The default is UNSAFE.
It should say "relparalleldml" column.

2) With the patches applied, I seem to be getting a couple of errors
when running "make installcheck-world" with
force_parallel_mode=regress in effect.
Can you please try it?

Regards,
Greg Nancarrow
Fujitsu Australia

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#1)
Re: Parallel INSERT SELECT take 2

On Mon, Apr 12, 2021 at 6:52 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

SOLUTION TO (1)
========================================

The candidate ideas are:

1) Caching the result of parallel-safety check
The planner stores the result of checking parallel safety for each relation in relcache, or some purpose-built hash table in shared memory.

The problems are:

* Even if the target relation turns out to be parallel safe by looking at those data structures, we cannot assume it remains true until the SQL statement finishes. For instance, other sessions might add a parallel-unsafe index to its descendant relations. Other examples include that when the user changes the parallel safety of indexes or triggers by running ALTER FUNCTION on the underlying index AM function or trigger function, the relcache entry of the table or index is not invalidated, so the correct parallel safety is not maintained in the cache.
In that case, when the executor encounters a parallel-unsafe object, it can mark the cached state as parallel-unsafe and error out.

* Can't ensure fast access. With relcache, the first access in each session has to undergo the overhead of the parallel-safety check. With a hash table in shared memory, the number of relations stored there would be limited, so the first access after database startup, or after a hash table entry is evicted, similarly experiences slowness.

* With a new hash table, some lwlock for concurrent access must be added, which can have an adverse effect on performance.

2) Enabling users to declare that the table allows parallel data modification
Add a table property that represents parallel safety of the table for DML statement execution. Users specify it as follows:

CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.

So, in short, if we need to go with any sort of caching solution,
we can't avoid
(a) locking all the partitions, and
(b) getting an error during execution because at a later point the user has
altered the parallel-safe property of a relation.

We can't avoid locking all the partitions because, while we are
executing the statement, the user can change the parallel-safety of
one of the partitions by altering that partition, and if we
didn't have a lock on it, that would lead to an error during
execution. Now, one option could be that we document this point
and then don't take a lock on any of the partitions except for the root
table. The design would then be simpler: we either cache the
parallel-safety in the relcache or a shared hash table, and just lock the
parent table and perform all parallel-safety checks the first
time.

I think if we want to go with the behavior that we will error out
during statement execution if any parallel-safe property is changed at
run-time, it is better to go with the declarative approach. In the
declarative approach, at least the user will be responsible for taking
any such decision and the chances of toggling the parallel-safe
property will be less. To aid users, as suggested, we can provide a
function to determine parallel-safety of relation for DML operations.

Now, in the declarative approach, we can either go with whatever the
user has mentioned or do some checks during DDL to determine
the actual parallel-safety. I think even if we try to determine
parallel-safety during DDL, it will be quite tricky in some cases: for
example, when a user performs ALTER FUNCTION to change the parallel-safety
of a function used in some constraint of the table, or when the user changes
the parallel-safety of one of the partitions, we would need to traverse the
partition hierarchy upwards, which doesn't seem advisable. So, I guess
it is better to go with whatever the user has mentioned, but let me know
if you or others feel we can have some sort of parallel-safety checks
during DDL as well.

The planner assumes that the table, its descendant partitions, and their ancillary objects all have the specified parallel safety or a safer one. The user is responsible for its correctness. If the parallel processes find an object that is less safe than the assumed parallel safety during statement execution, they throw an ERROR and abort the statement execution.

The objects that relate to the parallel safety of a DML target table are as follows:

* Column default expression
* DOMAIN type CHECK expression
* CHECK constraints on column
* Partition key
* Partition key support function
* Index expression
* Index predicate
* Index AM function
* Operator function
* Trigger function

When the parallel safety of some of these objects is changed, it's costly to reflect it on the parallel safety of tables that depend on them. So, we don't do it. Instead, we provide a utility function pg_get_parallel_safety('table_name') that returns records of (objid, classid, parallel_safety) that represent the parallel safety of objects that determine the parallel safety of the specified table. The function only outputs objects that are not parallel safe.

So, users need to check count(*) for this to determine
parallel-safety? How about if we provide a wrapper function on top of
this function or a separate function that returns char to indicate
whether it is safe, unsafe, or restricted to perform a DML operation
on the table?

How does the executor detect parallel unsafe objects? There are two ways:

1) At loading time
When the executor loads the definition of objects (tables, constraints, index, triggers, etc.) during the first access to them after session start or their eviction by sinval message, it checks the parallel safety.

This is a legitimate way, but may require a lot of code. Also, it might overlook necessary code changes without careful inspection.

If we want to go with a declarative approach, then I think we should
try to do this because it will be almost free in some cases and we can
detect errors early. For example, when we decide to insert into a
partition that is declared unsafe whereas the root (partitioned) table
is declared safe.

--
With Regards,
Amit Kapila.

#19Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#18)
Re: Parallel INSERT SELECT take 2

On Mon, May 10, 2021 at 11:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Now, in the declarative approach, we can either go with whatever the
user has mentioned or do some checks during DDL to determine
the actual parallel-safety. I think even if we try to determine
parallel-safety during DDL, it will be quite tricky in some cases: for
example, when a user performs ALTER FUNCTION to change the parallel-safety
of a function used in some constraint of the table, or when the user changes
the parallel-safety of one of the partitions, we would need to traverse the
partition hierarchy upwards, which doesn't seem advisable. So, I guess
it is better to go with whatever the user has mentioned, but let me know
if you or others feel we can have some sort of parallel-safety checks
during DDL as well.

IMHO, it makes sense to go with what the user has declared to avoid
complexity. And, I don't see any problem with that.

The planner assumes that the table, its descendant partitions, and their ancillary objects all have the specified parallel safety or a safer one. The user is responsible for its correctness. If the parallel processes find an object that is less safe than the assumed parallel safety during statement execution, they throw an ERROR and abort the statement execution.

The objects that relate to the parallel safety of a DML target table are as follows:

* Column default expression
* DOMAIN type CHECK expression
* CHECK constraints on column
* Partition key
* Partition key support function
* Index expression
* Index predicate
* Index AM function
* Operator function
* Trigger function

When the parallel safety of some of these objects is changed, it's costly to reflect it on the parallel safety of tables that depend on them. So, we don't do it. Instead, we provide a utility function pg_get_parallel_safety('table_name') that returns records of (objid, classid, parallel_safety) that represent the parallel safety of objects that determine the parallel safety of the specified table. The function only outputs objects that are not parallel safe.

So, users need to check count(*) for this to determine
parallel-safety? How about if we provide a wrapper function on top of
this function or a separate function that returns char to indicate
whether it is safe, unsafe, or restricted to perform a DML operation
on the table?

Such a wrapper function makes sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#20Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Dilip Kumar (#19)
RE: Parallel INSERT SELECT take 2

When the parallel safety of some of these objects is changed, it's costly to
reflect it on the parallel safety of tables that depend on them. So, we don't do
it. Instead, we provide a utility function pg_get_parallel_safety('table_name')
that returns records of (objid, classid, parallel_safety) that represent the
parallel safety of objects that determine the parallel safety of the specified
table. The function only outputs objects that are not parallel safe.

So, users need to check count(*) for this to determine
parallel-safety? How about if we provide a wrapper function on top of
this function or a separate function that returns char to indicate
whether it is safe, unsafe, or restricted to perform a DML operation
on the table?

Such a wrapper function makes sense.

Thanks for the suggestion, and I agree.
I will add another wrapper function and post new version patches soon.

Best regards,
houzj

#21Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#20)
#22Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#21)
#23Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#21)
#24Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Greg Nancarrow (#23)
#25Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#22)
#26Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#24)
#27Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Greg Nancarrow (#26)
#28Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#27)
#29Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Greg Nancarrow (#28)
#30Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#29)
#31Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Greg Nancarrow (#30)
#32Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#31)
#33Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#32)
#34Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#33)
#35Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#34)
#36Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Greg Nancarrow (#35)
#37Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#36)
#38Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#37)
#39Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#38)
#40Greg Nancarrow
gregn4422@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#38)
#41Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Greg Nancarrow (#40)