Allow parallel plan for referential integrity checks?
Hello,
I noticed that referential integrity checks aren't currently
parallelized. Is it on purpose?
From the documentation [1], the planner will not generate a parallel
plan for a given query if any of the following are true:
1) The system is running in single-user mode.
2) max_parallel_workers_per_gather = 0.
3) The query writes any data or locks any database rows.
4) The query might be suspended during execution (cursor or PL/pgSQL loop).
5) The query uses any function marked PARALLEL UNSAFE.
6) The query is running inside of another query that is already parallel.
From my understanding of the code, it seems to me that the fourth
condition is not always accurately detected. In particular, the query
triggered by a foreign key addition (a Left Excluding JOIN between the
two tables) isn't parallelized, because the flag CURSOR_OPT_PARALLEL_OK
hasn't been set at some point. I couldn't find any reason why not, but
maybe I missed something.
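For reference, the check query that RI_Initial_Check builds is roughly of
this shape (table and column names below are just placeholders, and
multi-column keys add more conditions):

-- simplified sketch; the real query uses the actual key columns of both tables
SELECT fk.a
  FROM ONLY fk_table fk
  LEFT OUTER JOIN ONLY pk_table pk ON pk.a = fk.a
 WHERE pk.a IS NULL AND fk.a IS NOT NULL;

On big tables this is exactly the kind of anti-join that a parallel
sequential scan or parallel hash join could speed up considerably.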
Attached is a (naive) patch that aims to fix the case of a FK addition,
but the handling of the flag CURSOR_OPT_PARALLEL_OK, generally speaking,
looks rather hackish.
Best regards,
Frédéric
[1]: https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html
Attachments:
0001-Allow-parallel-plan-for-referential-integrity-checks.patch (text/x-patch)
From 1353a4b7e7d08b13cbaa85a5f9ae26819b5cf2c8 Mon Sep 17 00:00:00 2001
From: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Date: Sun, 6 Feb 2022 13:31:55 +0100
Subject: [PATCH] Allow parallel plan for referential integrity checks
---
src/backend/utils/adt/ri_triggers.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 96269fc2ad..84c16b6962 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1317,6 +1317,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
char workmembuf[32];
int spi_result;
SPIPlanPtr qplan;
+ SPIPrepareOptions options;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1483,10 +1484,12 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
* Generate the plan. We don't need to cache it, and there are no
* arguments to the plan.
*/
- qplan = SPI_prepare(querybuf.data, 0, NULL);
+ memset(&options, 0, sizeof(options));
+ options.cursorOptions = CURSOR_OPT_PARALLEL_OK;
+ qplan = SPI_prepare_extended(querybuf.data, &options);
if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s",
+ elog(ERROR, "SPI_prepare_extended returned %s for %s",
SPI_result_code_string(SPI_result), querybuf.data);
/*
--
2.30.2
On 2/7/22 11:26, Frédéric Yhuel wrote:
Attached is a (naive) patch that aims to fix the case of a FK addition,
but the handling of the flag CURSOR_OPT_PARALLEL_OK, generally speaking,
looks rather hackish.
Thanks for the patch. You can add it to the current open commitfest
(https://commitfest.postgresql.org/37/).
Best regards,
Andreas
On 2/11/22 00:16, Andreas Karlsson wrote:
On 2/7/22 11:26, Frédéric Yhuel wrote:
Attached is a (naive) patch that aims to fix the case of a FK
addition, but the handling of the flag CURSOR_OPT_PARALLEL_OK,
generally speaking, looks rather hackish.
Thanks for the patch. You can add it to the current open commitfest
(https://commitfest.postgresql.org/37/).
OK, I just did. Please let me know if I did something wrong.
Best regards,
Frédéric
On Mon, Feb 7, 2022 at 5:26 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
I noticed that referential integrity checks aren't currently
parallelized. Is it on purpose?
It's not 100% clear to me that it is safe. But on the other hand, it's
also not 100% clear to me that it is unsafe.
Generally, I only added CURSOR_OPT_PARALLEL_OK in places where I was
confident that nothing bad would happen, and this wasn't one of those
places. It's something of a nested-query environment -- your criterion
#6. How do we know that the surrounding query isn't already parallel?
Perhaps because it must be DML, but I think it must be more important
to support parallel DML than to support this.
I'm not sure what the right thing to do here is, but I think it would
be good if your patch included a test case.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hello, sorry for the late reply.
On 2/14/22 15:33, Robert Haas wrote:
On Mon, Feb 7, 2022 at 5:26 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
I noticed that referential integrity checks aren't currently
parallelized. Is it on purpose?
It's not 100% clear to me that it is safe. But on the other hand, it's
also not 100% clear to me that it is unsafe.
Generally, I only added CURSOR_OPT_PARALLEL_OK in places where I was
confident that nothing bad would happen, and this wasn't one of those
places. It's something of a nested-query environment -- your criterion
#6. How do we know that the surrounding query isn't already parallel?
Perhaps because it must be DML,
Did you mean DDL?
but I think it must be more important
to support parallel DML than to support this.
I'm not sure what the right thing to do here is, but I think it would
be good if your patch included a test case.
Would that be a regression test? (in src/test/regress ?)
If yes, what should I check? Is it good enough to load auto_explain and
check that the query triggered by a foreign key addition is parallelized?
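Something along these lines is what I have in mind -- only a rough sketch,
with illustrative table names and settings, not a polished regression test:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_nested_statements = on;  -- the check query is issued internally via SPI
SET client_min_messages = log;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
CREATE TABLE pk_t (a int PRIMARY KEY);
CREATE TABLE fk_t (a int);
ALTER TABLE fk_t ADD CONSTRAINT fk_t_a_fkey FOREIGN KEY (a) REFERENCES pk_t (a);
-- then look for a Gather node / Parallel Seq Scan in the logged plan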
Best regards,
Frédéric
I looked at your patch and it's a good idea to make foreign key validation
use parallel query on large relations.
It would be valuable to add logging to ensure that the ActiveSnapshot and TransactionSnapshot
are the same for the leader and the workers. This logging could be tested in the TAP test.
Also, inside RI_Initial_Check you may want to set max_parallel_workers to
max_parallel_maintenance_workers.
Currently the work_mem is set to maintenance_work_mem. This will also require
a doc change to call out.
/*
* Temporarily increase work_mem so that the check query can be executed
* more efficiently. It seems okay to do this because the query is simple
* enough to not use a multiple of work_mem, and one typically would not
* have many large foreign-key validations happening concurrently. So
* this seems to meet the criteria for being considered a "maintenance"
* operation, and accordingly we use maintenance_work_mem. However, we
On 3/19/22 01:57, Imseih (AWS), Sami wrote:
I looked at your patch and it's a good idea to make foreign key validation
use parallel query on large relations.
It would be valuable to add logging to ensure that the ActiveSnapshot and TransactionSnapshot
are the same for the leader and the workers. This logging could be tested in the TAP test.
Also, inside RI_Initial_Check you may want to set max_parallel_workers to
max_parallel_maintenance_workers.
Currently the work_mem is set to maintenance_work_mem. This will also require
a doc change to call out.
/*
* Temporarily increase work_mem so that the check query can be executed
* more efficiently. It seems okay to do this because the query is simple
* enough to not use a multiple of work_mem, and one typically would not
* have many large foreign-key validations happening concurrently. So
* this seems to meet the criteria for being considered a "maintenance"
* operation, and accordingly we use maintenance_work_mem. However, we
Hello Sami,
Thank you for your review!
I will try to do as you say, but it will take time, since my daily job
as a database consultant takes most of my time and energy.
Best regards,
Frédéric
On 4/14/22 14:25, Frédéric Yhuel wrote:
On 3/19/22 01:57, Imseih (AWS), Sami wrote:
[...]
Hello Sami,
Thank you for your review!
I will try to do as you say, but it will take time, since my daily job
as a database consultant takes most of my time and energy.
Hi,
As suggested by Jacob, here is a quick message to say that I didn't find
enough time to work further on this patch, but I didn't completely
forget it either. I moved it to the next commitfest. Hopefully I will
find enough time and motivation in the coming months :-)
Best regards,
Frédéric
On Tue, Jul 26, 2022 at 20:58, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
[...]
Hi,
As suggested by Jacob, here is a quick message to say that I didn't find
enough time to work further on this patch, but I didn't completely
forget it either. I moved it to the next commitfest. Hopefully I will
find enough time and motivation in the coming months :-)
Hi Frédéric
This patch has been carried forward for a couple more commitfests since
your message; do you think you'll be able to work on it further during this
release cycle?
Thanks
Ian Barwick
On 12/11/22 06:29, Ian Lawrence Barwick wrote:
[...]
Hi Frédéric
This patch has been carried forward for a couple more commitfests since
your message; do you think you'll be able to work on it further during this
release cycle?
Hi Ian,
I've planned to work on it full time on week 10 (6-10 March), if you
agree to bear with me. The idea would be to bootstrap my brain on it,
and then continue to work on it from time to time.
Best regards,
Frédéric
On Mon, 12 Dec 2022 at 22:06, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
[...]
Hi Ian,
I've planned to work on it full time on week 10 (6-10 March), if you
agree to bear with me. The idea would be to bootstrap my brain on it,
and then continue to work on it from time to time.
I have moved this to the March commitfest, as the patch update is expected
to happen during that commitfest.
Regards,
Vignesh
On Mon, 12 Dec 2022 at 11:37, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
I've planned to work on it full time on week 10 (6-10 March), if you
agree to bear with me. The idea would be to bootstrap my brain on it,
and then continue to work on it from time to time.
Any updates on this patch? Should we move it to next release at this
point? Even if you get time to work on it this commitfest do you think
it's likely to be committable in the next few weeks?
--
Gregory Stark
As Commitfest Manager
On 3/20/23 15:58, Gregory Stark (as CFM) wrote:
On Mon, 12 Dec 2022 at 11:37, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
I've planned to work on it full time on week 10 (6-10 March), if you
agree to bear with me. The idea would be to bootstrap my brain on it,
and then continue to work on it from time to time.
Any updates on this patch?
I had the opportunity to discuss it with Melanie Plageman when she came
to Paris last month and taught us (Dalibo folks) about the patch review
process.
She advised me to provide a good test case first, and to explain better
how it would be useful, which I'm going to do.
Should we move it to next release at this
point? Even if you get time to work on it this commitfest do you think
it's likely to be committable in the next few weeks?
It is very unlikely. Maybe it's better to remove it from CF and put it
back later if the test case I will provide does a better job at
convincing the Postgres folks that RI checks should be parallelized.
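Concretely, the kind of test case I have in mind is something like the
following (purely illustrative sizes and names) -- the point being to time
the ADD FOREIGN KEY with and without the patch, on tables large enough that
a parallel scan pays off:

CREATE TABLE pk_big (a int PRIMARY KEY);
INSERT INTO pk_big SELECT generate_series(1, 10000000);
CREATE TABLE fk_big (a int);
INSERT INTO fk_big SELECT 1 + (i % 10000000) FROM generate_series(1, 30000000) i;
-- with \timing on, compare the duration of the validation step:
ALTER TABLE fk_big ADD CONSTRAINT fk_big_a_fkey FOREIGN KEY (a) REFERENCES pk_big (a);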
On 20 Mar 2023, at 16:48, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
On 3/20/23 15:58, Gregory Stark (as CFM) wrote:
Should we move it to next release at this
point? Even if you get time to work on it this commitfest do you think
it's likely to be committable in the next few weeks?
It is very unlikely. Maybe it's better to remove it from CF and put it back later if the test case I will provide does a better job at convincing the Postgres folks that RI checks should be parallelized.
As there is no new patch submitted I will go ahead and do that, please feel
free to resubmit when there is renewed interest in working on this.
--
Daniel Gustafsson
On Tue, Jul 4, 2023 at 9:45 AM Daniel Gustafsson <daniel@yesql.se> wrote:
As there is no new patch submitted I will go ahead and do that, please feel
free to resubmit when there is renewed interest in working on this.
Recently I restored a database from a directory format backup and having
this feature would have been quite useful. So, I would like to resume the
work on this patch, unless OP or someone else is already on it.
Regards,
Juan José Santamaría Flecha
On Thu, Aug 10, 2023 at 5:06 PM Juan José Santamaría Flecha <
juanjo.santamaria@gmail.com> wrote:
On Tue, Jul 4, 2023 at 9:45 AM Daniel Gustafsson <daniel@yesql.se> wrote:
As there is no new patch submitted I will go ahead and do that, please feel
free to resubmit when there is renewed interest in working on this.
Recently I restored a database from a directory format backup and having
this feature would have been quite useful. So, I would like to resume the
work on this patch, unless OP or someone else is already on it.
Hearing no one advising me otherwise, I'll pick up the work on the patch.
First, I'll try to address all previous comments.
On Mon, Feb 14, 2022 at 3:34 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 7, 2022 at 5:26 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
I noticed that referential integrity checks aren't currently
parallelized. Is it on purpose?
It's not 100% clear to me that it is safe. But on the other hand, it's
also not 100% clear to me that it is unsafe.
Generally, I only added CURSOR_OPT_PARALLEL_OK in places where I was
confident that nothing bad would happen, and this wasn't one of those
places. It's something of a nested-query environment -- your criterion
#6. How do we know that the surrounding query isn't already parallel?
Perhaps because it must be DML, but I think it must be more important
to support parallel DML than to support this.
Currently I don't see a scenario where the validation might run inside
another parallel query, but I could be missing something. Suggestions on
some edge cases to test are more than welcome.
We already have a case where parallel operations can occur when adding a
table constraint, and that's the index creation for a primary key. Take for
example:
postgres=# SET max_parallel_maintenance_workers TO 4;
postgres=# SET parallel_setup_cost TO 0;
postgres=# SET parallel_tuple_cost TO 0;
postgres=# SET parallel_leader_participation TO 0;
postgres=# SET min_parallel_table_scan_size TO 0;
postgres=# SET client_min_messages TO debug1;
postgres=# CREATE TABLE parallel_pk_table (a int) WITH (autovacuum_enabled
= off);
postgres=# ALTER TABLE parallel_pk_table ADD PRIMARY KEY (a);
DEBUG: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"parallel_pk_table_pkey" for table "parallel_pk_table"
DEBUG: building index "parallel_pk_table_pkey" on table
"parallel_pk_table" with request for 1 parallel workers
...
DEBUG: index "parallel_pk_table_pkey" can safely use deduplication
DEBUG: verifying table "parallel_pk_table"
This should be better documented. Maybe "parallel index build" [1] could
have its own section, so it can be cited from other sections?
I'm not sure what the right thing to do here is, but I think it would
be good if your patch included a test case.
I have included a basic test for parallel ADD PRIMARY KEY and ADD FOREIGN
KEY.
On Sat, Mar 19, 2022 at 1:57 AM Imseih (AWS), Sami <simseih@amazon.com>
wrote:
It would be valuable to add logging to ensure that the ActiveSnapshot and
TransactionSnapshot
are the same for the leader and the workers. This logging could be tested
in the TAP test.
That machinery is used in any parallel query; I'm not sure this patch
should be responsible for carrying that requirement.
Also, inside RI_Initial_Check you may want to set max_parallel_workers to
max_parallel_maintenance_workers.
Currently the work_mem is set to maintenance_work_mem. This will also
require
a doc change to call out.
Done.
PFA the patch.
[1]: https://www.postgresql.org/docs/current/sql-createindex.html
Regards,
Juan José Santamaría Flecha
Attachments:
v2-0001-Allow-parallel-plan-for-referential-integrity-checks.patch (application/octet-stream)
From 40df802e38e18c96a4672062f1305405c1641a4a Mon Sep 17 00:00:00 2001
From: Juan Jose Santamaria Flecha <juanjo.santamaria@gmail.com>
Date: Mon, 14 Aug 2023 11:10:06 -0400
Subject: [PATCH] Allow parallel plan for referential integrity checks
Based on previous work from Frédéric Yhuel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
src/backend/utils/adt/ri_triggers.c | 19 ++++++++++++++++---
src/test/regress/expected/alter_table.out | 10 ++++++++++
src/test/regress/sql/alter_table.sql | 11 +++++++++++
3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6945d99..b7f92ed 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1383,8 +1383,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
const char *pk_only;
int save_nestlevel;
char workmembuf[32];
+ char maxmntworkers[12];
int spi_result;
SPIPlanPtr qplan;
+ SPIPrepareOptions options;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1531,6 +1533,8 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
}
}
appendStringInfoChar(&querybuf, ')');
+ elog(DEBUG2, "The RI_Initial_Check() query string built is \"%s\"",
+ querybuf.data);
/*
* Temporarily increase work_mem so that the check query can be executed
@@ -1540,7 +1544,8 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
* this seems to meet the criteria for being considered a "maintenance"
* operation, and accordingly we use maintenance_work_mem. However, we
* must also set hash_mem_multiplier to 1, since it is surely not okay to
- * let that get applied to the maintenance_work_mem value.
+ * let that get applied to the maintenance_work_mem value. In the same
+ * fashion, cap parallel processes by max_parallel_maintenance_workers.
*
* We use the equivalent of a function SET option to allow the setting to
* persist for exactly the duration of the check query. guc.c also takes
@@ -1549,12 +1554,18 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
save_nestlevel = NewGUCNestLevel();
snprintf(workmembuf, sizeof(workmembuf), "%d", maintenance_work_mem);
+ /* max_parallel_maintenance_workers <= 1024, so 12 bytes easily fit the value plus NUL */
+ snprintf(maxmntworkers, sizeof(maxmntworkers), "%d",
+ max_parallel_maintenance_workers);
(void) set_config_option("work_mem", workmembuf,
PGC_USERSET, PGC_S_SESSION,
GUC_ACTION_SAVE, true, 0, false);
(void) set_config_option("hash_mem_multiplier", "1",
PGC_USERSET, PGC_S_SESSION,
GUC_ACTION_SAVE, true, 0, false);
+ (void) set_config_option("max_parallel_workers_per_gather", maxmntworkers,
+ PGC_USERSET, PGC_S_SESSION,
+ GUC_ACTION_SAVE, true, 0, false);
if (SPI_connect() != SPI_OK_CONNECT)
elog(ERROR, "SPI_connect failed");
@@ -1563,10 +1574,12 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
* Generate the plan. We don't need to cache it, and there are no
* arguments to the plan.
*/
- qplan = SPI_prepare(querybuf.data, 0, NULL);
+ memset(&options, 0, sizeof(options));
+ options.cursorOptions |= CURSOR_OPT_PARALLEL_OK;
+ qplan = SPI_prepare_extended(querybuf.data, &options);
if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s",
+ elog(ERROR, "SPI_prepare_extended returned %s for %s",
SPI_result_code_string(SPI_result), querybuf.data);
/*
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index cd814ff..0d2a634 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -4658,3 +4658,13 @@ drop publication pub1;
drop schema alter1 cascade;
drop schema alter2 cascade;
NOTICE: drop cascades to table alter2.t1
+-- ALTER TABLE operations that can be parallel
+CREATE TABLE parallel_pk_table (a int) WITH (autovacuum_enabled = off);
+CREATE TABLE parallel_fk_table (a int) WITH (autovacuum_enabled = off);
+SET max_parallel_maintenance_workers TO 4;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SET parallel_leader_participation TO 0;
+SET min_parallel_table_scan_size TO 0;
+ALTER TABLE parallel_pk_table ADD PRIMARY KEY (a);
+ALTER TABLE parallel_fk_table ADD CONSTRAINT parallel_fk FOREIGN KEY (a) REFERENCES parallel_pk_table (a);
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index ff8c498..6256bdb 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -3066,3 +3066,14 @@ alter table alter1.t1 set schema alter2;
drop publication pub1;
drop schema alter1 cascade;
drop schema alter2 cascade;
+
+-- ALTER TABLE operations that can be parallel
+CREATE TABLE parallel_pk_table (a int) WITH (autovacuum_enabled = off);
+CREATE TABLE parallel_fk_table (a int) WITH (autovacuum_enabled = off);
+SET max_parallel_maintenance_workers TO 4;
+SET parallel_setup_cost TO 0;
+SET parallel_tuple_cost TO 0;
+SET parallel_leader_participation TO 0;
+SET min_parallel_table_scan_size TO 0;
+ALTER TABLE parallel_pk_table ADD PRIMARY KEY (a);
+ALTER TABLE parallel_fk_table ADD CONSTRAINT parallel_fk FOREIGN KEY (a) REFERENCES parallel_pk_table (a);
--
2.11.0
On 8/10/23 17:06, Juan José Santamaría Flecha wrote:
Recently I restored a database from a directory format backup and having
this feature would have been quite useful
Hi,
Thanks for resuming work on this patch. I forgot to mention this in my
original email, but the motivation was also to speed up the restore
process. Parallelizing the FK checks could make a huge difference in
certain cases. We should probably provide such a test case (with perf
numbers), and maybe this is what Robert asked for.
On 8/17/23 09:32, Frédéric Yhuel wrote:
On 8/10/23 17:06, Juan José Santamaría Flecha wrote:
Recently I restored a database from a directory format backup and
having this feature would have been quite useful
Hi,
Thanks for resuming work on this patch. I forgot to mention this in my
original email, but the motivation was also to speed up the restore
process. Parallelizing the FK checks could make a huge difference in
certain cases. We should probably provide such a test case (with perf
numbers), and maybe this is what Robert asked for.
I have attached two scripts which demonstrate the following problems:
1a. if the tables aren't analyzed nor vacuumed before the post-data
step, then they are index-only scanned, with a lot of heap fetches
(depending on their size, the planner sometimes chooses a seq scan instead).
1b. if the tables have been analyzed but not vacuumed before the
post-data step, then they are scanned sequentially. Usually better, but
still not so good without a parallel plan.
2. if the visibility maps have been created, then the tables are
index-only scanned without heap fetches, but this can still be slower
than a parallel seq scan.
So it would be nice if pg_restore could vacuum analyze the tables before
the post-data step. I believe it would be faster in most cases.
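For what it's worth, this can already be approximated by hand: restore with
pg_restore --section=pre-data --section=data first, run something like the
following (table names illustrative), and only then restore --section=post-data:

-- run between the data and post-data sections, before the FK validations
VACUUM (ANALYZE) parallel_pk_table, parallel_fk_table;

That builds the visibility map and the statistics before the check queries
run, which covers cases 1a and 1b above, though as case 2 shows a parallel
plan can still win.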
And it would be nice to allow a parallel plan for RI checks.
Best regards,
Frédéric
On 8/17/23 14:00, Frédéric Yhuel wrote:
[...]
I have attached two scripts which demonstrate the following problems:
Let me add the plans for more clarity:
1a. if the tables aren't analyzed nor vacuumed before the post-data
step, then they are index-only scanned, with a lot of heap fetches
(depending on their size, the planner sometimes chooses a seq scan
instead).
https://explain.dalibo.com/plan/7491ga22c5293683#raw
1b. if the tables have been analyzed but not vacuumed before the
post-data step, then they are scanned sequentially. Usually better, but
still not so good without a parallel plan.
https://explain.dalibo.com/plan/e92c5161g880bdhf#raw
2. if the visibility maps have been created, then the tables are
index-only scanned without heap fetches, but this can still be slower
than a parallel seq scan.
https://explain.dalibo.com/plan/de22bdb4ggc3dffg#raw
And at last, with the patch applied :
https://explain.dalibo.com/plan/54612a09ffh2565a#raw
On Thu, Aug 17, 2023 at 3:51 PM Frédéric Yhuel <frederic.yhuel@dalibo.com>
wrote:
[...]
I have attached two scripts which demonstrate the following problems:
Thanks for the scripts, but I think Robert's concerns come from the safety,
and not the performance, of the parallel operation.
Proving its vulnerability could be easy with a counterexample, but
assuring its safety is trickier. What test would suffice to do that?
Regards,
Juan José Santamaría Flecha
On Fri, 18 Aug 2023 at 16:29, Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
[...]
Thanks for the scripts, but I think Robert's concerns come from the safety, and not the performance, of the parallel operation.
Proving its vulnerability could be easy with a counterexample, but assuring its safety is trickier. What test would suffice to do that?
I'm seeing that there has been no activity in this thread for more
than 5 months, so I'm planning to close this in the current commitfest
unless someone is planning to take it forward. It can be opened again
when there is more interest.
Regards,
Vignesh
On Sun, 21 Jan 2024 at 07:56, vignesh C <vignesh21@gmail.com> wrote:
[...]
I'm seeing that there has been no activity in this thread for more
than 5 months, so I'm planning to close this in the current commitfest
unless someone is planning to take it forward. It can be opened again
when there is more interest.
Since neither the author nor anyone else showed interest in taking it forward
and the patch has had no activity for more than 5 months, I have changed
the status to RWF. Feel free to add a new CF entry when someone is
planning to resume work more actively.
Regards,
Vignesh