INSERT INTO SELECT, Why Parallelism is not selected?

Started by Dilip Kumaralmost 6 years ago24 messageshackers

dilipbalaut@gmail.com

almost 6 years ago

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query. I have quickly hacked the code to enable
the parallelism for this query. I can see there is a significant
improvement if we can enable the parallelism in this case. For an
experiment, I have just relaxed a couple of checks, maybe if we think
that it's good to enable the parallelism for this case we can try to
put better checks which are specific for this quey.

No parallel:
postgres[36635]=# explain analyze insert into t2 select * from t where a < 100;
Insert on t2 (cost=0.00..29742.00 rows=100 width=105) (actual
time=278.300..278.300 rows=0 loops=1)
-> Seq Scan on t (cost=0.00..29742.00 rows=100 width=105) (actual
time=0.061..277.330 rows=99 loops=1)
Filter: (a < 100)
Rows Removed by Filter: 999901
Planning Time: 0.093 ms
Execution Time: 278.330 ms
(6 rows)

With parallel
Insert on t2 (cost=1000.00..23460.33 rows=100 width=105) (actual
time=108.410..108.410 rows=0 loops=1)
-> Gather (cost=1000.00..23460.33 rows=100 width=105) (actual
time=0.306..108.973 rows=99 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t (cost=0.00..22450.33 rows=42
width=105) (actual time=66.396..101.979 rows=33 loops=3)
Filter: (a < 100)
Rows Removed by Filter: 333300
Planning Time: 0.154 ms
Execution Time: 110.158 ms
(9 rows)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#1)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Sat, Jul 11, 2020 at 6:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query. I have quickly hacked the code to enable
the parallelism for this query. I can see there is a significant
improvement if we can enable the parallelism in this case. For an
experiment, I have just relaxed a couple of checks, maybe if we think
that it's good to enable the parallelism for this case we can try to
put better checks which are specific for this quey.

+1. I also don't see any problem with this idea considering we will
find a better way to enable the parallelism for this case because we
can already use parallelism for statements like "create table
<tbl_name> as select ...". I think we can do more than this by
parallelizing the Insert part of this query as well as we have lifted
group locking restrictions related to RelationExtension and Page lock
in PG13. It would be really cool to do that unless we see any
fundamental problems with it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#2)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jul 11, 2020 at 6:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query. I have quickly hacked the code to enable
the parallelism for this query. I can see there is a significant
improvement if we can enable the parallelism in this case. For an
experiment, I have just relaxed a couple of checks, maybe if we think
that it's good to enable the parallelism for this case we can try to
put better checks which are specific for this quey.

+1. I also don't see any problem with this idea considering we will
find a better way to enable the parallelism for this case because we
can already use parallelism for statements like "create table
<tbl_name> as select ...".

Okay, thanks for the feedback.

I think we can do more than this by

parallelizing the Insert part of this query as well as we have lifted
group locking restrictions related to RelationExtension and Page lock
in PG13. It would be really cool to do that unless we see any
fundamental problems with it.

+1, I think it will be cool to support for insert part here as well as
insert part in 'Create Table As Select..' as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

rushabh.lathia@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#1)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Sat, Jul 11, 2020 at 6:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query. I have quickly hacked the code to enable
the parallelism for this query. I can see there is a significant
improvement if we can enable the parallelism in this case. For an
experiment, I have just relaxed a couple of checks, maybe if we think
that it's good to enable the parallelism for this case we can try to
put better checks which are specific for this quey.

+1 for the idea. For the given example also it shows a good performance
gain and I also don't any reason on restrict the parallel case for INSERT
INTO SELECT.

No parallel:
postgres[36635]=# explain analyze insert into t2 select * from t where a <
100;
Insert on t2 (cost=0.00..29742.00 rows=100 width=105) (actual
time=278.300..278.300 rows=0 loops=1)
-> Seq Scan on t (cost=0.00..29742.00 rows=100 width=105) (actual
time=0.061..277.330 rows=99 loops=1)
Filter: (a < 100)
Rows Removed by Filter: 999901
Planning Time: 0.093 ms
Execution Time: 278.330 ms
(6 rows)

With parallel
Insert on t2 (cost=1000.00..23460.33 rows=100 width=105) (actual
time=108.410..108.410 rows=0 loops=1)
-> Gather (cost=1000.00..23460.33 rows=100 width=105) (actual
time=0.306..108.973 rows=99 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t (cost=0.00..22450.33 rows=42
width=105) (actual time=66.396..101.979 rows=33 loops=3)
Filter: (a < 100)
Rows Removed by Filter: 333300
Planning Time: 0.154 ms
Execution Time: 110.158 ms
(9 rows)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Rushabh Lathia

robertmhaas@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#1)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query.

There's a relevant comment near the top of heap_prepare_insert().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Robert Haas (#5)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Wed, Jul 15, 2020 at 12:32 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query.

There's a relevant comment near the top of heap_prepare_insert().

I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89
where we have allowed relation extension and page locks to conflict
among group members. We have accordingly changed comments at a few
places but forgot to update this one. I will check and see if any
other similar comments are there which needs to be updated.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#6)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Wed, Jul 15, 2020 at 8:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 15, 2020 at 12:32 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query.

There's a relevant comment near the top of heap_prepare_insert().

I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89
where we have allowed relation extension and page locks to conflict
among group members. We have accordingly changed comments at a few
places but forgot to update this one. I will check and see if any
other similar comments are there which needs to be updated.

The attached patch fixes the comments. Let me know if you think I
have missed anything or any other comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#7)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Thu, Jul 16, 2020 at 8:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 15, 2020 at 8:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 15, 2020 at 12:32 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query.

There's a relevant comment near the top of heap_prepare_insert().

I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89
where we have allowed relation extension and page locks to conflict
among group members. We have accordingly changed comments at a few
places but forgot to update this one. I will check and see if any
other similar comments are there which needs to be updated.

The attached patch fixes the comments. Let me know if you think I
have missed anything or any other comments.

Your comments look good to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

robertmhaas@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#7)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Wed, Jul 15, 2020 at 11:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes the comments. Let me know if you think I
have missed anything or any other comments.

If it's safe now, why not remove the error check?

(Is it safe? Could there be other problems?)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Robert Haas (#9)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Thu, Jul 16, 2020 at 6:43 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 15, 2020 at 11:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes the comments. Let me know if you think I
have missed anything or any other comments.

If it's safe now, why not remove the error check?

I think it is not safe for all kind of Inserts (see my response later
in email), so we need some smarts to identify un-safe inserts before
we can open this check.

(Is it safe? Could there be other problems?)

I think we need to be careful of two things: (a) Do we want to enable
parallel inserts where tuple locks are involved, forex. in statements
like "Insert into primary_tbl Select * from secondary_tbl Where col <
10 For Update;"? In such statements, I don't see any problem because
each worker will operate on a separate page and even if the leader
already has a lock on the tuple, it will be granted to the worker as
it is taken in the same transaction. (b) The insert statements that
can generate 'CommandIds' which can happen while insert into tables
with foreign keys, see below test:

CREATE TABLE primary_tbl(index INTEGER PRIMARY KEY, height real, weight real);
insert into primary_tbl values(1, 1.1, 100);
insert into primary_tbl values(2, 1.2, 100);
insert into primary_tbl values(3, 1.3, 100);

CREATE TABLE secondary_tbl(index INTEGER REFERENCES
primary_tbl(index), height real, weight real);

insert into secondary_tbl values(generate_series(1,3),1.2,200);

Here we can't parallelise statements like "insert into secondary_tbl
values(generate_series(1,3),1.2,200);" as they will generate
'CommandIds' for each row insert into table with foreign key. The new
command id is generated while performing a foreign key check. Now, it
is a separate question whether generating a command id for each row
insert is required or not but as of now we can't parallelize such
statements.

Do you have something else in mind?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#10)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Fri, Jul 17, 2020 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Do you have something else in mind?

I am planning to commit the comments change patch attached in the
above email [1]/messages/by-id/CAA4eK1+RL7c_s=+TwAE6DJ1MmupbEiGCFLt97US+DMm6UxAjTA@mail.gmail.com next week sometime (probably Monday or Tuesday) unless
you have something more to add?

[1]: /messages/by-id/CAA4eK1+RL7c_s=+TwAE6DJ1MmupbEiGCFLt97US+DMm6UxAjTA@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

robertmhaas@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#11)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Fri, Jul 24, 2020 at 7:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 17, 2020 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Do you have something else in mind?

I am planning to commit the comments change patch attached in the
above email [1] next week sometime (probably Monday or Tuesday) unless
you have something more to add?

Well, I think the comments could be more clear - for the insert case
specifically - about which cases you think are and are not safe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#12)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

Robert Haas <robertmhaas@gmail.com> writes:

Well, I think the comments could be more clear - for the insert case
specifically - about which cases you think are and are not safe.

Yeah, the proposed comment changes don't actually add much. Also
please try to avoid inserting non-ASCII   into the source code;
at least in my mail reader, that attachment has some.

regards, tom lane

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Tom Lane (#13)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Fri, Jul 24, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Well, I think the comments could be more clear - for the insert case
specifically - about which cases you think are and are not safe.

Okay, I'll update the patch accordingly.

Yeah, the proposed comment changes don't actually add much. Also
please try to avoid inserting non-ASCII   into the source code;
at least in my mail reader, that attachment has some.

I don't see any non-ASCII characters in the patch. I have applied and
checked (via vi editor) the patch as well but don't see any non-ASCII
characters. How can I verify that?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Kapila (#14)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

Amit Kapila <amit.kapila16@gmail.com> writes:

On Fri, Jul 24, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, the proposed comment changes don't actually add much. Also
please try to avoid inserting non-ASCII   into the source code;
at least in my mail reader, that attachment has some.

I don't see any non-ASCII characters in the patch. I have applied and
checked (via vi editor) the patch as well but don't see any non-ASCII
characters. How can I verify that?

They're definitely there:

$ od -c 0001-Fix-comments-in-heapam.c.patch
...
0002740 h e \n + \t * l e a d e r c
0002760 a n p e r f o r m t h e i
0003000 n s e r t . 302 240 T h i s r e
0003020 s t r i c t i o n c a n b e
0003040 u p l i f t e d o n c e w
0003060 e \n + \t * a l l o w t h e
0003100 302 240 p l a n n e r t o g e n
0003120 e r a t e p a r a l l e l p
0003140 l a n s f o r i n s e r t s
0003160 . \n \t * / \n \t i f ( I s
...

I'm not sure if "git diff --check" would whine about this.

regards, tom lane

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Tom Lane (#15)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Sat, Jul 25, 2020 at 8:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

On Fri, Jul 24, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, the proposed comment changes don't actually add much. Also
please try to avoid inserting non-ASCII   into the source code;
at least in my mail reader, that attachment has some.

I don't see any non-ASCII characters in the patch. I have applied and
checked (via vi editor) the patch as well but don't see any non-ASCII
characters. How can I verify that?

They're definitely there:

$ od -c 0001-Fix-comments-in-heapam.c.patch
...
0002740 h e \n + \t * l e a d e r c
0002760 a n p e r f o r m t h e i
0003000 n s e r t . 302 240 T h i s r e
0003020 s t r i c t i o n c a n b e
0003040 u p l i f t e d o n c e w
0003060 e \n + \t * a l l o w t h e
0003100 302 240 p l a n n e r t o g e n
0003120 e r a t e p a r a l l e l p
0003140 l a n s f o r i n s e r t s
0003160 . \n \t * / \n \t i f ( I s
...

Thanks, I could see that.

I'm not sure if "git diff --check" would whine about this.

No, "git diff --check" doesn't help. I have tried pgindent but that
also doesn't help neither was I expecting it to help. I am still not
able to figure out how I goofed up this but will spend some more time
on this. In the meantime, I have updated the patch to improve the
comments as suggested by Robert. Do let me know if you want to
edit/add something more?

--
With Regards,
Amit Kapila.

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#16)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Sun, Jul 26, 2020 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jul 25, 2020 at 8:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

No, "git diff --check" doesn't help. I have tried pgindent but that
also doesn't help neither was I expecting it to help. I am still not
able to figure out how I goofed up this but will spend some more time
on this.

I think I have figured out how the patch ended up having &nbsp. Some
editors add it when we use two spaces after a period (.) and I could
see that my Gmail client does that for such a pattern. Normally, I
use one of MSVC, vi, or NetBeans IDE to develop code/patch but
sometimes I check the comments by pasting in Gmail client to find
typos or such and then fix them manually. I guess in this case I have
used Gmail client to write this comment and then copy/pasted it for
the patch. I will be careful not to do this in the future.

--
With Regards,
Amit Kapila.

robertmhaas@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#16)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Sun, Jul 26, 2020 at 7:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

No, "git diff --check" doesn't help. I have tried pgindent but that
also doesn't help neither was I expecting it to help. I am still not
able to figure out how I goofed up this but will spend some more time
on this. In the meantime, I have updated the patch to improve the
comments as suggested by Robert. Do let me know if you want to
edit/add something more?

I still don't agree with this as proposed.

+ * For now, we don't allow parallel inserts of any form not even where the
+ * leader can perform the insert.  This restriction can be uplifted once
+ * we allow the planner to generate parallel plans for inserts.  We can

If I'm understanding this correctly, this logic is completely
backwards. We don't prohibit inserts here because we know the planner
can't generate them. We prohibit inserts here because, if the planner
somehow did generate them, it wouldn't be safe. You're saying that
it's not allowed because we don't try to do it yet, but actually it's
not allowed because we want to make sure that we don't accidentally
try to do it. That's very different.

+ * parallelize inserts unless they generate a new commandid (ex. inserts
+ * into a table having foreign key column) or lock tuples (ex. statements
+ * like Insert .. Select For Update).

I understand the part about generating new command IDs, but not the
part about locking tuples. Why would that be a problem? Can it better
explained here?

Examples in comments are typically introduced with e.g., not ex.

+ * We should be able to parallelize
+ * the later case if we can ensure that no two parallel processes can ever
+ * operate on the same page.

I don't know whether this is talking about two processes operating on
the same page at the same time, or ever within a single query
execution. If it's the former, perhaps we need to explain why that's a
concern for parallel query but not otherwise; if it's the latter, that
seems impossible to guarantee and imagining that we'll ever be able to
do so seems like wishful thinking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Robert Haas (#18)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote:

I still don't agree with this as proposed.
+ * For now, we don't allow parallel inserts of any form not even where the
+ * leader can perform the insert.  This restriction can be uplifted once
+ * we allow the planner to generate parallel plans for inserts.  We can
If I'm understanding this correctly, this logic is completely
backwards. We don't prohibit inserts here because we know the planner
can't generate them. We prohibit inserts here because, if the planner
somehow did generate them, it wouldn't be safe. You're saying that
it's not allowed because we don't try to do it yet, but actually it's
not allowed because we want to make sure that we don't accidentally
try to do it. That's very different.

Right, so how about something like: "To allow parallel inserts, we
need to ensure that they are safe to be performed in workers. We have
the infrastructure to allow parallel inserts in general except for the
case where inserts generate a new commandid (eg. inserts into a table
having a foreign key column)." We can extend this for tuple locking
if required as per the below discussion. Kindly suggest if you prefer
a different wording here.

+ * We should be able to parallelize
+ * the later case if we can ensure that no two parallel processes can ever
+ * operate on the same page.
I don't know whether this is talking about two processes operating on
the same page at the same time, or ever within a single query
execution. If it's the former, perhaps we need to explain why that's a
concern for parallel query but not otherwise;

I am talking about the former case and I know that as per current
design it is not possible that two worker processes try to operate on
the same page but I was trying to be pessimistic so that we can ensure
that via some form of Assert. I don't know whether it is important to
mention this case or not but for the sake of extra safety, I have
mentioned it.

--
With Regards,
Amit Kapila.

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Amit Kapila (#19)

Re: INSERT INTO SELECT, Why Parallelism is not selected?

On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
I still don't agree with this as proposed.
+ * For now, we don't allow parallel inserts of any form not even where the
+ * leader can perform the insert.  This restriction can be uplifted once
+ * we allow the planner to generate parallel plans for inserts.  We can
If I'm understanding this correctly, this logic is completely
backwards. We don't prohibit inserts here because we know the planner
can't generate them. We prohibit inserts here because, if the planner
somehow did generate them, it wouldn't be safe. You're saying that
it's not allowed because we don't try to do it yet, but actually it's
not allowed because we want to make sure that we don't accidentally
try to do it. That's very different.
Right, so how about something like: "To allow parallel inserts, we
need to ensure that they are safe to be performed in workers. We have
the infrastructure to allow parallel inserts in general except for the
case where inserts generate a new commandid (eg. inserts into a table
having a foreign key column)." We can extend this for tuple locking
if required as per the below discussion. Kindly suggest if you prefer
a different wording here.
+ * We should be able to parallelize
+ * the later case if we can ensure that no two parallel processes can ever
+ * operate on the same page.
I don't know whether this is talking about two processes operating on
the same page at the same time, or ever within a single query
execution. If it's the former, perhaps we need to explain why that's a
concern for parallel query but not otherwise;
I am talking about the former case and I know that as per current
design it is not possible that two worker processes try to operate on
the same page but I was trying to be pessimistic so that we can ensure
that via some form of Assert.

I think the two worker processes can operate on the same page for a
parallel index scan case but it won't be for same tuple. I am not able
to think of any case where we should be worried about tuple locking
for Insert's case, so we can probably skip writing anything about it
unless someone else can think of such a case.

--
With Regards,
Amit Kapila.

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#3)

amit.kapila16@gmail.com

almost 6 years ago

In reply to: Bharath Rupireddy (#21)

amit.kapila16@gmail.com

over 5 years ago

In reply to: Amit Kapila (#20)

amit.kapila16@gmail.com

over 5 years ago

In reply to: Amit Kapila (#23)