Parallel Inserts in CREATE TABLE AS
Hi,
The idea of this patch is to allow the leader and each worker to insert
the tuples in parallel if the SELECT part of the CTAS is parallelizable.
Along with the parallel inserts, if the CTAS code path is allowed to do
table_multi_insert() [1], then the gain we achieve is as follows:
For a table with 2 integer columns and 100 million tuples (more testing
results are at [2]), the exec time on HEAD is *120sec*, whereas with the
parallelism patch proposed here and the multi insert patch [1], with 3
workers and leader participation, the exec time is *22sec (5.45X)*. With
the current CTAS code, which does single tuple inserts (see
intorel_receive()), the performance gain is limited to ~1.7X with
parallelism. This is due to the fact that the workers contend more for
locks on buffer pages while extending the table. So, the maximum benefit
we can get for CTAS is with both parallelism and multi tuple inserts.
The design:
Let the planner know that the SELECT is from CTAS in createas.c so that
it can set the number of tuples transferred from the workers to the
Gather node to 0. With this change, there are chances that the planner
may choose the parallel plan. After the planning, check if the upper
plan node is Gather in createas.c and mark a parallelism flag in the
CTAS dest receiver. Pass the into clause, object id, and command id from
the leader to the workers, so that each worker can create its own CTAS
dest receiver. The leader inserts its share of tuples if instructed to
do so, and so do the workers. Each worker atomically writes its number
of inserted tuples into a shared memory variable; the leader combines
this with its own number of inserted tuples and reports the total to the
client.
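For reference, the createas.c part of this boils down to the following
check from the attached patch (is_parallel is the new flag in the
DR_intorel dest receiver):

/*
 * The SELECT part of the CTAS is parallelizable, so we can make each
 * parallel worker insert the tuples it produces into the target table.
 */
if (!is_matview &&
    IsA(plan->planTree, Gather))
    ((DR_intorel *) dest)->is_parallel = true;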
Below things are still pending. Thoughts are most welcome:
1. How can we better lift the "cannot insert tuples in a parallel
worker" restriction in heap_prepare_insert() for only CTAS cases, or for
that matter parallel copy? How about having a variable in any of the
worker global contexts and using that? Of course, we can remove this
restriction entirely in case we fully allow parallelism for INSERT INTO
SELECT, CTAS, and COPY. (The check in question is quoted just after this
list.)
2. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having some
textual info along with the Gather node?
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
  Workers Planned: 2
  -> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
       Filter: (many < 10000)
-----------------------------------------------------------------------------
3. Need to restrict parallel inserts if CTAS tries to create temp/global
tables, as the workers will not have access to those tables. Need to
analyze whether to allow parallelism if CTAS has prepared statements or
WITH NO DATA.
4. Need to stop unnecessary parallel shared state, such as the tuple
queues, from being created and shared with the workers.
5. Addition of new test cases. Testing with more scenarios and different
data sets, sizes, tablespaces, SELECT INTO. Analysis of the 2 mismatches
in the write_parallel.sql regression test.
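For item 1, the check in question is roughly the following, from
heap_prepare_insert() in heapam.c:

if (IsParallelWorker())
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));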
Thoughts?
Credits:
1. Thanks to Dilip Kumar for the main design idea and the discussions.
Thanks to Vignesh for the discussions.
2. Patch development, testing is by me.
3. Thanks to the authors of the table_multi_insert() in CTAS patch [1].
[1]: For table_multi_insert() in CTAS, I used an in-progress patch
available at
/messages/by-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1=N1AC-Fw@mail.gmail.com
[2]: Table with 2 integer columns, 100 million tuples, with leader
participation, with the default postgresql.conf file. All readings are
triplets of the form (workers, exec time in sec, improvement).
case 1: no multi inserts -
(0,120,1X),(1,91,1.32X),(2,75,1.6X),(3,67,1.79X),(4,72,1.66X),(5,77,1.56X),(6,83,1.44X)
case 2: with multi inserts -
(0,59,1X),(1,32,1.84X),(2,28,2.1X),(3,25,2.36X),(4,23,2.56X),(5,22,2.68X),(6,22,2.68X)
case 3: same table but unlogged, with multi inserts -
(0,50,1X),(1,28,1.78X),(2,25,2X),(3,22,2.27X),(4,21,2.38X),(5,21,2.38X),(6,20,2.5X)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
I feel this enhancement could give good improvement, +1 for this.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2020-09-23 17:20:20 +0530, Bharath Rupireddy wrote:
The idea of this patch is to allow the leader and each worker to insert
the tuples in parallel if the SELECT part of the CTAS is parallelizable.
Cool!
The design:
I think it'd be good if you could explain a bit more why you think this
is safe to do in the way you have done it.
E.g. from a quick scroll through the patch, there's not even a comment
explaining that the only reason there doesn't need to be code dealing
with xid assignment is because we already did the catalog changes to
create the table. But how does that work for SELECT INTO? Are you
prohibiting that? ...
Pass the into clause, object id, command id from the leader to
workers, so that each worker can create its own CTAS dest
receiver. The leader inserts its share of tuples if instructed to do so,
and so do the workers. Each worker atomically writes its number of
inserted tuples into a shared memory variable; the leader combines this
with its own number of inserted tuples and reports the total to the
client.

Below things are still pending. Thoughts are most welcome:
1. How can we better lift the "cannot insert tuples in a parallel
worker" restriction in heap_prepare_insert() for only CTAS cases, or for
that matter parallel copy? How about having a variable in any of the
worker global contexts and using that? Of course, we can remove this
restriction entirely in case we fully allow parallelism for INSERT INTO
SELECT, CTAS, and COPY.
I have mentioned before that I think it'd be good if we changed the
insert APIs to have a more 'scan' like structure. I am thinking of
something like
TableInsertScan* table_begin_insert(Relation);
table_tuple_insert(TableInsertScan *is, other, args);
table_multi_insert(TableInsertScan *is, other, args);
table_end_insert(TableInsertScan *);
that'd then replace the BulkInsertStateData logic we have right now. But
more importantly it'd allow an AM to optimize operations across multiple
inserts, which is important for column stores.
And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().
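To make that concrete, a rough sketch of how a dest receiver might drive
this API (none of these functions exist today; the names follow the
sketch above, and the tuple-fetching call is only a placeholder):

TableInsertScan *iscan = table_begin_insert(intoRelationDesc);

table_insert_allow_parallel(iscan);     /* opt in to parallel inserts */

while ((slot = GetNextTupleSomehow()) != NULL)   /* placeholder */
    table_tuple_insert(iscan, slot, mycid, 0);   /* or table_multi_insert() */

table_end_insert(iscan);    /* flush any buffered tuples and clean up */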
3. Need to restrict parallel inserts if CTAS tries to create temp/global
tables, as the workers will not have access to those tables. Need to
analyze whether to allow parallelism if CTAS has prepared statements or
WITH NO DATA.
In which case does CTAS not create a table? You definitely need to
ensure that the table is created before your workers are started, and it
needs to be in a different CommandId.
Greetings,
Andres Freund
Thanks Andres for the comments.
On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
The design:
I think it'd be good if you could explain a bit more why you think this
is safe to do in the way you have done it.

E.g. from a quick scroll through the patch, there's not even a comment
explaining that the only reason there doesn't need to be code dealing
with xid assignment is because we already did the catalog changes to
create the table.
Yes, we do a bunch of catalog changes related to the newly created table.
We will have both the txn id and command id assigned when the catalog
changes are being made. But, right after the table is created in the
leader, the command id is incremented (CommandCounterIncrement() is
called from create_ctas_internal()) whereas the txn id remains the same.
The new command id is marked with GetCurrentCommandId(true) in
intorel_startup, then the parallel mode is entered. The txn id and
command id are serialized into the parallel DSM, and they are then
available to all parallel workers. This is discussed in [1].
A few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true, maybe via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1], and remove the extra
command id sharing from the leader to the workers.
I will add a few comments in the upcoming patch related to the above info.
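For reference, the helper proposed in [1] is roughly the following (not
yet in core, so treat it as a sketch):

void
SetCurrentCommandIdUsedForWorker(void)
{
    Assert(IsParallelWorker() && !currentCommandIdUsed &&
           currentCommandId != InvalidCommandId);

    /* The worker must use the command id it received from the leader. */
    currentCommandIdUsed = true;
}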
But how does that work for SELECT INTO? Are you prohibiting
that? ...
In the case of SELECT INTO, a new table gets created as well; I'm not
prohibiting the parallel inserts there, and I think we don't need to.
Thoughts?
Below things are still pending. Thoughts are most welcome:
1. How can we better lift the "cannot insert tuples in a parallel
worker" restriction in heap_prepare_insert() for only CTAS cases, or for
that matter parallel copy? How about having a variable in any of the
worker global contexts and using that? Of course, we can remove this
restriction entirely in case we fully allow parallelism for INSERT INTO
SELECT, CTAS, and COPY.

And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().
Removing "cannot insert tuples in a parallel worker" restriction from
heap_prepare_insert() is a common problem for parallel inserts in
general, i.e. parallel inserts in CTAS, parallel INSERT INTO
SELECTs[1]/messages/by-id/CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com and parallel copy[2]/messages/by-id/CAA4eK1+kpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV=g@mail.gmail.com. It will be good if a common solution
is agreed.
3. Need to restrict parallel inserts if CTAS tries to create temp/global
tables, as the workers will not have access to those tables. Need to
analyze whether to allow parallelism if CTAS has prepared statements or
WITH NO DATA.

In which case does CTAS not create a table?
AFAICS, the table gets created in all cases, but the insertion of the
data gets skipped if the user specifies the WITH NO DATA option, in
which case the select part is not even planned, and so parallelism will
also not be picked.
You definitely need to
ensure that the table is created before your workers are started, and it
needs to be in a different CommandId.

Yeah, this is already being done. The table gets created in the leader
in intorel_startup(), which gets called from dest->rStartup() in
standard_ExecutorRun(), before entering the parallel mode.
[1]: /messages/by-id/CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com
[2]: /messages/by-id/CAA4eK1+kpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV=g@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 28, 2020 at 3:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks Andres for the comments.
On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
The design:
I think it'd be good if you could explain a bit more why you think this
is safe to do in the way you have done it.

E.g. from a quick scroll through the patch, there's not even a comment
explaining that the only reason there doesn't need to be code dealing
with xid assignment is because we already did the catalog changes to
create the table.

Yes, we do a bunch of catalog changes related to the newly created table.
We will have both the txn id and command id assigned when the catalog
changes are being made. But, right after the table is created in the
leader, the command id is incremented (CommandCounterIncrement() is
called from create_ctas_internal()) whereas the txn id remains the same.
The new command id is marked with GetCurrentCommandId(true) in
intorel_startup, then the parallel mode is entered. The txn id and
command id are serialized into the parallel DSM, and they are then
available to all parallel workers. This is discussed in [1].

A few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true, maybe via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1], and remove the extra
command id sharing from the leader to the workers.

I will add a few comments in the upcoming patch related to the above
info.
Yes, that would be good.
But how does that work for SELECT INTO? Are you prohibiting that? ...

In the case of SELECT INTO, a new table gets created as well; I'm not
prohibiting the parallel inserts there, and I think we don't need to.

So, in this case also, do we ensure that the table is created before we
launch the workers? If so, I think you can explain in comments about it
and what you need to do to ensure the same.
While skimming through the patch, a small thing I noticed:
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+
I am not sure at this stage if this is the best way to make CTAS
parallel, but if so, then probably you can expand the comments a bit to
say why you consider only the Gather node (and that too only when it is
the top-most node) and why not another parallel node like GatherMerge.
Thoughts?
Below things are still pending. Thoughts are most welcome:
1. How can we better lift the "cannot insert tuples in a parallel
worker" restriction in heap_prepare_insert() for only CTAS cases, or for
that matter parallel copy? How about having a variable in any of the
worker global contexts and using that? Of course, we can remove this
restriction entirely in case we fully allow parallelism for INSERT INTO
SELECT, CTAS, and COPY.

And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().

Removing the "cannot insert tuples in a parallel worker" restriction
from heap_prepare_insert() is a common problem for parallel inserts in
general, i.e. parallel inserts in CTAS, parallel INSERT INTO SELECT [1],
and parallel copy [2]. It will be good if a common solution is agreed
upon.
Right, for now, I think you can simply remove that check from the code
instead of just commenting it. We will see if there is a better
check/Assert we can add there.
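For example, something like the below might eventually work there
(parallel_insert_allowed is a purely hypothetical flag, named only for
illustration; it does not exist anywhere yet):

/*
 * Hypothetical replacement for the ereport(), once only
 * parallel-insert-safe paths can reach here.
 */
Assert(!IsParallelWorker() || parallel_insert_allowed);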
--
With Regards,
Amit Kapila.
On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yes, we do a bunch of catalog changes related to the newly created table.
We will have both the txn id and command id assigned when the catalog
changes are being made. But, right after the table is created in the
leader, the command id is incremented (CommandCounterIncrement() is
called from create_ctas_internal()) whereas the txn id remains the same.
The new command id is marked with GetCurrentCommandId(true) in
intorel_startup, then the parallel mode is entered. The txn id and
command id are serialized into the parallel DSM, and they are then
available to all parallel workers. This is discussed in [1].

A few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true, maybe via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1], and remove the extra
command id sharing from the leader to the workers.

I will add a few comments in the upcoming patch related to the above
info.
Yes, that would be good.
Added comments.
But how does that work for SELECT INTO? Are you prohibiting that? ...

In the case of SELECT INTO, a new table gets created as well; I'm not
prohibiting the parallel inserts there, and I think we don't need to.

So, in this case also, do we ensure that the table is created before we
launch the workers? If so, I think you can explain in comments about it
and what you need to do to ensure the same.
For SELECT INTO, the table gets created by the leader in
create_ctas_internal(), then ExecInitParallelPlan() gets called, which
launches the workers, and then the leader (if asked to do so) and the
workers insert the rows. So, we don't need to do any extra work to
ensure the table gets created before the workers start inserting
tuples.
While skimming through the patch, a small thing I noticed:

+ /*
+  * SELECT part of the CTAS is parallelizable, so we can make
+  * each parallel worker insert the tuples that are resulted
+  * in it's execution into the target table.
+  */
+ if (!is_matview &&
+     IsA(plan->planTree, Gather))
+     ((DR_intorel *) dest)->is_parallel = true;
+

I am not sure at this stage if this is the best way to make CTAS
parallel, but if so, then probably you can expand the comments a bit to
say why you consider only the Gather node (and that too only when it is
the top-most node) and why not another parallel node like GatherMerge.
If somebody expects to preserve the order of the tuples coming from a
GatherMerge node of the select part in CTAS or SELECT INTO while
inserting, then with parallelism allowed that may not be the case, i.e.
the order of insertion of tuples may vary. I'm not quite sure if someone
would use ORDER BY in the select part of CTAS or SELECT INTO in a real
world use case. Thoughts?
Right, for now, I think you can simply remove that check from the code
instead of just commenting it. We will see if there is a better
check/Assert we can add there.
Done.
I also worked on some of the open points I listed earlier in my mail.
3. Need to restrict parallel inserts if CTAS tries to create temp/global tables, as the workers will not have access to those tables.
Done.
Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
For prepared statements, parallelism will not be picked, and so neither
will parallel insertion.
For the CTAS WITH NO DATA case, the select part is not even planned, and
so parallelism will also not be picked.
4. Need to stop unnecessary parallel shared state, such as the tuple queues, from being created and shared with the workers.
Done.
I'm listing the things that are still pending.
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.
2. Addition of new test cases. Testing with more scenarios and
different data sets, sizes, tablespaces, select into. Analysis on the
2 mismatches in write_parallel.sql regression test.
Attaching v2 patch, thoughts and comments are welcome.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
On Wed, Oct 14, 2020 at 2:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
While skimming through the patch, a small thing I noticed:

+ /*
+  * SELECT part of the CTAS is parallelizable, so we can make
+  * each parallel worker insert the tuples that are resulted
+  * in it's execution into the target table.
+  */
+ if (!is_matview &&
+     IsA(plan->planTree, Gather))
+     ((DR_intorel *) dest)->is_parallel = true;
+

I am not sure at this stage if this is the best way to make CTAS
parallel, but if so, then probably you can expand the comments a bit to
say why you consider only the Gather node (and that too only when it is
the top-most node) and why not another parallel node like GatherMerge.

If somebody expects to preserve the order of the tuples coming from a
GatherMerge node of the select part in CTAS or SELECT INTO while
inserting, then with parallelism allowed that may not be the case, i.e.
the order of insertion of tuples may vary. I'm not quite sure if someone
would use ORDER BY in the select part of CTAS or SELECT INTO in a real
world use case. Thoughts?
I think there is no reason why one can't use ORDER BY in the statements
we are talking about here. But, I think the reason we can't enable
parallelism for GatherMerge is that for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
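Something on these lines, say (the wording is just a sketch):

/*
 * Allow parallel inserts only when the top node is Gather.  For
 * GatherMerge, the leader must gather all the tuples to perform the
 * final merge, so the workers cannot insert the tuples into the
 * target table themselves.
 */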
Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
For prepared statements, parallelism will not be picked, and so neither
will parallel insertion.

Hmm, I am not sure what makes you say this. Parallelism is enabled for
prepared statements since commit 57a6a72b6b.
I'm listing the things that are still pending.
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.
I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose something by showing an
example of what you have in mind?
--
With Regards,
Amit Kapila.
On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
If somebody expects to preserve the order of the tuples coming from a
GatherMerge node of the select part in CTAS or SELECT INTO while
inserting, then with parallelism allowed that may not be the case, i.e.
the order of insertion of tuples may vary. I'm not quite sure if someone
would use ORDER BY in the select part of CTAS or SELECT INTO in a real
world use case. Thoughts?

I think there is no reason why one can't use ORDER BY in the statements
we are talking about here. But, I think the reason we can't enable
parallelism for GatherMerge is that for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
Sure, I will add comments in the upcoming patch.
For prepared statements, parallelism will not be picked, and so neither
will parallel insertion.

Hmm, I am not sure what makes you say this. Parallelism is enabled for
prepared statements since commit 57a6a72b6b.
Thanks for letting me know this. I misunderstood the parallelism for
prepared statements. Now, I verified with a proper use case (see below),
where I had a prepared statement and a CTAS having EXECUTE; in this case
too, parallelism is picked and parallel insertion happens with the patch
proposed in this thread. Do we have any problems if we allow parallel
insertion for these cases?
PREPARE myselect AS SELECT * FROM t1;
EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;
I think commit 57a6a72b6b did not add any test cases; isn't it good to
add one in prepare.sql or select_parallel.sql?
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.

I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose something by showing an
example of what you have in mind?
I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
  Workers Planned: 2
  Parallel Insert on t_test1
  -> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
       Filter: (many < 10000)
-----------------------------------------------------------------------------
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 15, 2020 at 9:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
For prepared statements, parallelism will not be picked, and so neither
will parallel insertion.

Hmm, I am not sure what makes you say this. Parallelism is enabled for
prepared statements since commit 57a6a72b6b.

Thanks for letting me know this. I misunderstood the parallelism for
prepared statements. Now, I verified with a proper use case (see below),
where I had a prepared statement and a CTAS having EXECUTE; in this case
too, parallelism is picked and parallel insertion happens with the patch
proposed in this thread. Do we have any problems if we allow parallel
insertion for these cases?

PREPARE myselect AS SELECT * FROM t1;
EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;

I think commit 57a6a72b6b did not add any test cases; isn't it good to
add one in prepare.sql or select_parallel.sql?
I am not sure if it is worth it, as this is not functionality which is
too complex or that has many chances of getting broken.
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.

I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose something by showing an
example of what you have in mind?

I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
  Workers Planned: 2
  Parallel Insert on t_test1
  -> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
       Filter: (many < 10000)
-----------------------------------------------------------------------------
maybe something like below:
Gather (cost=1000.00..108738.90 rows=0 width=8)
-> Create t_test1
-> Parallel Seq Scan on t_test
I don't know what the best thing to do here is. I think for the
temporary purpose you can keep something like the above; once the patch
has matured, we can take a separate opinion on this.
--
With Regards,
Amit Kapila.
On 14.10.20 11:16, Bharath Rupireddy wrote:
Attaching v2 patch, thoughts and comments are welcome.
Hi,
Really looking forward to this ending up in postgres as I think it's a
very nice improvement.
Whilst reviewing your patch I was wondering: is there a reason you did
not introduce a batch insert in the destreceiver for the CTAS? For me
this makes a huge difference in ingest speed, as otherwise the inserts
do not really scale so well once lock contention starts to be a big
problem. If you like I can make a patch to introduce this on top?
Kind regards,
Luc
Swarm64
On Fri, Oct 16, 2020 at 11:33 AM Luc Vlaming <luc@swarm64.com> wrote:
Really looking forward to this ending up in postgres as I think it's a
very nice improvement.

Whilst reviewing your patch I was wondering: is there a reason you did
not introduce a batch insert in the destreceiver for the CTAS? For me
this makes a huge difference in ingest speed, as otherwise the inserts
do not really scale so well once lock contention starts to be a big
problem. If you like I can make a patch to introduce this on top?
Thanks for your interest. You are right, we can get the maximum
improvement if we have multi inserts in the destreceiver for CTAS along
similar lines to the COPY FROM command. I specified this point in my
first mail [1]. You may want to take a look at an already existing patch
[2] for multi inserts; I think there are some review comments to be
addressed in that patch. I would love to see the multi insert patch
revived.
[1]: /messages/by-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf+0-Jg+KYT7ZO-Ug@mail.gmail.com
[2]: /messages/by-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1=N1AC-Fw@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 16.10.20 08:23, Bharath Rupireddy wrote:
Thanks for your interest. You are right, we can get the maximum
improvement if we have multi inserts in the destreceiver for CTAS along
similar lines to the COPY FROM command. I specified this point in my
first mail [1]. You may want to take a look at an already existing patch
[2] for multi inserts; I think there are some review comments to be
addressed in that patch. I would love to see the multi insert patch
revived.
Sorry, I had not seen that pointer in your first email.
I'll first finish some other patches I'm working on and then I'll try to
revive that patch. Thanks for the pointers.
Kind regards,
Luc
Swarm64
On Thu, Oct 15, 2020 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1. How to represent the parallel insert for CTAS in explain plans? The
explain CTAS shows the plan for only the SELECT part. How about having
some textual info along with the Gather node? I'm not quite sure on
this point, any suggestions are welcome.

I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose something by showing an
example of what you have in mind?

I thought we could have something like this.
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
  Workers Planned: 2
  Parallel Insert on t_test1
  -> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
       Filter: (many < 10000)
-----------------------------------------------------------------------------

maybe something like below:
Gather (cost=1000.00..108738.90 rows=0 width=8)
  -> Create t_test1
  -> Parallel Seq Scan on t_test

I don't know what the best thing to do here is. I think for the
temporary purpose you can keep something like the above; once the patch
has matured, we can take a separate opinion on this.
Agreed. Here's a snapshot of explain with the suggested change.

postgres=# EXPLAIN (ANALYZE, COSTS OFF) CREATE TABLE t1_test AS SELECT * FROM t1;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather (actual time=970.524..972.913 rows=0 loops=1)
   -> Create t1_test
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
 Planning Time: 0.049 ms
 Execution Time: 973.733 ms
I think there is no reason why one can't use ORDER BY in the statements
we are talking about here. But, I think the reason we can't enable
parallelism for GatherMerge is that for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
Added comments.
2. Addition of new test cases.
Added new test cases.
Analysis on the 2 mismatches in write_parallel.sql regression test.
Done. It needed a small code change in costsize.c. Now, both make check
and make check-world pass.
Apart from the above, a couple of other things I have finished in the v3
patch:
1. Both make check and make check-world pass with force_parallel_mode =
regress.
2. I enabled parallel inserts in the case of materialized views. Hope
that's fine.
Attaching v3 patch herewith.
I'm done with all the open points in my list. Please review the v3 patch
and provide comments.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
On Mon, Oct 19, 2020 at 10:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Attaching v3 patch herewith.
I'm done with all the open points in my list. Please review the v3 patch and provide comments.
Attaching v4 patch, rebased on the latest master 68b1a4877e. Also,
added this feature to commitfest -
https://commitfest.postgresql.org/31/2841/
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v4-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
Hi,
I'm very interested in this feature,
and I'm looking at the patch. Here are some comments.
1.
+ if (!TupIsNull(outerTupleSlot))
+ {
+ (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+ node->ps.state->es_processed++;
+ }
+
+ if(TupIsNull(outerTupleSlot))
+ break;
+ }
How about the following style:

if (TupIsNull(outerTupleSlot))
    break;

(void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
node->ps.state->es_processed++;

which looks cleaner.
2.
+
+ if (into != NULL &&
+ IsA(into, IntoClause))
+ {
The check can be replaced by ISCTAS(into).
3.
+ /*
+ * For parallelizing inserts in CTAS i.e. making each
+ * parallel worker inerst it's tuples, we must send
+ * information such as intoclause(for each worker
'inerst' looks like a typo (insert).
4.
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+ intoclausestr = nodeToString(planstate->intoclause);
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
...
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
The code here calls strlen(intoclausestr) two times. The existing code
in ExecInitParallelPlan stores the strlen in a variable.

So how about the following style:

intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
5.
+ if (intoclausestr != NULL)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(intoclausestr) + 1);
+ strcpy(shmptr, intoclausestr);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
+
/* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr == NULL)
+ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
The two checks on intoclausestr can be combined like:
if (intoclausestr != NULL)
{
...
}
else
{
...
}
Best regards,
houzj
On Tue, Nov 24, 2020 at 4:43 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I'm very interested in this feature,
and I'm looking at the patch, here are some comments.
Thanks for the review.
How about the following style:

if (TupIsNull(outerTupleSlot))
    break;

(void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
node->ps.state->es_processed++;

which looks cleaner.
Done.
The check can be replaced by ISCTAS(into).
Done.
'inerst' looks like a typo (insert).
Corrected.
The code here calls strlen(intoclausestr) two times. The existing code
in ExecInitParallelPlan stores the strlen in a variable.

So how about the following style:

intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
Done.
The two checks on intoclausestr can be combined like:
if (intoclausestr != NULL)
{
...
}
else
{
...
}
Done.
Attaching v5 patch. Please consider it for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v5-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch
Hi,
I have an issue about the following code:
econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);
+ if (ISCTAS(node->ps.intoclause))
+ {
+ ExecParallelInsertInCTAS(node);
+ return NULL;
+ }
/* If no projection is required, we're done. */
if (node->ps.ps_ProjInfo == NULL)
return slot;
/*
* Form the result tuple using ExecProject(), and return it.
*/
econtext->ecxt_outertuple = slot;
return ExecProject(node->ps.ps_ProjInfo);
It seems the projection will be skipped.
Is this because projection is not required in this case?
(I'm not very familiar with where the projection will happen.)
If projection is not required here, shall we add some comments here?
Best regards,
houzj
On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi,
I have an issue about the following code:

econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);

+ if (ISCTAS(node->ps.intoclause))
+ {
+     ExecParallelInsertInCTAS(node);
+     return NULL;
+ }

/* If no projection is required, we're done. */
if (node->ps.ps_ProjInfo == NULL)
    return slot;

/*
 * Form the result tuple using ExecProject(), and return it.
 */
econtext->ecxt_outertuple = slot;
return ExecProject(node->ps.ps_ProjInfo);

It seems the projection will be skipped.
Is this because projection is not required in this case?
(I'm not very familiar with where the projection will happen.)
For parallel inserts in CTAS, I don't think we need to project the
tuples being returned from the underlying plan nodes, and also we have
nothing to project from the Gather node further up. The required
projection will happen while the tuples are being returned from the
underlying nodes, and the projected tuples are directly fed to CTAS's
dest receiver intorel_receive(), and from there into the created table.
We don't need ExecProject again in ExecParallelInsertInCTAS().
For instance, projection will always be done when the tuple is being
returned from an underlying sequential scan node (see ExecScan() -->
ExecProject()), and this is true for both the leader and the workers.
In both the leader and the workers, we are just calling CTAS's dest
receiver intorel_receive().
Thoughts?
If projection is not required here, shall we add some comments here?
If the above point looks okay, I can add a comment.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,

For parallel inserts in CTAS, I don't think we need to project the
tuples being returned from the underlying plan nodes, and also we have
nothing to project from the Gather node further up. The required
projection will happen while the tuples are being returned from the
underlying nodes, and the projected tuples are directly fed to CTAS's
dest receiver intorel_receive(), and from there into the created table.
We don't need ExecProject again in ExecParallelInsertInCTAS().

For instance, projection will always be done when the tuple is being
returned from an underlying sequential scan node (see ExecScan() -->
ExecProject()), and this is true for both the leader and the workers.
In both the leader and the workers, we are just calling CTAS's dest
receiver intorel_receive().

Thoughts?
I took a deep look at the projection logic.
In most cases, you are right that the Gather node does not need
projection.
In some rare cases, such as a SubPlan (or an InitPlan, I guess), the
projection will happen in the Gather node.
The example:
Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));
postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000;
QUERY PLAN
----------------------------------------
Gather
Output: test.i, (SubPlan 1)
Workers Planned: 2
-> Parallel Seq Scan on public.test
Output: test.i
Filter: (test.i < 2000)
SubPlan 1
-> Limit
Output: (test.i)
-> Seq Scan on public.test2
Output: test.i
In this case, projection is necessary, because the subplan will be
executed during projection.
If it is skipped, the created table will lose some data.
Best regards,
houzj
On Thu, Nov 26, 2020 at 12:15 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I took a deep look at the projection logic.
In most cases, you are right that the Gather node does not need
projection.

In some rare cases, such as a SubPlan (or an InitPlan, I guess), the
projection will happen in the Gather node.

The example:
Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));

postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000;
               QUERY PLAN
----------------------------------------
 Gather
   Output: test.i, (SubPlan 1)
   Workers Planned: 2
   -> Parallel Seq Scan on public.test
        Output: test.i
        Filter: (test.i < 2000)
   SubPlan 1
     -> Limit
          Output: (test.i)
          -> Seq Scan on public.test2
               Output: test.i

In this case, projection is necessary, because the subplan will be
executed during projection.

If it is skipped, the created table will lose some data.
Thanks a lot for the use case. Yes, with the current patch the table
will lose the data related to the subplan. On analyzing further, I think
we cannot allow parallel inserts in the cases where the Gather node has
some projection to do, because the workers cannot perform that
projection. So, having ps_ProjInfo in the Gather node is an indication
for us to disable parallel inserts; only the leader can do the
insertions after the Gather node does the required projections, as in
the sketch below.
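A minimal sketch of that check in ExecGather(), building on the patch's
existing ISCTAS() test:

/*
 * Take the parallel-insert path only when the Gather node has no
 * projection to do; otherwise fall through, so that the leader
 * projects the tuples and inserts them via the dest receiver.
 */
if (ISCTAS(node->ps.intoclause) &&
    node->ps.ps_ProjInfo == NULL)
{
    ExecParallelInsertInCTAS(node);
    return NULL;
}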
Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com