Proposal: speeding up GIN build with parallel workers

Started by Constantin S. Pan, over 10 years ago, 30 messages, pgsql-hackers
#1Constantin S. Pan
kvapen@gmail.com

Hi, Hackers.

The task of building a GIN index can require lots of time and eats 100%
of one CPU, but we could easily make it use more than 100%, especially
since we now have parallel workers in Postgres.

The process of building GIN looks like this:

1. Accumulate a batch of index records into an rbtree in maintenance
work memory.

2. Dump the batch to disk.

3. Repeat.
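The three steps above can be sketched as follows (a Python model of the C logic, not the actual implementation; a dict stands in for the rbtree, a batch-size limit stands in for the maintenance work memory budget, a list of dumped batches stands in for the on-disk index, and all names are made up for illustration):

```python
from collections import defaultdict

def build_gin_serial(heap_tuples, batch_limit):
    """Simulate the serial GIN build: accumulate (key -> tid list)
    batches in memory, dump a batch once the budget is exceeded."""
    dumped = []                    # stands in for on-disk posting trees
    batch = defaultdict(list)      # stands in for the in-memory rbtree
    n = 0
    for tid, keys in heap_tuples:  # step 1: accumulate a batch
        for key in keys:
            batch[key].append(tid)
            n += 1
        if n >= batch_limit:       # step 2: dump the batch to disk
            dumped.append(dict(sorted(batch.items())))
            batch.clear()
            n = 0
    if batch:                      # step 3: repeat until the scan ends
        dumped.append(dict(sorted(batch.items())))
    return dumped
```

Each dumped batch may repeat keys from earlier batches, which is exactly why the merge step discussed below gets expensive.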

I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.

This speeds up the first step N times, but slows down the second one,
because when multiple workers dump item pointers for the same key, each
of them has to read and decode the results of the previous one. That is
a huge waste, but there is an idea on how to eliminate it.

When it comes to dumping the next batch, a worker does not do it
independently. Instead, it (and every other worker) sends the
accumulated index records to the parent (backend) in ascending key
order. The backend, which receives the records from the workers through
shared memory, can merge them and dump each of them once, without the
need to reread the records N-1 times.
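A minimal model of that proposed merge (a Python sketch; `heapq.merge` stands in for reading the per-worker shared-memory streams, and the names are hypothetical):

```python
import heapq
from itertools import groupby
from operator import itemgetter

def backend_merge(worker_streams):
    """Merge per-worker (key, sorted_tids) streams, each already in
    ascending key order, so every key is dumped exactly once."""
    merged = []
    ordered = heapq.merge(*worker_streams, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Combine the posting lists from all workers for this key,
        # with no rereading or decoding of previously dumped results.
        tids = sorted(t for _, tids in group for t in tids)
        merged.append((key, tids))
    return merged
```

Because each stream arrives in ascending key order, the backend only ever looks at the head of each stream, which is what makes the single-pass merge possible.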

In its current state the implementation is just a proof of concept with
all the configuration hardcoded, but it already works as is, though it
does not speed up the build process more than 4 times on my
configuration (12 CPUs). There is also a problem with temporary tables,
for which parallel mode does not work.

Please leave your feedback.

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

pgin.patch (text/x-patch, +143/-51)

#2Peter Geoghegan
pg@heroku.com
In reply to: Constantin S. Pan (#1)
Re: Proposal: speeding up GIN build with parallel workers

On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kvapen@gmail.com> wrote:

I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.

I am currently working on a patch that allows B-Tree index builds to
be performed in parallel. I think I'm a week or two away from posting
it.

Even without parallelism, wouldn't it be better if GIN indexes were
built using tuplesort? I know way way less about the gin am than the
nbtree am, but I imagine that a prominent cost for GIN index builds is
constructing the main B-Tree (the one that's constructed over key
values) itself. Couldn't tuplesort.c be adapted to cover this case?
That would be much faster in general, particularly with the recent
addition of abbreviated keys, while also leaving a clear path forward
to performing the build in parallel.

I understand that a long term ambition for the gin am is to merge it
with nbtree, to almost automatically benefit from enhancements, and to
reduce the maintenance burden of each.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Constantin S. Pan
kvapen@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Proposal: speeding up GIN build with parallel workers

On Fri, 15 Jan 2016 15:29:51 -0800
Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kvapen@gmail.com>
wrote:
Even without parallelism, wouldn't it be better if GIN indexes were
built using tuplesort? I know way way less about the gin am than the
nbtree am, but I imagine that a prominent cost for GIN index builds is
constructing the main B-Tree (the one that's constructed over key
values) itself. Couldn't tuplesort.c be adapted to cover this case?
That would be much faster in general, particularly with the recent
addition of abbreviated keys, while also leaving a clear path forward
to performing the build in parallel.

While building GIN we need a quick way to update the posting list for a
given key; this is where the rbtree comes to the rescue. Using tuplesort
would require a completely different approach to building the index:
dump (key, itempointer) pairs into a tuplesort heap, then sort the heap
and merge the itempointers for the same key values.
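A sketch of that alternative (Python standing in for tuplesort.c; the function name is hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def build_via_sort(pairs):
    """Tuplesort-style alternative: dump (key, tid) pairs, sort them,
    then merge the tids of adjacent equal keys into posting lists."""
    pairs.sort(key=itemgetter(0, 1))          # stands in for tuplesort
    return [(key, [tid for _, tid in run])    # one posting list per key
            for key, run in groupby(pairs, key=itemgetter(0))]
```

The sort materializes one pair per (key, tid) occurrence before any merging happens, which is where the extra disk-space cost discussed below comes from.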

Both rbtree and sorting require O(N log N) operations, and abbreviated
keys will not help here, because GIN is used in cases where there are
lots of repeated keys. The benefit of tuplesort is that it would be
better for huge data that does not fit into memory, but on the other
hand it would need twice as much free disk space for sorting as the data
itself takes. Are we ready for such a cost?

I think we have to experiment with both approaches, and see how it goes.

What are your thoughts?

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#4Jeff Janes
jeff.janes@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Proposal: speeding up GIN build with parallel workers

On Fri, Jan 15, 2016 at 3:29 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kvapen@gmail.com> wrote:

I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.

I am currently working on a patch that allows B-Tree index builds to
be performed in parallel. I think I'm a week or two away from posting
it.

Even without parallelism, wouldn't it be better if GIN indexes were
built using tuplesort? I know way way less about the gin am than the
nbtree am, but I imagine that a prominent cost for GIN index builds is
constructing the main B-Tree (the one that's constructed over key
values) itself. Couldn't tuplesort.c be adapted to cover this case?
That would be much faster in general, particularly with the recent
addition of abbreviated keys, while also leaving a clear path forward
to performing the build in parallel.

I think it would take a lot of changes to tuple sort to make this an
almost-always win.

In the general case each GIN key occurs in many tuples, and the
in-memory rbtree is good at compressing the tid list for each key to
maximize the amount of data that can be in memory (although perhaps it
could be even better, as it doesn't use varbyte encoding on the tid
list the way the posting lists on disk do--it could do so in the bulk
build case, where it receives the tids in order, but not feasibly in
the pending-list clean-up case, where it doesn't get them in order).
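For illustration, varbyte encoding of an ascending tid list stores deltas between consecutive tids in 7-bit groups, so small gaps take one byte each (a generic varint sketch with hypothetical names; GIN's actual on-disk posting-list format differs in detail):

```python
def varbyte_encode(tids):
    """Delta-compress an ascending tid list: 7 data bits per byte,
    high bit set when more bytes of the same delta follow."""
    out = bytearray()
    prev = 0
    for tid in tids:
        delta = tid - prev
        prev = tid
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # continuation byte
            delta >>= 7
        out.append(delta)                      # final byte of the delta
    return bytes(out)

def varbyte_decode(data):
    tids, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):                     # delta is complete
            prev += cur
            tids.append(prev)
            cur, shift = 0, 0
    return tids
```

This only works when tids arrive in ascending order, which matches Jeff's point that the bulk build case could use it but the pending-list cleanup case could not.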

When I was testing building an index on a column of text identifiers
where there were a couple million identifiers occurring a few dozen
times each, building it as a gin index (using btree_gin so that plain
text columns could be indexed) was quite a bit faster than building it
as a regular btree index using tuple sort. I didn't really
investigate this difference, but I assume it was due to the better
memory usage leading to less IO.

(Disclaimer: this was on those identifiers which had little difference
in the first 8 bytes, so they didn't really benefit from the
abbreviated keys.)

So in addition to shimming (key, tid) pairs to look like something
that can be fed to tuple sort, I think we would also need to make
tuple sort do on-the fly aggregation when it finds adjacent equal sort
columns, so it can make (key,tid[]) structures rather than (key,tid).
Without that, I think using tuple sort would be a loss at least as
often as it would be a win.

But I do think this, with aggregation, would be worth it despite the
amount of work involved. In addition to automatically benefiting from
parallel sorts and any other future sort improvements, I think it
would also generate better indexes. I have a theory that a problem
with very large gin indexes is that repeatedly building
maintenance_work_mem worth of rbtree entries and then merging them to
disk yields highly fragmented btrees, in which logically adjacent
key-space does not occupy physically nearby leaf pages, which can then
cause problems either with access that follows a pattern (like range
scans, except gin indexes can't really do those anyway) or with further
bulk operations.

So I agree that a better long-term approach would probably be to make
gin index builds (at least the bulk ones; I don't know about
pending-list cleanup) use a tuple sort. But it looks like Constantin
has already done most of the work on his current proposal, and no one
has volunteered to do the work of converting it to tuple sort instead.

Cheers,

Jeff


#5Peter Geoghegan
pg@heroku.com
In reply to: Jeff Janes (#4)
Re: Proposal: speeding up GIN build with parallel workers

On Sun, Jan 17, 2016 at 12:03 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think it would take a lot of changes to tuple sort to make this be a
almost-always win.

In the general case each GIN key occurs in many tuples, and the
in-memory rbtree is good at compressing the tid list for each key to
maximize the amount of data that can be in memory (although perhaps it
could be even better, as it doesn't use varbyte encoding on the tid
list the way the posting lists on disk do--it could do so in the bulk
build case, where it receives the tid in order, but not feasibly in
the pending-list clean-up case, where it doesn't get them in order)

Interesting analysis.

When I was testing building an index on a column of text identifiers
where there were a couple million identifiers occurring a few dozen
times each, building it as a gin index (using btree_gin so that plain
text columns could be indexed) was quite a bit faster than building it
as a regular btree index using tuple sort. I didn't really
investigate this difference, but I assume it was due to the better
memory usage leading to less IO.

(Disclaimer: this was on those identifiers which had little difference
in the first 8 bytes, so they didn't really benefit from the
abbreviated keys.)

Sorry for going on about it, but I think you'd be surprised how
effective abbreviated keys are even in seemingly marginal cases.

So I agree with that a better long term approach would probably be to
make gin index builds (at least the bulk ones, I don't know about the
pending-list-cleanup) to use a tuple sort. But it looks like
Constantin has already done most of the work on his current proposal,
and no one has volunteered to do the work on converting it to tuple
sort instead.

I'm not going to stand in the way of incremental progress,
particularly given that this looks to be a simple patch that doesn't
commit us to anything. I suspect that we should consolidate GIN and
nbtree at some point, though. I think that there are some useful
general consequences for performance that come from consolidation. For
example, my ongoing work on external sorting makes it use much of the
same infrastructure as internal sorting. Now, external sorts
automatically benefit from optimizations to internal sorting, like the
"onlyKey" quicksort optimization.

--
Peter Geoghegan


#6Robert Haas
robertmhaas@gmail.com
In reply to: Constantin S. Pan (#1)
Re: Proposal: speeding up GIN build with parallel workers

On Fri, Jan 15, 2016 at 5:38 PM, Constantin S. Pan <kvapen@gmail.com> wrote:

In current state the implementation is just a proof of concept
and it has all the configuration hardcoded, but it already works as is,
though it does not speed up the build process more than 4 times on my
configuration (12 CPUs). There is also a problem with temporary tables,
for which the parallel mode does not work.

I have yet to see a case where parallel query, with any current or
pending patch, gets more than about a 4x speedup. I can't think of
any reason that there should be a wall at 4x, and I'm not sure we're
hitting the wall there for the same reason in all cases. But your
observation corresponds to my experience.

I hope we eventually figure out how to make that much better, but I
wouldn't feel too bad about being at that spot now. 4x faster is
still a lot faster.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#7Constantin S. Pan
kvapen@gmail.com
In reply to: Constantin S. Pan (#1)
Re: [WIP] speeding up GIN build with parallel workers

On Sat, 16 Jan 2016 01:38:39 +0300
"Constantin S. Pan" <kvapen@gmail.com> wrote:

The task of building GIN can require lots of time and eats 100 % CPU,
but we could easily make it use more than a 100 %, especially since we
now have parallel workers in postgres.

The process of building GIN looks like this:

1. Accumulate a batch of index records into an rbtree in maintenance
work memory.

2. Dump the batch to disk.

3. Repeat.

I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.

This speeds up the first step N times, but slows down the second one,
because when multiple workers dump item pointers for the same key,
each of them has to read and decode the results of the previous one.
That is a huge waste, but there is an idea on how to eliminate it.

When it comes to dumping the next batch, a worker does not do it
independently. Instead, it (and every other worker) sends the
accumulated index records to the parent (backend) in ascending key
order. The backend, which receives the records from the workers
through shared memory, can merge them and dump each of them once,
without the need to reread the records N-1 times.

In current state the implementation is just a proof of concept
and it has all the configuration hardcoded, but it already works as
is, though it does not speed up the build process more than 4 times
on my configuration (12 CPUs). There is also a problem with temporary
tables, for which the parallel mode does not work.

Hey Hackers!

I have made some progress on the proposal (see the attached patch):

0. Moved some repeated code to functions (e.g. ginDumpAccumulator,
ginbuildCommon).

1. Implemented results merging on backend.

2. Disabled the usage of parallel mode when creating index on temporary
tables. No point in using parallel mode for temporary tables anyway,
right?

3. Added GUC parameter to control the number of workers for GIN
building.

4. Hit the 8x speedup limit. Made some analysis of the reasons (see the
attached plot or the data file).

In order to analyze the performance issues, I did the following:

create table t (k int, v int[]);

create or replace
function randarray(width int, low int, high int)
returns int[] as
$$
select array(select (random()*(high-low) + low)::int
from generate_series(1,width))
$$ language sql;

insert into t select k, randarray(3000, 0, 100000)
from generate_series(1, 100000) as k;

create index t_v_idx on t using gin (v);

This creates 100000 arrays of 3000 random numbers each. The random
numbers are in the range [0, 100000]. Then I measure how long the GIN
building steps take. There are two steps: scan and merge.

The results show that the 'scan' step is sped up perfectly, but the
'merge' step takes longer as you increase the number of workers. The
profiler shows that the bottleneck here is ginMergeItemPointers(), which
I use to merge the results.
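For reference, the role ginMergeItemPointers() plays is a plain two-way merge of sorted tid lists; a Python sketch of the idea (not the C code, names hypothetical):

```python
def merge_item_pointers(a, b):
    """Two-way merge of two sorted tid lists, keeping duplicates once
    (the role ginMergeItemPointers plays in the C implementation)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i]); i += 1
        elif a[i] > b[j]:
            out.append(b[j]); j += 1
        else:                       # same tid in both lists
            out.append(a[i]); i += 1; j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```

Folding k workers' results together pairwise with such a function re-copies the growing result for every additional worker, roughly O(k * total) work, which would explain the merge step slowing down as workers are added; a heap-based k-way merge would be about O(total * log k) instead.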

Also, I did encounter a problem with workers deadlocking during
heap_open, but that seems to have been resolved by Robert Haas in his
commit regarding group locking.

Please leave your feedback!

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

pgin-performance.pdf (application/pdf)
pgin-performance.dat (application/octet-stream)
pgin-2.patch (text/x-patch, +505/-78)
#8Oleg Bartunov
oleg@sai.msu.su
In reply to: Constantin S. Pan (#7)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, Feb 17, 2016 at 6:55 PM, Constantin S. Pan <kvapen@gmail.com> wrote:

On Sat, 16 Jan 2016 01:38:39 +0300
"Constantin S. Pan" <kvapen@gmail.com> wrote:

The task of building GIN can require lots of time and eats 100 % CPU,
but we could easily make it use more than a 100 %, especially since we
now have parallel workers in postgres.

The process of building GIN looks like this:

1. Accumulate a batch of index records into an rbtree in maintenance
work memory.

2. Dump the batch to disk.

3. Repeat.

I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.

This speeds up the first step N times, but slows down the second one,
because when multiple workers dump item pointers for the same key,
each of them has to read and decode the results of the previous one.
That is a huge waste, but there is an idea on how to eliminate it.

When it comes to dumping the next batch, a worker does not do it
independently. Instead, it (and every other worker) sends the
accumulated index records to the parent (backend) in ascending key
order. The backend, which receives the records from the workers
through shared memory, can merge them and dump each of them once,
without the need to reread the records N-1 times.

In current state the implementation is just a proof of concept
and it has all the configuration hardcoded, but it already works as
is, though it does not speed up the build process more than 4 times
on my configuration (12 CPUs). There is also a problem with temporary
tables, for which the parallel mode does not work.

Hey Hackers!

I have made some progress on the proposal (see the attached patch):

0. Moved some repeated code to functions (e.g. ginDumpAccumulator,
ginbuildCommon).

1. Implemented results merging on backend.

2. Disabled the usage of parallel mode when creating index on temporary
tables. No point in using parallel mode for temporary tables anyway,
right?

3. Added GUC parameter to control the number of workers for GIN
building.

4. Hit the 8x speedup limit. Made some analysis of the reasons (see the
attached plot or the data file).

In order to analyze the performance issues, I have made the following:

create table t (k int, v int[]);

create or replace
function randarray(width int, low int, high int)
returns int[] as
$$
select array(select (random()*(high-low) + low)::int
from generate_series(1,width))
$$ language sql;

insert into t select k, randarray(3000, 0, 100000)
from generate_series(1, 100000) as k;

create index t_v_idx on t using gin (v);

This creates 100000 arrays of 3000 random numbers each. The random
numbers are in range [0, 100000]. Then I measure how long the gin
building steps take. There are two steps: scan and merge.

The results show that 'scan' step is sped up perfectly. But the
'merge' step takes longer as you increase the number of workers. The
profiler shows that the bottleneck here is ginMergeItemPointers(), which
I use to merge the results.

Also, I did encounter the problem with workers deadlocking during
heap_open, but that seems to have been resolved by Robert Haas in his
commit regarding group locking.

Please leave your feedback!

My feedback is (Mac OS X 10.11.3)

set gin_parallel_workers=2;
create index message_body_idx on messages using gin(body_tsvector);
LOG: worker process: parallel worker for PID 5689 (PID 6906) was
terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: FATAL: the
database system is in recovery mode
Failed.


Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#9Peter Geoghegan
pg@heroku.com
In reply to: Constantin S. Pan (#7)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, Feb 17, 2016 at 7:55 AM, Constantin S. Pan <kvapen@gmail.com> wrote:

4. Hit the 8x speedup limit. Made some analysis of the reasons (see the
attached plot or the data file).

Did you actually compare this to the master branch? I wouldn't like to
assume that the one worker case was equivalent. Obviously that's the
really interesting baseline.

--
Peter Geoghegan


#10Constantin S. Pan
kvapen@gmail.com
In reply to: Peter Geoghegan (#9)
Re: [WIP] speeding up GIN build with parallel workers

I was testing with the number of workers starting at 0. The 0-worker
case is (in theory) equivalent to the master branch. But I should
certainly compare with the master branch, you are right. Will do that
shortly.

On Wed, 17 Feb 2016 12:26:05 -0800
Peter Geoghegan <pg@heroku.com> wrote:

On Wed, Feb 17, 2016 at 7:55 AM, Constantin S. Pan <kvapen@gmail.com>
wrote:

4. Hit the 8x speedup limit. Made some analysis of the reasons (see
the attached plot or the data file).

Did you actually compare this to the master branch? I wouldn't like to
assume that the one worker case was equivalent. Obviously that's the
really interesting baseline.


#11Constantin S. Pan
kvapen@gmail.com
In reply to: Oleg Bartunov (#8)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, 17 Feb 2016 23:01:47 +0300
Oleg Bartunov <obartunov@gmail.com> wrote:

My feedback is (Mac OS X 10.11.3)

set gin_parallel_workers=2;
create index message_body_idx on messages using gin(body_tsvector);
LOG: worker process: parallel worker for PID 5689 (PID 6906) was
terminated by signal 11: Segmentation fault

Fixed this, try the new patch. The bug was in incorrect handling
of some GIN categories.

Attachments:

pgin-3.patch (text/x-patch, +520/-78)
#12David Steele
david@pgmasters.net
In reply to: Constantin S. Pan (#11)
Re: [WIP] speeding up GIN build with parallel workers

On 2/18/16 10:10 AM, Constantin S. Pan wrote:

On Wed, 17 Feb 2016 23:01:47 +0300
Oleg Bartunov <obartunov@gmail.com> wrote:

My feedback is (Mac OS X 10.11.3)

set gin_parallel_workers=2;
create index message_body_idx on messages using gin(body_tsvector);
LOG: worker process: parallel worker for PID 5689 (PID 6906) was
terminated by signal 11: Segmentation fault

Fixed this, try the new patch. The bug was in incorrect handling
of some GIN categories.

Oleg, it looks like Constantin has updated the patch to address the
issue you were seeing. Do you have time to retest and review?

Thanks,
--
-David
david@pgmasters.net


#13Constantin S. Pan
kvapen@gmail.com
In reply to: David Steele (#12)
Re: [WIP] speeding up GIN build with parallel workers

On Mon, 14 Mar 2016 08:42:26 -0400
David Steele <david@pgmasters.net> wrote:

On 2/18/16 10:10 AM, Constantin S. Pan wrote:

On Wed, 17 Feb 2016 23:01:47 +0300
Oleg Bartunov <obartunov@gmail.com> wrote:

My feedback is (Mac OS X 10.11.3)

set gin_parallel_workers=2;
create index message_body_idx on messages using gin(body_tsvector);
LOG: worker process: parallel worker for PID 5689 (PID 6906) was
terminated by signal 11: Segmentation fault

Fixed this, try the new patch. The bug was in incorrect handling
of some GIN categories.

Oleg, it looks like Constantin has updated to patch to address the
issue you were seeing. Do you have time to retest and review?

Thanks,

Actually, there has been some progress since then. The patch is
attached.

1. Added another GUC parameter for changing the amount of
shared memory for parallel GIN workers.

2. Changed the way results are merged. It now uses a shared
memory message queue.

3. Tested on some real data (a GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. 'gin_shared_mem' seems to have no visible effect.

wnum  mem(MB)  time(s)
   0       16      247
   1       16      256
   2       16      126
   4       16       89
   0       32      247
   1       32      270
   2       32      123
   4       32       92
   0       64      254
   1       64      272
   2       64      123
   4       64       88
   0      128      250
   1      128      263
   2      128      126
   4      128       85
   0      256      247
   1      256      269
   2      256      130
   4      256       88
   0      512      257
   1      512      275
   2      512      129
   4      512       92
   0     1024      255
   1     1024      273
   2     1024      130
   4     1024       90

On Wed, 17 Feb 2016 12:26:05 -0800
Peter Geoghegan <pg@heroku.com> wrote:

On Wed, Feb 17, 2016 at 7:55 AM, Constantin S. Pan <kvapen@gmail.com>
wrote:

4. Hit the 8x speedup limit. Made some analysis of the reasons (see
the attached plot or the data file).

Did you actually compare this to the master branch? I wouldn't like to
assume that the one worker case was equivalent. Obviously that's the
really interesting baseline.

Compared with the master branch. The case of 0 workers is
indeed equivalent to the master branch.

Regards,
Constantin

Attachments:

pgin-5.patch (text/x-patch, +612/-79)
#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Constantin S. Pan (#13)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, Mar 16, 2016 at 5:41 AM, Constantin S. Pan <kvapen@gmail.com> wrote:

On Mon, 14 Mar 2016 08:42:26 -0400
David Steele <david@pgmasters.net> wrote:

On 2/18/16 10:10 AM, Constantin S. Pan wrote:

On Wed, 17 Feb 2016 23:01:47 +0300
Oleg Bartunov <obartunov@gmail.com> wrote:

My feedback is (Mac OS X 10.11.3)

set gin_parallel_workers=2;
create index message_body_idx on messages using gin(body_tsvector);
LOG: worker process: parallel worker for PID 5689 (PID 6906) was
terminated by signal 11: Segmentation fault

Fixed this, try the new patch. The bug was in incorrect handling
of some GIN categories.

Oleg, it looks like Constantin has updated to patch to address the
issue you were seeing. Do you have time to retest and review?

Thanks,

Actually, there was some progress since. The patch is
attached.

1. Added another GUC parameter for changing the amount of
shared memory for parallel GIN workers.

2. Changed the way results are merged. It uses shared memory
message queue now.

3. Tested on some real data (GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. Seems 'gin_shared_mem' has no visible effect.

wnum mem(MB) time(s)
0 16 247
1 16 256

It seems from your data that with 1 worker you always see a slowdown;
have you investigated the reason for this?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#15Constantin S. Pan
kvapen@gmail.com
In reply to: Amit Kapila (#14)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, 16 Mar 2016 12:14:51 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2016 at 5:41 AM, Constantin S. Pan <kvapen@gmail.com>
wrote:

On Mon, 14 Mar 2016 08:42:26 -0400
David Steele <david@pgmasters.net> wrote:

On 2/18/16 10:10 AM, Constantin S. Pan wrote:

On Wed, 17 Feb 2016 23:01:47 +0300
Oleg Bartunov <obartunov@gmail.com> wrote:

My feedback is (Mac OS X 10.11.3)

set gin_parallel_workers=2;
create index message_body_idx on messages using
gin(body_tsvector); LOG: worker process: parallel worker for
PID 5689 (PID 6906) was terminated by signal 11: Segmentation
fault

Fixed this, try the new patch. The bug was in incorrect handling
of some GIN categories.

Oleg, it looks like Constantin has updated to patch to address the
issue you were seeing. Do you have time to retest and review?

Thanks,

Actually, there was some progress since. The patch is
attached.

1. Added another GUC parameter for changing the amount of
shared memory for parallel GIN workers.

2. Changed the way results are merged. It uses shared memory
message queue now.

3. Tested on some real data (GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. Seems 'gin_shared_mem' has no visible effect.

wnum mem(MB) time(s)
0 16 247
1 16 256

It seems from you data that with 1 worker, you are always seeing
slowdown, have you investigated the reason of same?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

That slowdown is expected. It slows down because with 1 worker the
results have to be transferred from the worker to the backend.

The backend just waits for the results from the workers and merges them
(when wnum > 0). So the 1-worker configuration should never be used:
it is as sequential as the 0-worker one, but adds data transfer.

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#16Peter Geoghegan
pg@heroku.com
In reply to: Constantin S. Pan (#15)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, Mar 16, 2016 at 2:25 AM, Constantin S. Pan <kvapen@gmail.com> wrote:

The backend just waits for the results from the workers and merges them
(in case wnum > 0). So the 1-worker configuration should never be used,
because it is as sequential as the 0-worker, but adds data transfer.

This is why I wanted an easy way of atomically guaranteeing some
number of workers (typically 2), or not using parallelism at all. I
think the parallel worker API should offer a simple way to do that in
cases like this, where having only 1 worker is never going to win.

--
Peter Geoghegan


#17Constantin S. Pan
kvapen@gmail.com
In reply to: Peter Geoghegan (#16)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, 16 Mar 2016 02:43:47 -0700
Peter Geoghegan <pg@heroku.com> wrote:

On Wed, Mar 16, 2016 at 2:25 AM, Constantin S. Pan <kvapen@gmail.com>
wrote:

The backend just waits for the results from the workers and merges them
(in case wnum > 0). So the 1-worker configuration should never be used:
it is just as sequential as the 0-worker one, but adds data transfer
overhead.

This is why I wanted an easy way of atomically guaranteeing some
number of workers (typically 2), or not using parallelism at all. I
think the parallel worker API should offer a simple way to do that in
cases like this, where having only 1 worker is never going to win.

Well, we can check the number of workers actually launched and fall
back to the single-backend way when fewer than 2 workers are available.
Let me code that in.

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Constantin S. Pan (#15)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, Mar 16, 2016 at 2:55 PM, Constantin S. Pan <kvapen@gmail.com> wrote:

On Wed, 16 Mar 2016 12:14:51 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2016 at 5:41 AM, Constantin S. Pan <kvapen@gmail.com>
wrote:

3. Tested on some real data (GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. Seems 'gin_shared_mem' has no visible effect.

wnum mem(MB) time(s)
0 16 247
1 16 256

It seems from your data that with 1 worker you are always seeing a
slowdown; have you investigated the reason for this?

That slowdown is expected: with 1 worker, the results still have to be
transferred from the worker to the backend.

The backend just waits for the results from the workers and merges them
(in case wnum > 0).

Why does the backend just wait? Why can't it do the same work as any
worker does? In general, for other parallelism features the backend also
behaves the same way as a worker, producing results itself when results
from the workers are not available.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#19Constantin S. Pan
kvapen@gmail.com
In reply to: Amit Kapila (#18)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, 16 Mar 2016 18:08:38 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2016 at 2:55 PM, Constantin S. Pan <kvapen@gmail.com>
wrote:

On Wed, 16 Mar 2016 12:14:51 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2016 at 5:41 AM, Constantin S. Pan
<kvapen@gmail.com> wrote:

3. Tested on some real data (GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. Seems 'gin_shared_mem' has no visible effect.

wnum mem(MB) time(s)
0 16 247
1 16 256

It seems from your data that with 1 worker you are always seeing a
slowdown; have you investigated the reason for this?

That slowdown is expected: with 1 worker, the results still have to be
transferred from the worker to the backend.

The backend just waits for the results from the workers and merges
them (in case wnum > 0).

Why does the backend just wait? Why can't it do the same work as any
worker does? In general, for other parallelism features the backend also
behaves the same way as a worker, producing results itself when results
from the workers are not available.

We can make the backend do the same work as any worker, but that
will complicate the code for less than a 2% performance boost.
This performance issue matters even less as you increase the
number of workers N, since you save only 1/N-th of the
transfer time.

Is having the backend wait for the workers really such bad practice
that it should be avoided at all costs, or can we leave it as is?

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Constantin S. Pan (#19)
Re: [WIP] speeding up GIN build with parallel workers

On Wed, Mar 16, 2016 at 7:50 PM, Constantin S. Pan <kvapen@gmail.com> wrote:

On Wed, 16 Mar 2016 18:08:38 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

Why does the backend just wait? Why can't it do the same work as any
worker does? In general, for other parallelism features the backend also
behaves the same way as a worker, producing results itself when results
from the workers are not available.

We can make the backend do the same work as any worker, but that
will complicate the code for less than a 2% performance boost.

Why do you think it will be just 2%? I think for the single-worker
case it should be much more, as the master backend will be less busy
consuming tuples from the tuple queue. I can't say much about code
complexity, as I haven't yet looked carefully at the logic of the
patch, but we didn't find much difficulty while doing it for parallel
scans. One of the commits which might help you understand how heap
scans are currently parallelised is
ee7ca559fcf404f9a3bd99da85c8f4ea9fbc2e92; you can see whether it helps
in any way with writing a generic API for GIN parallel builds.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#21Constantin S. Pan
kvapen@gmail.com
In reply to: Amit Kapila (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Constantin S. Pan (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#22)
#24Dmitry Ivanov
d.ivanov@postgrespro.ru
In reply to: Constantin S. Pan (#13)
#25Constantin S. Pan
kvapen@gmail.com
In reply to: Dmitry Ivanov (#24)
#26Constantin S. Pan
kvapen@gmail.com
In reply to: Dmitry Ivanov (#24)
#27Constantin S. Pan
kvapen@gmail.com
In reply to: Constantin S. Pan (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Constantin S. Pan (#27)
#29Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Janes (#4)
#30Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#29)