Block level parallel vacuum WIP

Started by Masahiko Sawada over 9 years ago · 50 messages · pgsql-hackers
#1 Masahiko Sawada
sawada.mshk@gmail.com

Hi all,

I'd like to propose block level parallel VACUUM.
This feature allows VACUUM to use multiple CPU cores.

Vacuum Processing Logic
===================

PostgreSQL's VACUUM processing consists of two phases:
1. Collecting dead tuple locations on the heap.
2. Reclaiming dead tuples from the heap and indexes.
Phases 1 and 2 are executed alternately: once the dead tuple
locations collected in phase 1 reach maintenance_work_mem, phase 2 is
executed.
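
To make the alternation concrete, here is a minimal standalone C
sketch of the control flow; all names and sizes are illustrative
stand-ins, not actual backend code:

/*
 * Minimal standalone sketch of the alternating vacuum phases.
 * Illustrative only: MAX_DEAD_TUPLES stands in for the number of
 * dead tuple locations that fit in maintenance_work_mem.
 */
#include <stdio.h>
#include <stddef.h>

#define MAX_DEAD_TUPLES 1024

typedef struct { unsigned blkno; unsigned offnum; } DeadTuple;

static DeadTuple dead_tuples[MAX_DEAD_TUPLES];
static size_t n_dead = 0;

/* Phase 1 (stub): scan one heap block, record dead tuple locations. */
static void collect_dead_tuples(unsigned blkno)
{
    if (n_dead < MAX_DEAD_TUPLES)
        dead_tuples[n_dead++] = (DeadTuple){ blkno, 1 };
}

/* Phase 2 (stub): reclaim the collected tuples from indexes and heap. */
static void reclaim_dead_tuples(void)
{
    printf("reclaim cycle: %zu dead tuples\n", n_dead);
    n_dead = 0;
}

int main(void)
{
    const unsigned nblocks = 5000;

    for (unsigned blkno = 0; blkno < nblocks; blkno++)
    {
        collect_dead_tuples(blkno);          /* phase 1 */
        if (n_dead == MAX_DEAD_TUPLES)       /* memory budget reached */
            reclaim_dead_tuples();           /* phase 2, then resume */
    }
    if (n_dead > 0)
        reclaim_dead_tuples();               /* final cycle */
    return 0;
}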

Basic Design
==========

As a PoC, I implemented parallel vacuum so that each worker
processes both phases 1 and 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).
To use the visibility map efficiently, each worker scans a particular
block range of the relation and collects dead tuple locations.
After each worker finishes its task, the leader process gathers the
vacuum statistics and updates relfrozenxid if possible.
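
To illustrate the range assignment, a standalone sketch of how
contiguous per-worker block ranges could be computed (a hypothetical
helper, not code from the patch):

/*
 * Standalone sketch: split a relation into contiguous block ranges,
 * one per worker, as the PoC does. Hypothetical helper, not taken
 * from the patch.
 */
#include <stdio.h>

typedef struct { unsigned start; unsigned nblocks; } BlockRange;

/* Divide nblocks as evenly as possible among nworkers; the first
 * (nblocks % nworkers) workers each take one extra block. */
static BlockRange worker_range(unsigned nblocks, unsigned nworkers,
                               unsigned worker)
{
    unsigned base = nblocks / nworkers;
    unsigned rem = nblocks % nworkers;
    BlockRange r;

    r.nblocks = base + (worker < rem ? 1 : 0);
    r.start = worker * base + (worker < rem ? worker : rem);
    return r;
}

int main(void)
{
    for (unsigned w = 0; w < 4; w++)
    {
        BlockRange r = worker_range(1000, 4, w);
        printf("worker %u: blocks %u..%u\n",
               w, r.start, r.start + r.nblocks - 1);
    }
    return 0;   /* prints 0..249, 250..499, 500..749, 750..999 */
}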

I also changed the buffer lock infrastructure so that multiple
processes can wait for a cleanup lock on a buffer.
A new GUC parameter, vacuum_parallel_workers, controls the number
of vacuum workers.

Performance (PoC)
=========

I ran parallel vacuum on 13GB table (pgbench scale 1000) with several
workers (on my poor virtual machine).
The result is,

1. Vacuum whole table without indexes (page skipping disabled)
1 worker : 33 sec
2 workers : 27 sec
3 workers : 23 sec
4 workers : 22 sec

2. Vacuum table and indexes (after 10,000 transactions executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec

In this test, execution time got worse with parallelism because
multiple processes frequently contend for the cleanup lock on the
same index buffer.
So far parallelism seems effective only for the table vacuum, and even
that does not improve as much as expected (possibly a disk bottleneck).

Another Design
============
ISTM that having multiple processes perform index vacuum is not a good
idea in most cases, because many index items can be stored in a single
page and multiple vacuum workers end up contending for the cleanup
lock on the same index buffer.
It would be better to have multiple workers each process a particular
block range of the heap, and then have one worker per index perform
the index vacuum.

There is still lots of work to do, but a PoC patch is attached.
Feedback and suggestions are very welcome.

Regards,

--
Masahiko Sawada

Attachments:

0001-Allow-muliple-backends-to-wait-for-cleanup-lock.patch (text/plain, +45 -22)
0002-Block-level-parallel-Vacuum.patch (text/plain, +245 -67)
#2 Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Masahiko Sawada (#1)
Re: Block level parallel vacuum WIP

I repeated your test on a ProLiant DL580 Gen9 with a Xeon E7-8890 v3.

pgbench -s 100 and the command VACUUM pgbench_accounts after 10,000 transactions:

with: alter system set vacuum_cost_delay to DEFAULT;
parallel_vacuum_workers | time
1 | 138,703.263 ms
2 | 83,751.064 ms
4 | 66,105.861 ms
8 | 59,820.171 ms

with: alter system set vacuum_cost_delay to 1;
parallel_vacuum_workers | time
1 | 127,210.896 ms
2 | 75,300.278 ms
4 | 64,253.087 ms
8 | 60,130.953 ms

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.ru
The Russian Postgres Company

#3 Claudio Freire
klaussfreire@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 8:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2. Vacuum table and indexes (after 10,000 transactions executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec

In this test, execution time got worse with parallelism because
multiple processes frequently contend for the cleanup lock on the
same index buffer.
So far parallelism seems effective only for the table vacuum, and even
that does not improve as much as expected (possibly a disk bottleneck).

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (once per
worker).

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Claudio Freire (#3)
Re: Block level parallel vacuum WIP

Claudio Freire <klaussfreire@gmail.com> writes:

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (once per
worker).

What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.

regards, tom lane

#5 Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#4)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Claudio Freire <klaussfreire@gmail.com> writes:

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (once per
worker).

What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.

+1
We may eventually need some effective way of parallelizing vacuum of a
single index.
But pointing each worker at a separate index seems fair enough for the
majority of cases.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#6 Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#1)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As a PoC, I implemented parallel vacuum so that each worker
processes both phases 1 and 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound good performance-wise. I recall emails
from Robert and Amit on this matter for parallel sequential scan,
saying that this would hurt performance, particularly on rotating disks.
--
Michael

#7 Alex Ignatov
a.ignatov@postgrespro.ru
In reply to: Michael Paquier (#6)
Re: Block level parallel vacuum WIP

On 23.08.2016 15:41, Michael Paquier wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As a PoC, I implemented parallel vacuum so that each worker
processes both phases 1 and 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound good performance-wise. I recall emails
from Robert and Amit on this matter for parallel sequential scan,
saying that this would hurt performance, particularly on rotating disks.

Rotating disks are not a problem - you can always put them in a RAID,
etc. An 8kB allocation per relation once every half hour, that is the
problem. A seq scan done this way = a random scan...

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#8 Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#6)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As a PoC, I implemented parallel vacuum so that each worker
processes both phases 1 and 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound good performance-wise. I recall emails
from Robert and Amit on this matter for parallel sequential scan,
saying that this would hurt performance, particularly on rotating disks.

The implementation in the patch is the same as what we initially
considered for parallel sequential scan, but that turned out not to
be a good approach because it can lead to an uneven balance of work
among workers: once a worker finishes its block range, it cannot
take on more work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to propose block level parallel VACUUM.
This feature allows VACUUM to use multiple CPU cores.

Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.

As a PoC, I implemented parallel vacuum so that each worker
processes both phases 1 and 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).
To use the visibility map efficiently, each worker scans a particular
block range of the relation and collects dead tuple locations.
After each worker finishes its task, the leader process gathers the
vacuum statistics and updates relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.
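
As a rough standalone sketch of the shared state this design implies,
with a pthread mutex standing in for the DSM segment, LWLock, and
condition variable that an in-core patch would use (all names here
are hypothetical):

/*
 * Standalone sketch of the proposal above: a shared scan cursor plus
 * a shared dead-tuple array; when the array fills, workers pause the
 * heap scan and run an index-vacuum cycle. Single-threaded demo.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_DEAD_TUPLES 1024

typedef struct
{
    pthread_mutex_t lock;       /* stands in for an LWLock in DSM */
    unsigned next_block;        /* parallel heap scan cursor */
    unsigned nblocks;
    size_t n_dead;              /* fill level of the shared array */
    unsigned dead_tuples[MAX_DEAD_TUPLES];
} ParallelVacuumShared;

/* Claim the next heap block to scan; false when the scan is done. */
static bool claim_block(ParallelVacuumShared *s, unsigned *blkno)
{
    bool ok;

    pthread_mutex_lock(&s->lock);
    ok = s->next_block < s->nblocks;
    if (ok)
        *blkno = s->next_block++;
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Append one dead tuple; true if the array is now full, i.e. it is
 * time to pause the heap scan and run an index-vacuum cycle. */
static bool record_dead_tuple(ParallelVacuumShared *s, unsigned tid)
{
    bool full;

    pthread_mutex_lock(&s->lock);
    if (s->n_dead < MAX_DEAD_TUPLES)
        s->dead_tuples[s->n_dead++] = tid;
    full = (s->n_dead == MAX_DEAD_TUPLES);
    pthread_mutex_unlock(&s->lock);
    return full;
}

int main(void)
{
    ParallelVacuumShared s = { PTHREAD_MUTEX_INITIALIZER, 0, 2000, 0, {0} };
    unsigned blkno;

    while (claim_block(&s, &blkno))        /* shared-cursor heap scan */
        if (record_dead_tuple(&s, blkno))  /* array full: index cycle */
            s.n_dead = 0;                  /* reset after index vacuum */
    printf("scanned %u blocks\n", s.next_block);
    return 0;
}

The shared cursor avoids the fixed block ranges of the PoC, so a
worker that finishes early simply keeps claiming blocks.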

I also changed the buffer lock infrastructure so that multiple
processes can wait for a cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good thing.

A new GUC parameter, vacuum_parallel_workers, controls the number
of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up
parallel vacuums. Perhaps

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a good
starting formula, though it might need some refinement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alexander Korotkov (#5)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 9:40 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Claudio Freire <klaussfreire@gmail.com> writes:

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (once per
worker).

What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.

+1
We may eventually need some effective way of parallelizing vacuum of a
single index.
But pointing each worker at a separate index seems fair enough for the
majority of cases.

Or we can improve vacuum of a single index by changing the data
representation of dead tuples to a bitmap.
That can reduce the number of whole-index scans during vacuum and
make checking index items against the dead tuples faster.
This is listed on the Todo list, and I've implemented it.
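
Sketched standalone with hypothetical types, the idea is one bitmap
per heap page, turning the per-index-item check into an O(1) bit test
instead of a binary search over a sorted TID array (MAX_OFFSETS stands
in for MaxHeapTuplesPerPage, 291 on 8kB pages):

/*
 * Standalone sketch of a bitmap representation for dead tuples:
 * one bit per (block, offset) pair. Hypothetical layout, not the
 * actual implementation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_OFFSETS 291                  /* bits tracked per heap page */
#define WORDS_PER_PAGE ((MAX_OFFSETS + 63) / 64)

typedef struct
{
    unsigned nblocks;
    uint64_t *bits;                      /* nblocks * WORDS_PER_PAGE words */
} DeadTupleBitmap;

static DeadTupleBitmap *dtbm_create(unsigned nblocks)
{
    DeadTupleBitmap *bm = malloc(sizeof(*bm));

    bm->nblocks = nblocks;
    bm->bits = calloc((size_t) nblocks * WORDS_PER_PAGE, sizeof(uint64_t));
    return bm;
}

static void dtbm_set(DeadTupleBitmap *bm, unsigned blkno, unsigned offnum)
{
    uint64_t *page = bm->bits + (size_t) blkno * WORDS_PER_PAGE;

    page[offnum / 64] |= UINT64_C(1) << (offnum % 64);
}

/* O(1) membership test: is this index item's heap TID dead? */
static bool dtbm_ismember(const DeadTupleBitmap *bm,
                          unsigned blkno, unsigned offnum)
{
    const uint64_t *page = bm->bits + (size_t) blkno * WORDS_PER_PAGE;

    return (page[offnum / 64] >> (offnum % 64)) & 1;
}

int main(void)
{
    DeadTupleBitmap *bm = dtbm_create(1000);

    dtbm_set(bm, 42, 7);
    return dtbm_ismember(bm, 42, 7) ? 0 : 1;
}

The trade-off is that memory use becomes proportional to the heap size
rather than to the number of dead tuples.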

Regards,

--
Masahiko Sawada

#11 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: Block level parallel vacuum WIP

Robert Haas wrote:

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walking the tree.
So if there are more workers than indexes, it's possible to put more
than one worker on the same index by carefully instructing each to
stop at a predetermined index page number.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#9)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 10:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to propose block level parallel VACUUM.
This feature allows VACUUM to use multiple CPU cores.

Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.

As a PoC, I implemented parallel vacuum so that each worker
processes both phases 1 and 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).
To use the visibility map efficiently, each worker scans a particular
block range of the relation and collects dead tuple locations.
After each worker finishes its task, the leader process gathers the
vacuum statistics and updates relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.

Thank you for the advice.
That's what I was thinking of as an alternative design; I will change
the patch to this design.

I also changed the buffer lock infrastructure so that multiple
processes can wait for a cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good thing.

Right.

A new GUC parameter, vacuum_parallel_workers, controls the number
of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up
parallel vacuums. Perhaps

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a good
starting formula, though it might need some refinement.

That looks convenient.
I was thinking that we could manage the number of parallel workers
per table for autovacuum using a reloption, like:
ALTER TABLE relation SET (parallel_vacuum_workers = 2);

Regards,

--
Masahiko Sawada

#13 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#11)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walking the tree.
So if there are more workers than indexes, it's possible to put more
than one worker on the same index by carefully instructing each to
stop at a predetermined index page number.

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#13)
Re: Block level parallel vacuum WIP

Robert Haas wrote:

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.

Sure, just putting the idea out there.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#13)
Re: Block level parallel vacuum WIP

On 2016-08-23 12:17:30 -0400, Robert Haas wrote:

On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walking the tree.
So if there are more workers than indexes, it's possible to put more
than one worker on the same index by carefully instructing each to
stop at a predetermined index page number.

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.

Given that the index scans are, in my experience, far more often the
bottleneck than the heap scan, I'm not sure that order is the best.
The heap scan benefits from the VM; the index scans don't.

#16 Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#8)
Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.

The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.

Ah, so that was the reason. Thanks for confirming my doubts about what is proposed.
--
Michael

#17 Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Michael Paquier (#16)
Re: Block level parallel vacuum WIP

On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

Ah, so that was the reason. Thanks for confirming my doubts about what is proposed.

I believe Sawada-san has got enough feedback on the design to work out
the next steps. It seems natural that the vacuum workers are assigned
a portion of the heap to scan and collect dead tuples (similar to what
the patch does), and that the same workers are responsible for the
second phase of the heap scan.

But as far as index scans are concerned, I agree with Tom that the
best strategy is to assign a different index to each worker process
and let them vacuum the indexes in parallel. That way the work for
each worker process is clearly cut out and they don't contend for the
same resources, which means the first patch, allowing multiple
backends to wait for a cleanup lock, is not required. Later we could
extend this further so that multiple workers can vacuum a single index
by splitting the work on physical boundaries, but even that would
ensure a clear demarcation of work and hence no contention on index
blocks.

ISTM this will require further work, and it probably makes sense to
mark the patch as "Returned with feedback" for now.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Pavan Deolasee (#17)
Re: Block level parallel vacuum WIP

On Sat, Sep 10, 2016 at 7:44 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

I believe Sawada-san has got enough feedback on the design to work out
the next steps. It seems natural that the vacuum workers are assigned
a portion of the heap to scan and collect dead tuples (similar to what
the patch does), and that the same workers are responsible for the
second phase of the heap scan.

Yeah, thank you for the feedback.

But as far as index scans are concerned, I agree with Tom that the
best strategy is to assign a different index to each worker process
and let them vacuum the indexes in parallel.
That way the work for
each worker process is clearly cut out and they don't contend for the
same resources, which means the first patch, allowing multiple
backends to wait for a cleanup lock, is not required. Later we could
extend this further so that multiple workers can vacuum a single index
by splitting the work on physical boundaries, but even that would
ensure a clear demarcation of work and hence no contention on index
blocks.

I also agree with this idea.
Each worker vacuums different indexes, and then the leader process
updates all index statistics after parallel mode is exited.

I'm implementing this patch, but I need to resolve a problem
regarding the lock for extension taken by multiple parallel workers.
In parallel vacuum, multiple workers can try to acquire the exclusive
lock for extension on the same relation. Since exclusive locks for
extension taken by workers in the same locking group do not conflict
with each other, multiple workers can extend the FSM or VM at the
same time and end up with an error.
I think the same issue is involved in parallel update operations, so
I'd like to discuss this in advance.
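
To see how this goes wrong, here is a standalone illustration (plain
pthreads and POSIX I/O, nothing PostgreSQL-specific) of two workers
extending the same file. If the mutex below is ineffective, which is
in effect the situation when group locking makes same-group extension
locks non-conflicting, both threads can read the same old size and
write their new block at the same offset:

/*
 * Standalone illustration of the extension race; not backend code.
 * With the mutex, the file grows by two blocks; without it, both
 * threads may pwrite() to the same offset and one block is lost.
 */
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

#define BLCKSZ 8192

static pthread_mutex_t extension_lock = PTHREAD_MUTEX_INITIALIZER;

static void extend_by_one_block(int fd)
{
    char block[BLCKSZ];
    struct stat st;

    memset(block, 0, sizeof(block));

    pthread_mutex_lock(&extension_lock);   /* the exclusion that group
                                            * locking defeats */
    fstat(fd, &st);                        /* read the current size */
    pwrite(fd, block, BLCKSZ, st.st_size); /* append one new block */
    pthread_mutex_unlock(&extension_lock);
}

static void *worker(void *arg)
{
    extend_by_one_block(*(int *) arg);
    return NULL;
}

int main(void)
{
    int fd = open("relfork.tmp", O_CREAT | O_RDWR | O_TRUNC, 0600);
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, &fd);
    pthread_create(&t2, NULL, worker, &fd);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    close(fd);
    return 0;
}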

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#19 Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#18)
Re: Block level parallel vacuum WIP

On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'm implementing this patch, but I need to resolve a problem
regarding the lock for extension taken by multiple parallel workers.
In parallel vacuum, multiple workers can try to acquire the exclusive
lock for extension on the same relation. Since exclusive locks for
extension taken by workers in the same locking group do not conflict
with each other, multiple workers can extend the FSM or VM at the
same time and end up with an error.
I think the same issue is involved in parallel update operations, so
I'd like to discuss this in advance.

Hmm, yeah. This is one of the reasons why parallel queries currently
need to be entirely read-only. I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks. I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().
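
The take-turns pattern, sketched standalone with a plain pthread
condition variable (the in-core ConditionVariable API differs in
detail; this only shows the shape):

/*
 * Standalone sketch: workers take turns at a step that must be
 * serialized, e.g. setting the visibility map. Not backend code.
 */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool step_busy = false;

static void visibilitymap_set_stub(unsigned blkno) { (void) blkno; }

static void take_turn(unsigned blkno)
{
    pthread_mutex_lock(&mtx);
    while (step_busy)                /* another worker holds the turn */
        pthread_cond_wait(&cv, &mtx);
    step_busy = true;
    pthread_mutex_unlock(&mtx);

    visibilitymap_set_stub(blkno);   /* the serialized step */

    pthread_mutex_lock(&mtx);
    step_busy = false;
    pthread_cond_signal(&cv);        /* wake the next waiter */
    pthread_mutex_unlock(&mtx);
}

int main(void)
{
    take_turn(0);
    return 0;
}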

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#19)
Re: Block level parallel vacuum WIP

On Thu, Sep 15, 2016 at 11:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'm implementing this patch, but I need to resolve a problem
regarding the lock for extension taken by multiple parallel workers.
In parallel vacuum, multiple workers can try to acquire the exclusive
lock for extension on the same relation. Since exclusive locks for
extension taken by workers in the same locking group do not conflict
with each other, multiple workers can extend the FSM or VM at the
same time and end up with an error.
I think the same issue is involved in parallel update operations, so
I'd like to discuss this in advance.

Hmm, yeah. This is one of the reasons why parallel queries currently
need to be entirely read-only. I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks. I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().

Yeah, I don't have a good solution for this problem so far.
We might need to improve the group locking mechanism for update
operations, or come up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
enough VM and FSM pages in advance so that parallel workers need not
extend those forks, but that doesn't resolve the fundamental problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#21 Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#20)
#22 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#21)
#23 Claudio Freire
klaussfreire@gmail.com
In reply to: Masahiko Sawada (#22)
#24 Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#22)
#25 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#24)
#26 Simon Riggs
simon@2ndQuadrant.com
In reply to: Masahiko Sawada (#25)
#27 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Claudio Freire (#23)
#28 Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#25)
#29 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#28)
#30 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Simon Riggs (#26)
#31 Claudio Freire
klaussfreire@gmail.com
In reply to: Masahiko Sawada (#30)
#32 Claudio Freire
klaussfreire@gmail.com
In reply to: Masahiko Sawada (#30)
#33 David Steele
david@pgmasters.net
In reply to: Claudio Freire (#32)
#34 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David Steele (#33)
#35 Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#34)
#36 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#35)
#37 David Steele
david@pgmasters.net
In reply to: Masahiko Sawada (#36)
#38 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David Steele (#37)
#39 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#38)
#40 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#39)
#41 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#40)
#42 Thomas Munro
thomas.munro@gmail.com
In reply to: Masahiko Sawada (#41)
#43 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Thomas Munro (#42)
#44 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#43)
#45 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#44)
#46 Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#43)
#47 Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#46)
#48 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Thomas Munro (#47)
#49 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Langote (#48)
#50 Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#49)