autovacuum can't keep up, bloat just continues to rise

Started by Joshua D. Drake · over 8 years ago · 25 messages · pgsql-hackers
#1 Joshua D. Drake
jd@commandprompt.com

Hello,

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short is that no
matter what, autovacuum can't keep up. This is what I have done:

Machine:

16vCPU
59G Memory
10G SSD (/)
500G SSD /srv/main/9.6 (PGDATA) : 240MB Sustained with 15k IOPS
* Yes, we really got 240MB sustained performance

I used BenchmarkSQL, which is a TPC-C benchmark similar to pgbench but
supposedly more thorough.

https://sourceforge.net/projects/benchmarksql/

PostgreSQL 9.6 on Ubuntu 16.04 x64.

postgresql.conf:

max_connections: 1000 (just to keep it out of the way)
shared_buffers: 32G (Awesome work Haas)
work_mem: 32M
maintenance_work_mem: 2G
effective_io_concurrency: 1

* Before anybody suggests increasing this, on GCE over a dozen tests,
anything but disabling this appears to be a performance hit of ~ 10% (I
can reproduce those tests if you like on another thread).

synchronous_commit: off
checkpoint_timeout: 60min
max_wal_size: 5G
random_page_cost: 1
effective_cache_size: 32GB
* This probably should be more like 50GB, but still.
autovacuum_max_workers: 12
* One for each table + a couple for system tables
autovacuum_vacuum_scale_factor: 0.1
autovacuum_vacuum_cost_delay: 5ms

Here are the benchmarksql settings for all 4 runs:

17:07:54,268 [main] INFO jTPCC : Term-00, warehouses=500
17:07:54,269 [main] INFO jTPCC : Term-00, terminals=128
17:07:54,272 [main] INFO jTPCC : Term-00, runTxnsPerTerminal=100000
17:07:54,273 [main] INFO jTPCC : Term-00, limitTxnsPerMin=300000
17:07:54,273 [main] INFO jTPCC : Term-00, terminalWarehouseFixed=false
17:07:54,274 [main] INFO jTPCC : Term-00,
17:07:54,274 [main] INFO jTPCC : Term-00, newOrderWeight=45
17:07:54,274 [main] INFO jTPCC : Term-00, paymentWeight=43
17:07:54,274 [main] INFO jTPCC : Term-00, orderStatusWeight=4
17:07:54,275 [main] INFO jTPCC : Term-00, deliveryWeight=4
17:07:54,275 [main] INFO jTPCC : Term-00, stockLevelWeight=4

For run 0, I started with:

vacuumdb -U postgres -fz;./runBenchmark.sh my_postgres.properties

And then for each subsequent run, I just ran the benchmark without the
vacuum full so that PostgreSQL could prove us wrong. It didn't. Here is
the breakdown of the results:

RUN   START DISK SIZE (GB)   END DISK SIZE (GB)   TPS/Terminal
  0            54                    78            868.6796875
  1            78                    91            852.4765625
  2            91                   103            741.4609375
  3           103                   116            686.125
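For reference, the numbers above work out to roughly a doubling of disk size against a ~21% drop in throughput across the four runs. A quick sketch (assuming the reported sizes are GB):

```python
# Start/end disk size (assumed GB) and per-terminal TPS for each run,
# copied from the table above.
runs = [
    (0, 54, 78, 868.6796875),
    (1, 78, 91, 852.4765625),
    (2, 91, 103, 741.4609375),
    (3, 103, 116, 686.125),
]

first, last = runs[0], runs[-1]
growth = (last[2] - first[1]) / first[1] * 100    # disk growth across all runs
tps_drop = (first[3] - last[3]) / first[3] * 100  # throughput degradation

print(f"disk growth: {growth:.0f}%")    # ~115%, i.e. more than doubled
print(f"TPS drop:    {tps_drop:.0f}%")  # ~21%
```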

The good news is, PostgreSQL is not doing half bad against 128
connections with only 16vCPU. The bad news is we more than doubled our
disk size without getting reuse or bloat under control. The concern here
is that under heavy write loads that are persistent, we will eventually
bloat out and have to vacuum full, no matter what. I know that Jan has
done some testing and the best he could get is something like 8 days
before PostgreSQL became unusable (but don't quote me on that).

I am open to comments, suggestions, running multiple tests with
different parameters or just leaving this in the archive for people to
reference.

Thanks in advance,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Peter Geoghegan
pg@bowt.ie

In reply to: Joshua D. Drake (#1)
Re: autovacuum can't keep up, bloat just continues to rise

On Wed, Jul 19, 2017 at 3:11 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

The good news is, PostgreSQL is not doing half bad against 128 connections
with only 16vCPU. The bad news is we more than doubled our disk size without
getting reuse or bloat under control. The concern here is that under heavy
write loads that are persistent, we will eventually bloat out and have to
vacuum full, no matter what. I know that Jan has done some testing and the
best he could get is something like 8 days before PostgreSQL became unusable
(but don't quote me on that).

I am open to comments, suggestions, running multiple tests with different
parameters or just leaving this in the archive for people to reference.

Did you see my blog post on planet PostgreSQL from last night?

https://pgeoghegan.blogspot.com/2017/07/postgresql-index-bloat-microscope.html

Perhaps you could use my query to instrument an interesting index, to
see what that turns up. I would really like to get a better sense of
how often and to what extent index bloat is a problem that VACUUM is
just inherently incapable of keeping under control over time. The
timeline for performance to degrade with real production workloads is
very interesting to me. It's really hard to simulate certain types of
problems that you will see frequently in production.

Index bloat is a general problem that B-Trees have in all other major
systems, but I think that PostgreSQL has a tendency to allow indexes
to become progressively more bloated over time, in a way that it often
can never recover from [1]. This may be a particular problem with
unique indexes, where many physical duplicates accumulate in pages.
Duplicates that are theoretically reclaimable, but due to how the
keyspace is split up, will never actually be reclaimed [2].

[1]: postgr.es/m/CAH2-Wzmf6intNY1ggiNzOziiO5Eq=DsXfeptODGxO=2j-i1NGQ@mail.gmail.com
[2]: https://wiki.postgresql.org/wiki/Key_normalization#VACUUM_and_nbtree_page_deletion
--
Peter Geoghegan


#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#1)
Re: autovacuum can't keep up, bloat just continues to rise

"Joshua D. Drake" <jd@commandprompt.com> writes:

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short is that no
matter what, autovacuum can't keep up. This is what I have done:

Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
autovacuum_vacuum_cost_limit.
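For context on these two knobs: vacuum charges an accounting cost for each page it touches (by default vacuum_cost_page_hit=1, vacuum_cost_page_miss=10, vacuum_cost_page_dirty=20) and sleeps for the cost delay each time the accumulated cost reaches the cost limit. A simplified sketch of the resulting I/O ceiling, treating every page as a buffer miss:

```python
def vacuum_mb_per_sec(cost_limit, cost_delay_ms, page_cost=10, page_kb=8):
    """Rough ceiling on vacuum read throughput under the cost-based
    delay model: vacuum sleeps cost_delay_ms every time it has accrued
    cost_limit worth of page costs. Simplification: every page is a
    buffer miss (cost 10); hits and dirtied pages change the mix."""
    pages_per_round = cost_limit / page_cost
    rounds_per_sec = 1000 / cost_delay_ms
    return pages_per_round * rounds_per_sec * page_kb / 1024

# 9.6 defaults: cost_limit=200, autovacuum_vacuum_cost_delay=20ms
print(vacuum_mb_per_sec(200, 20))  # 7.8125 MB/s -- far below the 240MB/s disk
# The OP's 5ms delay quadruples that ceiling
print(vacuum_mb_per_sec(200, 5))   # 31.25 MB/s
```

Even at 5ms, the cost model caps autovacuum well below what the hardware in the original post can sustain, which is why both suggestions push toward a higher limit or a smaller (or zero) delay.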

regards, tom lane


#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#2)
Re: autovacuum can't keep up, bloat just continues to rise

Peter Geoghegan wrote:

Index bloat is a general problem that B-Trees have in all other major
systems, but I think that PostgreSQL has a tendency to allow indexes
to become progressively more bloated over time, in a way that it often
can never recover from [1].

Interesting assertion. Many years ago I set out to implement btree page
merging[1] from ideas in a 1996 paper[2], though that work never saw the
light of day. Maybe it can be valuable now.

Another thing to consider is indirect indexes, wherein you get less heap
bloat because more updates can become HOT.

If we also allow heap to be pruned from line pointers by walking indexes
to remove specific pointers, instead of requiring indexes to be scanned
whole for the removal, as proposed by Andres, we could get further
along.

[1]: /messages/by-id/20020912235429.4714071a.alvherre@atentus.com
[2]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9961

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#5 Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#3)
Re: autovacuum can't keep up, bloat just continues to rise

On 07/19/2017 03:29 PM, Tom Lane wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short is that no
matter what, autovacuum can't keep up. This is what I have done:

Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
autovacuum_vacuum_cost_limit.

I would try
autovacuum_vacuum_cost_delay = 0
and for any tables > 1 million rows:
autovacuum_vacuum_scale_factor = 0
autovacuum_vacuum_threshold = 100000 (perhaps even smaller)
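The effect of these numbers follows from the trigger formula in the docs: autovacuum vacuums a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A quick sketch for a large TPC-C table (the 100M-row figure is just an illustration):

```python
def vacuum_trigger(reltuples, scale_factor, threshold):
    # Dead-tuple count at which autovacuum starts on a table
    # (formula from the PostgreSQL documentation).
    return threshold + scale_factor * reltuples

rows = 100_000_000  # hypothetical large table, e.g. order_line

# 9.6 defaults: scale_factor=0.2, threshold=50
print(round(vacuum_trigger(rows, 0.2, 50)))       # 20000050 dead tuples

# Joe's suggestion: scale_factor=0, threshold=100000
print(round(vacuum_trigger(rows, 0.0, 100_000)))  # 100000 dead tuples
```

With scale_factor = 0 the trigger no longer grows with the table, so in this example a large table gets vacuumed about 200x sooner.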

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#6 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#3)
Re: autovacuum can't keep up, bloat just continues to rise

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short is that no
matter what, autovacuum can't keep up. This is what I have done:

Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
autovacuum_vacuum_cost_limit.

Or get rid of the cost delay entirely and let autovacuum actually go as
fast as it can when it's run. The assertion that it can't keep up is
still plausible, but configuring autovacuum to sleep regularly and then
complaining that it's not able to keep up doesn't make sense.

Reducing the nap time might also be helpful if autovacuum is going as
fast as it can and it's able to clear a table in less than a minute.

There have been discussions on this list about parallel vacuum of a
particular table as well; to address this issue I'd encourage reviewing
those discussions and looking at writing a patch to implement that
feature as that would address the case where the table is large enough
that autovacuum simply can't get through all of it before the other
backends have used all space available and then substantially increased
the size of the relation (leading to vacuum on the table running for
longer).

Thanks!

Stephen

#7 Peter Geoghegan
pg@bowt.ie

In reply to: Alvaro Herrera (#4)
Re: autovacuum can't keep up, bloat just continues to rise

On Wed, Jul 19, 2017 at 3:54 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Index bloat is a general problem that B-Trees have in all other major
systems, but I think that PostgreSQL has a tendency to allow indexes
to become progressively more bloated over time, in a way that it often
can never recover from [1].

Interesting assertion.

I don't pretend to understand the complicated feedback loops that may
exist for workloads that are very reliant on pruning,
kill_prior_tuples cleanup, very aggressive vacuuming, etc. I'll just
say that it seems very reasonable to suppose that they exist.

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.

Index bloat may not seem as important as I suggest to some. The ideas
I'm talking about were received somewhat skeptically at pgCon, when I
brought them up informally. Maybe this is a more cogent explanation:
if your indexes were somehow magically never bloated, but the heap
could become just as bloated, then it might not matter that much
simply because the heap pages wouldn't be accessed by index scans.
Heap page accesses may demonstrably be the bottleneck today, without
that magic in place, but perhaps only because indexes point to the
bloat in the first place. It could be a bit like the situation with
bloating UNDO segments in Oracle; those are not the worst thing to
have to bloat. And, the kill_prior_tuples stuff is only as strong as
the weakest link (oldest snapshot).

Many years ago I set out to implement btree page
merging[1] from ideas in a 1996 paper[2], though that work never saw the
light of day. Maybe it can be valuable now.

Lots of other database systems don't implement B-Tree page merging,
because it's hard to make work with techniques like Lehman & Yao
B-Link trees, and the average case space utilization still ends up
being good enough. It may be more worthwhile for us, though.
Apparently Kevin has some ideas here.

If I'm right about this index bloat issue, then I'd sooner tackle the
problem by preventing bloat in unique indexes in the first place, by
using some additional indirection, a topic that I've said plenty about
recently. I think that you can sometimes, though fairly rarely, see
cases that get *really* out of hand, but with things like in-database
queues, that have unusually aggressive update patterns where a great
many duplicates are generated in indexes [1]. Our handling of
duplicates in unique indexes [2] is surely a problem there.

Another thing to consider is indirect indexes, wherein you get less heap
bloat because more updates can become HOT.

I think that the stuff I'm talking about, about having indirection for
the primary key and making sure unique indexes actually disallow even
physical duplicates actually enables indirect indexes. Remember how I
pointed out issues with unique indexes and VACUUM when you presented
on it at the pgCon unconference? I think that those problems may be
solvable through centralizing everything by making duplicates within
leaf pages physically impossible for unique indexes (not just
logically impossible). One chokepoint for dealing with bloat cleanup
for every index is the entire point of your indirect index design, but
that requires that you actually have no ambiguity about what every
logical pointer (in every secondary index) points to.

If we also allow heap to be pruned from line pointers by walking indexes
to remove specific pointers, instead of requiring indexes to be scanned
whole for the removal, as proposed by Andres, we could get further
along.

Yeah. I talked to Robert about doing that just today. That seems very
worthwhile, no matter how it ends up being used (whether it's by
VACUUM, something closer to a synchronous deletion, or whatever).

[1]: https://brandur.org/postgres-queues
[2]: https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement
--
Peter Geoghegan


#8 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Stephen Frost (#6)
Re: autovacuum can't keep up, bloat just continues to rise

On Thu, Jul 20, 2017 at 8:24 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short is that no
matter what, autovacuum can't keep up. This is what I have done:

Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
autovacuum_vacuum_cost_limit.

Or get rid of the cost delay entirely and let autovacuum actually go as
fast as it can when it's run. The assertion that it can't keep up is
still plausible, but configuring autovacuum to sleep regularly and then
complaining that it's not able to keep up doesn't make sense.

Reducing the nap time might also be helpful if autovacuum is going as
fast as it can and it's able to clear a table in less than a minute.

There have been discussions on this list about parallel vacuum of a
particular table as well; to address this issue I'd encourage reviewing
those discussions and looking at writing a patch to implement that
feature as that would address the case where the table is large enough
that autovacuum simply can't get through all of it before the other
backends have used all space available and then substantially increased
the size of the relation (leading to vacuum on the table running for
longer).

Yeah, the parallel vacuum of a particular table might help this issue
unless disk I/O is the bottleneck. I'm planning to work on that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#7)
Re: autovacuum can't keep up, bloat just continues to rise

Peter Geoghegan <pg@bowt.ie> writes:

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.

But that just begs the question: *does* it accumulate indefinitely, or
does it eventually reach a more-or-less steady state? The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something like
1/3rd free space. You can call that bloat if you want, but it's not
likely that you'll be able to reduce the number significantly without
paying exorbitant costs.

I'm not claiming that we don't have any problems, but I do think it's
important to draw a distinction between bloat and normal operating
overhead.

regards, tom lane


#10 Peter Geoghegan
pg@bowt.ie

In reply to: Tom Lane (#9)
Re: autovacuum can't keep up, bloat just continues to rise

On Wed, Jul 19, 2017 at 7:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@bowt.ie> writes:

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.

But that just begs the question: *does* it accumulate indefinitely, or
does it eventually reach a more-or-less steady state?

Yes, I believe it does reach a more-or-less steady state. It saturates
when there is a lot of contention, because then you actually can reuse
the bloat. If it didn't saturate, and instead became arbitrarily bad,
then we'd surely have heard about that before now.

The bloat is not entirely wasted, because it actually prevents you
from getting even more bloat in that part of the keyspace.

The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something like
1/3rd free space. You can call that bloat if you want, but it's not
likely that you'll be able to reduce the number significantly without
paying exorbitant costs.

For the purposes of this discussion, I'm mostly talking about
duplicates within a page on a unique index. If the keyspace owned by
an int4 unique index page only covers 20 distinct values, it will only
ever cover 20 distinct values, now and forever, despite the fact that
there is room for about 400 (a 90/10 split leaves you with 366 items +
1 high key).
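The "room for about 400" figure can be reproduced from the nbtree page layout. A sketch assuming the stock 8KB page, a 24-byte page header, a 16-byte B-tree special area, 4-byte line pointers, and 16-byte index tuples (8-byte tuple header plus an int4 key, MAXALIGNed to 8 bytes):

```python
PAGE_SIZE = 8192     # default BLCKSZ
PAGE_HEADER = 24     # PageHeaderData
BTREE_SPECIAL = 16   # BTPageOpaqueData at the end of the page
LINE_POINTER = 4     # ItemIdData per tuple
INDEX_TUPLE = 16     # 8-byte IndexTupleData + int4 key, padded to 8

usable = PAGE_SIZE - PAGE_HEADER - BTREE_SPECIAL
per_item = LINE_POINTER + INDEX_TUPLE
max_items = usable // per_item
print(max_items)             # 407 -- "room for about 400"

# After a 90/10 split the left page is ~90% full, and one of those
# slots is the high key: hence "366 items + 1 high key" above.
print(int(max_items * 0.9))  # 366
```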

I don't know if I should really even call this bloat, since the term
is so overloaded, although this is what other database systems call
index bloat. I like to think of it as "damage to the keyspace",
although that terminology seems unlikely to catch on.

--
Peter Geoghegan


#11 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Masahiko Sawada (#8)
Re: autovacuum can't keep up, bloat just continues to rise

On 2017-07-20 05:52, Masahiko Sawada wrote:

On Thu, Jul 20, 2017 at 8:24 AM, Stephen Frost <sfrost@snowman.net>
wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short is that no
matter what, autovacuum can't keep up. This is what I have done:

Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
autovacuum_vacuum_cost_limit.

Or get rid of the cost delay entirely and let autovacuum actually go as
fast as it can when it's run. The assertion that it can't keep up is
still plausible, but configuring autovacuum to sleep regularly and then
complaining that it's not able to keep up doesn't make sense.

Reducing the nap time might also be helpful if autovacuum is going as
fast as it can and it's able to clear a table in less than a minute.

There have been discussions on this list about parallel vacuum of a
particular table as well; to address this issue I'd encourage reviewing
those discussions and looking at writing a patch to implement that
feature as that would address the case where the table is large enough
that autovacuum simply can't get through all of it before the other
backends have used all space available and then substantially increased
the size of the relation (leading to vacuum on the table running for
longer).

Yeah, the parallel vacuum of a particular table might help this issue
unless disk I/O is the bottleneck. I'm planning to work on that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

I wrote two days ago about vacuum ring buffer:
/messages/by-id/8737e9bddb82501da1134f021bf4929a@postgrespro.ru

Increasing vacuum's ring buffer to the size of the bulk writer's one
reduces autovacuum time by 3-10x.
(For both patched and unpatched versions I used a single non-default
setting, 'autovacuum_vacuum_cost_delay=2ms'.)

This is a single-line change, and it improves things a lot.

With regards,
--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


#12 Stephen Frost
sfrost@snowman.net
In reply to: Yura Sokolov (#11)
Re: autovacuum can't keep up, bloat just continues to rise

Greetings,

* Sokolov Yura (y.sokolov@postgrespro.ru) wrote:

I wrote two days ago about vacuum ring buffer:
/messages/by-id/8737e9bddb82501da1134f021bf4929a@postgrespro.ru

Increasing vacuum's ring buffer to the size of the bulk writer's one
reduces autovacuum time by 3-10x.
(For both patched and unpatched versions I used a single non-default
setting, 'autovacuum_vacuum_cost_delay=2ms'.)

This is a single-line change, and it improves things a lot.

Right- when the database fits in the OS cache but not in shared_buffers.

I do agree that's a useful improvement to make based on your testing.

It's not clear off-hand how much that would improve this case, as
the database size appears to pretty quickly get beyond the OS memory
size (and only in the first test is the DB starting size less than
system memory to begin with).

Thanks!

Stephen

#13 Joshua D. Drake
jd@commandprompt.com
In reply to: Stephen Frost (#12)
Re: autovacuum can't keep up, bloat just continues to rise

On 07/20/2017 06:28 AM, Stephen Frost wrote:

It's not clear off-hand how much that would improve this case, as
the database size appears to pretty quickly get beyond the OS memory
size (and only in the first test is the DB starting size less than
system memory to begin with).

FYI,

I will be posting new numbers in a few hours. I had been planning on
posting this last night but... KDE.

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****


#14 Claudio Freire
klaussfreire@gmail.com
In reply to: Peter Geoghegan (#10)
Re: autovacuum can't keep up, bloat just continues to rise

On Thu, Jul 20, 2017 at 12:08 AM, Peter Geoghegan <pg@bowt.ie> wrote:

The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something like
1/3rd free space. You can call that bloat if you want, but it's not
likely that you'll be able to reduce the number significantly without
paying exorbitant costs.

For the purposes of this discussion, I'm mostly talking about
duplicates within a page on a unique index. If the keyspace owned by
an int4 unique index page only covers 20 distinct values, it will only
ever cover 20 distinct values, now and forever, despite the fact that
there is room for about 400 (a 90/10 split leaves you with 366 items +
1 high key).

Microvacuum could also help.

If during a scan you find pointers that point to dead (in vacuum terms)
tuples, the pointers in the index could be deleted. That could be done
during insert into unique indexes before a split, to avoid the split.

Chances are, if there are duplicates, at least a few of them will be dead.


#15 Jeff Janes
jeff.janes@gmail.com
In reply to: Stephen Frost (#12)
Re: autovacuum can't keep up, bloat just continues to rise

On Thu, Jul 20, 2017 at 6:28 AM, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Sokolov Yura (y.sokolov@postgrespro.ru) wrote:

I wrote two days ago about vacuum ring buffer:
/messages/by-id/8737e9bddb82501da1134f021bf4929a@postgrespro.ru

Increasing vacuum's ring buffer to the size of the bulk writer's one
reduces autovacuum time by 3-10x.
(For both patched and unpatched versions I used a single non-default
setting, 'autovacuum_vacuum_cost_delay=2ms'.)

This is a single-line change, and it improves things a lot.

Right- when the database fits in the OS cache but not in shared_buffers.

On a system with a slow fsync, increasing the ring buffer helps a lot even
if database doesn't fit in the OS cache. When the next buffer allocation
runs into a dirtied buffer in the ring, it needs to sync the WAL up through
that buffer's LSN before it can write it out and reuse it. With a small
ring, this means a lot of WAL flushing needs to be done.

I do agree that's a useful improvement to make based on your testing.

It's not clear off-hand how much that would improve this case, as
the database size appears to pretty quickly get beyond the OS memory
size (and only in the first test is the DB starting size less than
system memory to begin with).

Also, this system probably has a pretty fast fdatasync, considering it is
an SSD.

Cheers,

Jeff

#16 Peter Geoghegan
pg@bowt.ie

In reply to: Claudio Freire (#14)
Re: autovacuum can't keep up, bloat just continues to rise

On Thu, Jul 20, 2017 at 7:45 AM, Claudio Freire <klaussfreire@gmail.com> wrote:

For the purposes of this discussion, I'm mostly talking about
duplicates within a page on a unique index. If the keyspace owned by
an int4 unique index page only covers 20 distinct values, it will only
ever cover 20 distinct values, now and forever, despite the fact that
there is room for about 400 (a 90/10 split leaves you with 366 items +
1 high key).

Microvacuum could also help.

If during a scan you find pointers that point to dead (in vacuum terms)
tuples, the pointers in the index could be deleted. That could be done
during insert into unique indexes before a split, to avoid the split.

Chances are, if there are duplicates, at least a few of them will be dead.

My whole point is that that could easily fail to happen early enough
to prevent a page split that is only needed because there is a
short-term surge in the number of duplicate versions that need to be
available for one old snapshot. A page split can be a permanent
solution to a temporary problem. Page deletion can only occur under
tight conditions that are unlikely to *ever* be met in many cases.

Imagine if it was impossible to insert physical duplicates into unique
indexes. In that world, you'd end up bloating some overflow data
structure in UPDATE heavy cases (where HOT doesn't work out). The
bloat wouldn't go on leaf pages, and so you wouldn't get page splits,
and so you wouldn't end up with leaf pages that can only store 20
distinct values now and forever, because that's the range of values
represented by downlinks and the leaf's high key. That's a situation
we actually saw for the leftmost leaf page in Alik's Zipfian
distribution test.

The way that the keyspace is broken up is supposed to be balanced, and
to have long term utility. Working against that to absorb a short term
bloat problem is penny wise, pound foolish.

--
Peter Geoghegan


#17 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#9)
Re: autovacuum can't keep up, bloat just continues to rise

On 07/19/2017 07:57 PM, Tom Lane wrote:

Peter Geoghegan <pg@bowt.ie> writes:

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.

But that just begs the question: *does* it accumulate indefinitely, or
does it eventually reach a more-or-less steady state? The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something like
1/3rd free space. You can call that bloat if you want, but it's not
likely that you'll be able to reduce the number significantly without
paying exorbitant costs.

I'm not claiming that we don't have any problems, but I do think it's
important to draw a distinction between bloat and normal operating
overhead.

Agreed, but I don't think we are talking about 30%. Here is where I am
at. It took until 30 minutes ago for the tests to finish:

name | setting
-------------------------------------+-----------
autovacuum | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_multixact_freeze_max_age | 400000000
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.2
autovacuum_vacuum_threshold | 50
autovacuum_work_mem | -1
log_autovacuum_min_duration | -1

Test 1: 55G /srv/main
TPS: 955

Test 2: 112G /srv/main
TPS: 531 (Not sure what happened here, long checkpoint?)

Test 3: 109G /srv/main
TPS: 868

Test 4: 143G
TPS: 840

Test 5: 154G
TPS: 722

I am running the query here:

https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index

And will post a followup. Once the query finishes I am going to launch
the tests with autovacuum_vacuum_cost_limit of 5000. Is there anything
else you folks would like me to change?

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Joshua D. Drake (#17)
Re: autovacuum can't keep up, bloat just continues to rise

On 21/07/17 15:58, Joshua D. Drake wrote:

And will post a followup. Once the query finishes I am going to launch
the tests with autovacuum_vacuum_cost_limit of 5000. Is there anything
else you folks would like me to change?

I usually advise setting autovacuum_naptime = 10s (or even 5s) for
workloads that do a lot of updates (or inserts + deletes) - as on modern
HW a lot of churn can happen in 1 minute, and that just makes vacuum's
job harder.
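
That suggestion as a concrete fragment (a sketch; 10s is the value
suggested above, and only a reload is required):

```sql
-- Wake autovacuum workers every 10 seconds instead of the default 1min.
ALTER SYSTEM SET autovacuum_naptime = '10s';
SELECT pg_reload_conf();
```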

regards
Mark


#19Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Joshua D. Drake (#17)
Re: autovacuum can't keep up, bloat just continues to rise


Have you measured increased vacuum ring buffer?
This will require recompilation, though.
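
For context, the "vacuum ring buffer" is the small BufferAccessStrategy
ring VACUUM uses so it does not flood shared_buffers; as of the 9.6
sources its size is hard-coded, which is why enlarging it requires a
rebuild. An excerpt of the relevant lines for illustration (from
GetAccessStrategy() in src/backend/storage/buffer/freelist.c; not
runnable on its own):

```c
/* src/backend/storage/buffer/freelist.c, GetAccessStrategy() --
 * ring sizes are fixed at compile time; enlarging BAS_VACUUM's
 * ring is the change that requires recompilation. */
case BAS_VACUUM:
    ring_size = 256 * 1024 / BLCKSZ;    /* stock: 256 kB ring */
    break;
```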

With regards,
--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


#20Joshua D. Drake
jd@commandprompt.com
In reply to: Yura Sokolov (#19)
Re: autovacuum can't keep up, bloat just continues to rise

On 07/20/2017 11:54 PM, Sokolov Yura wrote:

Have you measured increased vacuum ring buffer?
This will require recompilation, though.

No, not yet. I think we are still in the proving the problem stage.

JD



#21Joshua D. Drake
jd@commandprompt.com
In reply to: Joshua D. Drake (#17)
#22Joshua D. Drake
jd@commandprompt.com
In reply to: Yura Sokolov (#19)
#23Joshua D. Drake
jd@commandprompt.com
In reply to: Joshua D. Drake (#22)