Vacuum, Freeze and Analyze: the big picture
Folks,
There are currently some great ideas bouncing around about eliminating the
overhead associated with FREEZE. However, I wanted to take a step back
and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
Otherwise, we're liable to repeat the 8.4 problem of making one
operation better (background vacuum) while making another one worse
(freezing).
The big, big picture is this:
90% of our users need to think about VACUUM/ANALYZE
at least 10% of the time
and 10% of our users need to think about it
almost 90% of the time.
That's considerably better than was the case 5 years ago, when vacuum
management was a daily or weekly responsibility for nearly 100% of our
users, but it's still not good enough. Our target should be that only
those with really unusual setups should have to *ever* think about
vacuum and analyze.
So I've set up a wiki page to document the various problems that force
users to think about vacuum and analyze and try to troubleshoot it:
https://wiki.postgresql.org/wiki/VacuumHeadaches
We can also collect suggested solutions here. I'm looking to create a
long-term development target which removes most of these vacuum
headaches over the next 3 or 4 releases, without making the unremoved
headaches significantly worse.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 30 May 2013 19:48, Josh Berkus <josh@agliodbs.com> wrote:
Great collection of issues.
I'm not sure I understand this:
"Problem: As of 9.3, there's a significant benefit to vacuum freezing
tables early so that index-only scan is enabled, since freezing also
updates the visibility map. However, with default settings, such
freezing only happens for data which is very old. This means that
index-only scan is less effective than it could be for tables which
have relatively infrequent updates and deletes."
Why specifically VACUUM FREEZE rather than regular VACUUM? I thought
regular VACUUM updated the visibility map too? And why as of 9.3
instead of 9.2?
--
Thom
Hi,
On 2013-05-30 11:48:12 -0700, Josh Berkus wrote:
There's currently some great ideas bouncing around about eliminating the
overhead associated with FREEZE. However, I wanted to take a step back
and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
Otherwise, we're liable to repeat the 8.4 problem of making one
operation better (background vacuum) while making another one worse
(freezing).
In what way did 8.4 make freezing worse? I can't remember any new problems
there?
I agree that we need to be careful not to make things worse...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Why specifically VACUUM FREEZE rather than regular VACUUM? I thought
regular VACUUM updated the visibility map too? And why as of 9.3
instead of 9.2?
It should be as of 9.2; the 9.3 was a typo.
Allvisible only gets set if there was some reason for VACUUM to visit
the page anyway, no? Which means that an insert-only or insert-mostly
table doesn't get set allvisible until FREEZE. And insert-only tables
are usually very large, and thus really *need* index-only scan.
Hmmm. I should rewrite that item entirely. It has nothing to do with
FREEZE, really.
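Josh's point about all-visible coverage can be checked directly: pg_class has carried a relallvisible counter since 9.2. This query is a sketch, not from the thread, and the 20-row limit is arbitrary:

```sql
-- Fraction of each table's pages currently marked all-visible.
-- An insert-only table that autovacuum rarely visits will show a
-- low ratio, so index-only scans must fall back to heap checks.
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1)
           AS pct_all_visible
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC
LIMIT 20;
```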
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
In what way did 8.4 make freezing worse? I can't remember any new problems
there?
Before the Visibility Map, we always vacuumed all pages in a relation
when it was vacuumed at all. This means that we froze tuples at
vacuum_freeze_min_age, not at autovacuum_freeze_max_age, which is when
we do it post-8.4.
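The thresholds in question are ordinary settings and can be inspected on any running system; this query is just for reference and isn't from the thread:

```sql
-- Freeze-related thresholds discussed in this thread
-- (vacuum_freeze_table_age exists as of 8.4).
SELECT name, setting, short_desc
FROM pg_settings
WHERE name IN ('vacuum_freeze_min_age',
               'vacuum_freeze_table_age',
               'autovacuum_freeze_max_age');
```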
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2013-05-30 12:11:23 -0700, Josh Berkus wrote:
Before the Visibility Map, we always vacuumed all pages in a relation
when it was vacuumed at all. This means that we froze tuples at
vacuum_freeze_min_age, not at autovacuum_freeze_max_age, which is when
we do it post-8.4.
If we have reason to vacuum the relation we do it at
vacuum_freeze_table_age, not only at autovacuum_freeze_max_age. The
difference is that the latter triggers a vacuum, while the former only
changes a partial vacuum into a full one.
Calling that behaviour unconditionally worse is, err, interesting...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-05-30 20:01:01 +0100, Thom Brown wrote:
Why specifically VACUUM FREEZE rather than regular VACUUM? I thought
regular VACUUM updated the visibility map too?
It does. It's after all what it uses to decide which parts of the table
to scan if not doing a full table vacuum.
And why as of 9.3 instead of 9.2?
Maybe because 9.3 updates the vm quicker than earlier versions by checking
whether all tuples are visible after we've actually removed the dead
tuples.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Calling that behaviour unconditionally worse is, err, interesting...
*overall* it's better. But as far as FREEZE itself is concerned, it's
worse.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2013-05-30 12:18:29 -0700, Josh Berkus wrote:
*overall* it's better. But as far as FREEZE itself is concerned, it's
worse.
I am not trying to give you a hard time, but I really can't follow. In
<8.3 we only froze tuples that were older than vacuum_freeze_min_age,
just as today (although the default was higher then than it is today).
100 million transactions is long enough that you are almost guaranteed
to be in a different checkpoint cycle when freezing than when initially
writing the tuple's buffer. So regardless of when the buffer is frozen
(whether a) we always scan the whole relation, b) a partial vacuum is
upgraded to a full one due to vacuum_freeze_table_age, or c) an
anti-wraparound vacuum runs), we will usually write a buffer multiple
times.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 30 May 2013 19:48, Josh Berkus <josh@agliodbs.com> wrote:
There's currently some great ideas bouncing around about eliminating the
overhead associated with FREEZE. However, I wanted to take a step back
and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
That is a very commendable approach. We should do that more often.
The big, big picture is this:
90% of our users need to think about VACUUM/ANALYZE
at least 10% of the time
and 10% of our users need to think about it
almost 90% of the time.
When you say stuff like that, you should add "speculating from my
personal experience". People might get the impression you'd measured
this somehow and it could confuse the issue if you try to assemble a
high level viewpoint and then add in factoids that are just opinions.
We should strive to measure such things.
That's considerably better than was the case 5 years ago, when vacuum
management was a daily or weekly responsibility for nearly 100% of our
users, but it's still not good enough. Our target should be that only
those with really unusual setups should have to *ever* think about
vacuum and analyze.
I think that's where we already are given that 1000s of users have
quite small databases.
The problem increases with scale. Larger databases have bigger
problems and make it easier to notice things are happening.
I think you should mention that the evidence for these issues is
anecdotal and take careful notes of the backgrounds in which they
occurred. Saying things occur in all cases wouldn't be accurate or
helpful to their resolution.
We should be seeking to contrast this against other databases to see
if we are better or worse than other systems. For example, recording
the moans of someone who is currently managing a 1 TB database, but
yet hasn't ever managed anything else that big is less valuable than a
balanced, experienced viewpoint (if such exists).
Anyway, I support this approach, just wanted to make sure we do it in
sufficient detail to be useful.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jun 1, 2013 at 1:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I agree with all that. I don't have any data either, but I agree that
AFAICT it seems to mostly be a problem for large (terabyte-scale)
databases, or ones that are dreadfully short of I/O bandwidth. AWS,
I'm looking at you.
It would be interesting to make a list of what other issues people
have seen using PostgreSQL on very large data sets. Complaints I've
heard include:
1. Inexplicable failure of the planner to use indexes on very large
tables, preferring an obviously-stupid sequential scan. This might be
fixed by the latest index-size fudge factor work.
2. Lack of concurrent DDL.
On VACUUM and ANALYZE specifically, I'd have to say that the most
common problems I encounter are (a) anti-wraparound vacuums kicking in
at inconvenient times and eating up too many resources and (b) users
making ridiculous settings changes to avoid the problems caused by
anti-wraparound vacuums kicking in at inconvenient times and eating up
too many resources. The changes we've been discussing elsewhere may
not completely solve this problem, because we'll still have to read
all pages that aren't yet all-visible... but they should surely help.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Saturday, June 1, 2013, Robert Haas wrote:
On VACUUM and ANALYZE specifically, I'd have to say that the most
common problems I encounter are (a) anti-wraparound vacuums kicking in
at inconvenient times and eating up too many resources and (b) users
making ridiculous settings changes to avoid the problems caused by
anti-wraparound vacuums kicking in at inconvenient times and eating up
too many resources.
Do we know why anti-wraparound uses so many resources in the first place?
The default settings seem to be quite conservative to me, even for a
system that has only a single 5400 rpm hdd (and even more so for any real
production system that would be used for a many-GB database).
I wonder if there is something simple but currently unknown going on which
is causing it to damage performance out of all proportion to the resources
it ought to be using.
Cheers,
Jeff
On Sun, Jun 2, 2013 at 2:44 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Do we know why anti-wraparound uses so many resources in the first place?
The default settings seem to be quite conservative to me, even for a
system that has only a single 5400 rpm hdd (and even more so for any real
production system that would be used for a many-GB database).

I wonder if there is something simple but currently unknown going on which
is causing it to damage performance out of all proportion to the resources
it ought to be using.
I can't rule that out. Personally, I've always attributed it to the
fact that it's (a) long and (b) I/O-intensive. But it's not
impossible there could also be bugs lurking.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 3, 2013 at 1:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I can't rule that out. Personally, I've always attributed it to the
fact that it's (a) long and (b) I/O-intensive. But it's not
impossible there could also be bugs lurking.
It could be related to the OS. I have no evidence for or against, but
it's possible that OS write-out routines defeat the careful cost based
throttling that PostgreSQL does by periodically dumping a large
portion of dirty pages into the write queue at once. That does nasty
things to query latencies as evidenced by the work on checkpoint
spreading.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On 06/02/2013 05:56 AM, Robert Haas wrote:
1. Inexplicable failure of the planner to use indexes on very large
tables, preferring an obviously-stupid sequential scan. This might be
fixed by the latest index-size fudge factor work.
I've seen cases on Stack Overflow and elsewhere in which disk merge
sorts perform vastly better than in-memory quicksort, so the user
benefited from greatly *lowering* work_mem.
(b) users
making ridiculous settings changes to avoid the problems caused by
anti-wraparound vacuums kicking in at inconvenient times and eating up
too many resources.
Some recent experiences I've had have also brought home to me that vacuum
problems are often of the user's own making.
"My database is slow"
->
"This autovacuum thing is using up lots of I/O and CPU, I'll increase
this delay setting here"
->
"My database is slower"
->
"Maybe I didn't solve the autovacuum thing, I'll just turn it off"
->
"My database is barely working"
->
"I'll whack in some manual VACUUM cron jobs during low load maintenance
hours and hope that keeps the worst of the problem away, that's what
random forum posts on the Internet say to do".
-> "oh my, why did my DB just do an emergency shutdown?"
Vacuum being more able to operate in a feedback loop driven by bloat
statistics might be quite valuable, but I'm also wondering if there's
any remotely feasible way to more usefully alert users when they're
having table bloat issues and vacuum isn't coping. Particularly for
cases where autovacuum is working but being impaired by locking.
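A crude version of that alert can already be approximated from the statistics collector; this is a sketch with arbitrary thresholds, not a vetted recommendation:

```sql
-- Tables with many dead tuples and no recent (auto)vacuum,
-- suggesting autovacuum is disabled, starved, or blocked.
SELECT schemaname, relname, n_dead_tup, n_live_tup,
       last_autovacuum, last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000                          -- arbitrary cutoff
  AND n_dead_tup > 0.2 * greatest(n_live_tup, 1)  -- >20% dead
ORDER BY n_dead_tup DESC;
```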
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> wrote:
I've seen cases on Stack Overflow and elsewhere in which disk merge
sorts perform vastly better than in-memory quicksort, so the user
benefited from greatly *lowering* work_mem.
I have seen this a few times, too. It would be interesting to
characterize the conditions under which this is the case.
(b) users
making ridiculous settings changes to avoid the problems caused by
anti-wraparound vacuums kicking in at inconvenient times and eating up
too many resources.
Where I hit a nightmare scenario with an anti-wraparound
autovacuum, personally, was after an upgrade using pg_dump piped to
psql. At a high OLTP transaction load time (obviously the most
likely time for it to kick in, because it is triggered by xid
consumption), it started to READ AND REWRITE every heap page of
every table. This overwhelmed the battery-backed write cache,
causing a series of "freezes" for a few minutes at a time, raising
a very large number of end-user complaints. This is when I started
insisting on a VACUUM FREEZE ANALYZE after any bulk load before it
was considered complete and the database brought online for
production use.
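The post-load step Kevin describes is plain SQL; the table name in the second form is illustrative:

```sql
-- Database-wide, after the bulk load completes and before the
-- database goes into production:
VACUUM FREEZE ANALYZE;

-- Or per table, as each large table finishes loading:
VACUUM FREEZE ANALYZE big_table;
```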
Some recent experiences I've had have also brought home to me that vacuum
problems are often of the user's own making. [...]
Yeah, I've seen exactly that sequence, and some variations on it
quite often. In fact, when I was first using PostgreSQL I got as
far as "Maybe I didn't solve the autovacuum thing" but instead of
"I'll just turn it off" my next step was "I wonder what would
happen if I tried making it *more* aggressive so that it didn't
have so much work to do each time it fired?" Of course, that
vastly improved things. I have found it surprisingly difficult to
convince other people to try that, though.
I have seen people so convinced that vacuum (and particularly
autovacuum) are *so* evil that they turn off autovacuum and monitor
the freeze status of their tables and databases so that they can
run VACUUM "just in time" to prevent the emergency shutdown.
Obviously, this isn't great for their performance. :-(
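The "just in time" monitoring Kevin mentions usually means watching transaction-ID age; a minimal sketch, not from the thread:

```sql
-- Distance toward wraparound per database; an anti-wraparound
-- vacuum is forced once a table's age exceeds
-- autovacuum_freeze_max_age (default 200 million).
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- The tables driving that number in the current database:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```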
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-06-02 11:44:04 -0700, Jeff Janes wrote:
Do we know why anti-wraparound uses so many resources in the first place?
The default settings seem to be quite conservative to me, even for a
system that has only a single 5400 rpm hdd (and even more so for any real
production system that would be used for a many-GB database).
I guess the point is that nobody can actually run a bigger OLTP database
successfully with the default settings. Usually that will end up with a)
huge amounts of bloat in the tables autovac doesn't scan first b) forced
shutdowns because autovac doesn't freeze quickly enough.
The suggestion that frequently gets made is simply to disable the
autovacuum cost limits because of that.
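Since 8.4 that throttle can also be lifted per table rather than globally; whether doing so is wise is exactly what's under debate here. The table name is illustrative:

```sql
-- Remove the autovacuum cost delay for one high-churn table:
ALTER TABLE big_table SET (autovacuum_vacuum_cost_delay = 0);

-- Revert to the global setting later:
ALTER TABLE big_table RESET (autovacuum_vacuum_cost_delay);
```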
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I agree with all that. I don't have any data either, but I agree that
AFAICT it seems to mostly be a problem for large (terabyte-scale)
databases, or ones that are dreadfully short of I/O bandwidth. AWS,
I'm looking at you.
Well, at this point, numerically I'd bet that more than 50% of our users
are on AWS, some other cloud, or some kind of iSCSI storage ... some
place where IO sucks. It's How Things Are Done Now.
Speaking for my own clientele, people run into issues, or think they
have issues, with autovacuum at databases as small as 100GB, as long as
they have sufficient write throughput. One really pathological case I
had to troubleshoot was a database which was only 200MB in size! (this
database contained counts of things, and was updated 10,000 times per
second).
Anyway, my goal with that wiki page -- which is on the wiki so that
others can add to it -- is to get all of the common chronic issues on
the table so that we don't inadvertently make one problem worse while
making another one better. Some of the solutions to FREEZE being
bandied around seemed likely to do that.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Jeff,
Do we know why anti-wraparound uses so many resources in the first place?
The default settings seem to be quite conservative to me, even for a
system that has only a single 5400 rpm hdd (and even more so for any real
production system that would be used for a many-GB database).
I wonder if there is something simple but currently unknown going on which
is causing it to damage performance out of all proportion to the resources
it ought to be using.
Does anti-wraparound vacuum (AWAV) write synchronously? If so, there's
a potential whole world of hurt there.
Otherwise, the effect you're seeing is just blowing out various caches:
the CPU cache, storage cache, and filesystem cache. While we can (and
do) prevent vacuum from blowing out shared_buffers, we can't do much
about the others.
Also, locking while it does its work.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, Jun 3, 2013 at 1:28 PM, Josh Berkus <josh@agliodbs.com> wrote:
Does anti-wraparound vacuum (AWAV) write synchronously? If so, there's
a potential whole world of hurt there.
Not any more so than anything else ... although it probably does a very
high percentage of FPIs, which might lead to lots of checkpointing.
Also, locking while it does its work.
Eh?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company