Why is vacuum_freeze_min_age 100m?

Started by Josh Berkusover 16 years ago38 messageshackers
Jump to latest
#1Josh Berkus
josh@agliodbs.com

All,

I've just been tweaking some autovac settings for a large database, and
came to wonder: why does vacuum_max_freeze_age default to such a high
number? What's the logic behind that?

AFAIK, you want max_freeze_age to be the largest possible interval of
XIDs where an existing transaction might still be in scope, but no
larger. Yes?

If that's the case, I'd assert that users who do actually go through
100M XIDs within a transaction window are probably doing some
hand-tuning. And we could lower the default for most users
considerably, such as to 1 million.

Have I missed something?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#2Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
Re: Why is vacuum_freeze_min_age 100m?

On 8/11/09 2:14 PM, Josh Berkus wrote:

All,

I've just been tweaking some autovac settings for a large database, and
came to wonder: why does vacuum_max_freeze_age default to such a high
number? What's the logic behind that?

AFAIK, you want max_freeze_age to be the largest possible interval of
XIDs where an existing transaction might still be in scope, but no
larger. Yes?

If that's the case, I'd assert that users who do actually go through
100M XIDs within a transaction window are probably doing some
hand-tuning. And we could lower the default for most users
considerably, such as to 1 million.

(replying to myself) actually, we don't want to set FrozenXID until the
row is not likely to be modified again. However, for most small-scale
installations (ones where the user has not done any tuning) that's still
likely to be less than 100m transactions.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#3Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#2)
Re: Why is vacuum_freeze_min_age 100m?

On Tue, Aug 11, 2009 at 5:23 PM, Josh Berkus<josh@agliodbs.com> wrote:

On 8/11/09 2:14 PM, Josh Berkus wrote:

All,

I've just been tweaking some autovac settings for a large database, and
came to wonder: why does vacuum_max_freeze_age default to such a high
number?  What's the logic behind that?

AFAIK, you want max_freeze_age to be the largest possible interval of
XIDs where an existing transaction might still be in scope, but no
larger.  Yes?

If that's the case, I'd assert that users who do actually go through
100M XIDs within a transaction window are probably doing some
hand-tuning.  And we could lower the default for most users
considerably, such as to 1 million.

(replying to myself) actually, we don't want to set FrozenXID until the
row is not likely to be modified again.  However, for most small-scale
installations (ones where the user has not done any tuning) that's still
likely to be less than 100m transactions.

I don't think that's the name of the parameter, since a Google search
gives zero hits. There are so many fiddly parameters for this thing
that I don't want to speculate about which one you meant.

...Robert

#4Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#3)
Re: Why is vacuum_freeze_min_age 100m?

I don't think that's the name of the parameter, since a Google search
gives zero hits. There are so many fiddly parameters for this thing
that I don't want to speculate about which one you meant.

Sorry, subject line had it correct.

http://www.postgresql.org/docs/8.4/static/runtime-config-client.html#GUC-VACUUM-FREEZE-MIN-AGE

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#1)
Re: Why is vacuum_freeze_min_age 100m?

Josh Berkus <josh@agliodbs.com> writes:

I've just been tweaking some autovac settings for a large database, and
came to wonder: why does vacuum_max_freeze_age default to such a high
number? What's the logic behind that?

(1) not destroying potentially useful forensic evidence too soon;
(2) there's not really much to be gained by reducing it.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#4)
Re: Why is vacuum_freeze_min_age 100m?

On Tue, Aug 11, 2009 at 6:06 PM, Josh Berkus<josh@agliodbs.com> wrote:

I don't think that's the name of the parameter, since a Google search
gives zero hits.  There are so many fiddly parameters for this thing
that I don't want to speculate about which one you meant.

Sorry, subject line had it correct.

http://www.postgresql.org/docs/8.4/static/runtime-config-client.html#GUC-VACUUM-FREEZE-MIN-AGE

Ah. Yeah, I agree with Tom: how would it help to make this smaller?
It seems like that could possibly increase I/O, if the old data is
changing at all, but even if it doesn't it I don't see that it saves
you anything to freeze it sooner. Generally freezing is unnecessary
pain: if we had 128-bit transaction IDs, I'm guessing that we wouldn't
care about freezing or wraparound at all. (Of course that would
create other problems, which is why we don't, but the point is
freezing is at best a necessary evil.)

...Robert

#7Torsten Zühlsdorff
foo@meisterderspiele.de
In reply to: Tom Lane (#5)
Re: Why is vacuum_freeze_min_age 100m?

Tom Lane schrieb:

Josh Berkus <josh@agliodbs.com> writes:

I've just been tweaking some autovac settings for a large database, and
came to wonder: why does vacuum_max_freeze_age default to such a high
number? What's the logic behind that?

(1) not destroying potentially useful forensic evidence too soon;
(2) there's not really much to be gained by reducing it.

If there is not really much to gain by changing the value, why do not
remove the parameter?

Greetings from germany,
Torsten

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#5)
Re: Why is vacuum_freeze_min_age 100m?

Tom Lane <tgl@sss.pgh.pa.us> wrote:

(2) there's not really much to be gained by reducing it.

That depends. The backup techniques I recently posted, using hard
links and rsync, saved us the expense of another ten or twenty TB of
mirrored SAN archival storage space, and expensive WAN bandwidth
upgrades. In piloting this we found that we were sending our
insert-only data over the wire twice -- once after it was inserted and
once after it aged sufficiently to be frozen. Aggressive freezing
effectively cut our bandwidth and storage needs for backup down almost
by half. (Especially after we made sure we left enough time for the
VACUUM FREEZE to complete before starting that night's backup
process.)

Not that most people have the same issue, but there are at least
*some* situations where there is something significant to be gained by
aggressive freezing. Not that this is an argument for changing the
*default*, of course; if someone is going to venture into these backup
techniques, they'd better have the technical savvy to deal with
tweaking their freeze strategy.

-Kevin

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#8)
Re: Why is vacuum_freeze_min_age 100m?

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

(2) there's not really much to be gained by reducing it.

That depends. The backup techniques I recently posted, using hard
links and rsync, saved us the expense of another ten or twenty TB of
mirrored SAN archival storage space, and expensive WAN bandwidth
upgrades. In piloting this we found that we were sending our
insert-only data over the wire twice -- once after it was inserted and
once after it aged sufficiently to be frozen. Aggressive freezing
effectively cut our bandwidth and storage needs for backup down almost
by half. (Especially after we made sure we left enough time for the
VACUUM FREEZE to complete before starting that night's backup
process.)

Hmmm ... if you're using VACUUM FREEZE, its behavior is unaffected by
this GUC anyway --- that option makes it use a freeze age of zero.

regards, tom lane

#10Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#9)
Re: Why is vacuum_freeze_min_age 100m?

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmmm ... if you're using VACUUM FREEZE, its behavior is unaffected
by this GUC anyway --- that option makes it use a freeze age of
zero.

Yeah, I know, but feel like I'm being a bit naughty in using VACUUM
FREEZE -- the documentation says:

| Selects aggressive "freezing" of tuples. Specifying FREEZE is
| equivalent to performing VACUUM with the vacuum_freeze_min_age
| parameter set to zero. The FREEZE option is deprecated and will be
| removed in a future release; set the parameter instead.

So I figure that since it is deprecated, at some point I'll be setting
the vacuum_freeze_min_age option rather than leaving it at the default
and using VACUUM FREEZE in the nightly maintenance run.

-Kevin

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#10)
Re: Why is vacuum_freeze_min_age 100m?

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Yeah, I know, but feel like I'm being a bit naughty in using VACUUM
FREEZE -- the documentation says:

| Selects aggressive "freezing" of tuples. Specifying FREEZE is
| equivalent to performing VACUUM with the vacuum_freeze_min_age
| parameter set to zero. The FREEZE option is deprecated and will be
| removed in a future release; set the parameter instead.

So I figure that since it is deprecated, at some point I'll be setting
the vacuum_freeze_min_age option rather than leaving it at the default
and using VACUUM FREEZE in the nightly maintenance run.

I might be mistaken, but I think the reason we're planning to remove the
option is mainly so we can get rid of FREEZE as a semi-reserved keyword.
The GUC isn't going anywhere.

Anyway, the bottom line is what you said: fooling with this setting
seems like something that's only needed by advanced users.

regards, tom lane

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: Why is vacuum_freeze_min_age 100m?

On Wed, Aug 12, 2009 at 5:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Yeah, I know, but feel like I'm being a bit naughty in using VACUUM
FREEZE -- the documentation says:

| Selects aggressive "freezing" of tuples. Specifying FREEZE is
| equivalent to performing VACUUM with the vacuum_freeze_min_age
| parameter set to zero. The FREEZE option is deprecated and will be
| removed in a future release; set the parameter instead.

So I figure that since it is deprecated, at some point I'll be setting
the vacuum_freeze_min_age option rather than leaving it at the default
and using VACUUM FREEZE in the nightly maintenance run.

I might be mistaken, but I think the reason we're planning to remove the
option is mainly so we can get rid of FREEZE as a semi-reserved keyword.
The GUC isn't going anywhere.

Anyway, the bottom line is what you said: fooling with this setting
seems like something that's only needed by advanced users.

Someone had the idea a while back of pre-freezing inserted tuples in
the WAL-bypass case.

It seems like in theory you could have a background process that would
iterate through dirty shared buffers and freeze tuples
opportunistically before they are written back to disk, but I'm not
sure that it would really be worth it.

...Robert

#13Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#12)
Re: Why is vacuum_freeze_min_age 100m?

Robert Haas <robertmhaas@gmail.com> wrote:

Someone had the idea a while back of pre-freezing inserted tuples in
the WAL-bypass case.

I'm sure I'm not the one who thought up the idea and first posted
about it, but I'm certainly an advocate for it.

It seems like in theory you could have a background process that
would iterate through dirty shared buffers and freeze tuples
opportunistically before they are written back to disk, but I'm not
sure that it would really be worth it.

We have routinely been doing a database-level VACUUM FREEZE after a
pg_dump | psql copy of a database, because:

(1) Otherwise, users experience abysmal performance running routine
queries as every tuple scanned has its hint bits set during simple
SELECT statements. The massive disk write levels during SELECTs was
very confusing at first, and if you search the archives, I'm sure
you'll find that I'm not the only one who's been confused by it.

(2) Otherwise, there looms a point where every tuple restored, which
is not subsequently updated or deleted, will need to be frozen by
autovacuum -- all at the same time. Unless you're paying
extraordinary attention to the issue, you won't know when it is
coming, but the day will come. Probably in the middle of some
time-critical process which is doing a lot of work.

(3) We want to get this done before starting the WAL archiving, to
prevent having massive quantities of WAL to transmit across the WAN.

(4) With our improved backup processes we have another reason -- our
PITR base backup space requirements and WAN bandwidth usage will be
higher if we don't start from a frozen state and stay frozen.

So really, we'd be pretty silly *not* to make sure that all tuples are
frozen and have hint bits set after a pg_dump | psql copy. It would
speed the process somewhat if the tuples could be written in that
state to start with.

-Kevin

#14Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#6)
Re: Why is vacuum_freeze_min_age 100m?

Robert,

Ah. Yeah, I agree with Tom: how would it help to make this smaller?
It seems like that could possibly increase I/O, if the old data is
changing at all, but even if it doesn't it I don't see that it saves
you anything to freeze it sooner.

Before 8.4, it actually does on tables which are purely cumulative
(WORM). Within a short time, say, 10,000 transactions, the rows to be
frozen are still in the cache. By 100m transactions, they are in an
archive partition which will need to be dragged from disk. So if I know
they won't be altered, then freezing them sooner would be better.

However, I can easily manage this through the autovacuum settings. I
just wanted confirmation of what I was thinking.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#15Jeff Davis
pgsql@j-davis.com
In reply to: Josh Berkus (#1)
freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

[ moving to -hackers ]

If this topic has been discussed previously, please point me to the
earlier threads.

Why aren't we more opportunistic about freezing tuples? For instance, if
we already have a dirty buffer in cache, we should be more aggressive
about freezing those tuples than freezing tuples on disk.

I looked at the code, and it looks like if we freeze one tuple on the
page during VACUUM, we mark it dirty. Wouldn't that be a good
opportunity to freeze all the other tuples on the page that we can?

Or, perhaps when the bgwriter is flushing dirty buffers, it can look for
opportunities to set hint bits or freeze tuples.

Regards,
Jeff Davis

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Davis (#15)
Re: freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

Jeff Davis wrote:

Why aren't we more opportunistic about freezing tuples? For instance, if
we already have a dirty buffer in cache, we should be more aggressive
about freezing those tuples than freezing tuples on disk.

The most widely cited reason is that you lose forensics data. Although
they are increasingly rare, there are still situations in which the heap
tuple machinery messes up and the xmin/xmax/etc fields of the tuple are
the best/only way to find out what happened and thus fix the bug. If
you freeze early, there's just no way to know.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#17Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#16)
[PERFORM] Re: freezing tuples ( was: Why is vacuum_freeze_min_age100m? )

Alvaro Herrera <alvherre@commandprompt.com> wrote:

Jeff Davis wrote:

Why aren't we more opportunistic about freezing tuples? For
instance, if we already have a dirty buffer in cache, we should be
more aggressive about freezing those tuples than freezing tuples on
disk.

The most widely cited reason is that you lose forensics data.
Although they are increasingly rare, there are still situations in
which the heap tuple machinery messes up and the xmin/xmax/etc
fields of the tuple are the best/only way to find out what happened
and thus fix the bug. If you freeze early, there's just no way to
know.

Although I find it hard to believe that this is compelling argument in
the case where an entire table or database is loaded in a single
database transaction.

In the more general case, I'm not sure why this argument applies here
but not to cassert and other diagnostic options. It wouldn't surprise
me to find workloads where writing data three times (once for the
data, once for hint bits, and once to freeze the tid) affects
performance more than cassert.

-Kevin

#18Jeff Davis
pgsql@j-davis.com
In reply to: Alvaro Herrera (#16)
Re: freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

On Thu, 2009-08-13 at 17:58 -0400, Alvaro Herrera wrote:

The most widely cited reason is that you lose forensics data. Although
they are increasingly rare, there are still situations in which the heap
tuple machinery messes up and the xmin/xmax/etc fields of the tuple are
the best/only way to find out what happened and thus fix the bug. If
you freeze early, there's just no way to know.

As it stands, it looks like it's not just one extra write for each
buffer, but potentially many (theoretically, as many as there are tuples
on a page). I suppose the reasoning is that tuples on the same page have
approximately the same xmin, and are likely to be frozen at the same
time. But it seems entirely reasonable that the xmins on one page span
several VACUUM runs, and that seems more likely with the FSM. That means
that a few tuples on the page are older than 100M and get frozen, and
the rest are only about 95M transactions old, so we have to come back
and freeze them again, later.

Let's say that we had a range like 50-100M, where if it's older than
100M, we freeze it, and if it's older than 50M we freeze it only if it's
on a dirty page. We would still have forensic evidence, but we could
make a range such that we avoid writing multiple times.

And people who don't care about forensic evidence can set it to 0-100M.

Regards,
Jeff Davis

#19Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#17)
Re: [PERFORM] Re: freezing tuples ( was: Why is vacuum_freeze_min_age100m? )

On Thu, 2009-08-13 at 17:17 -0500, Kevin Grittner wrote:

It wouldn't surprise
me to find workloads where writing data three times (once for the
data, once for hint bits, and once to freeze the tid)

I'm not sure that we're limited to 3 times, here. I could be missing
something, but if you have tuples with different xmins on the same page,
some might be older than 100M, which you freeze, and then you will have
to come back later to freeze the rest. As far as I can tell, the maximum
number of writes is the number of tuples that fit on the page.

Regards,
Jeff Davis

#20Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#15)
Re: freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

On Thu, Aug 13, 2009 at 5:33 PM, Jeff Davis<pgsql@j-davis.com> wrote:

Or, perhaps when the bgwriter is flushing dirty buffers, it can look for
opportunities to set hint bits or freeze tuples.

One of the tricky things here is that the time you are mostly likely
to want to do this is when you are loading a lot of data. But in that
case shared buffers are likely to be written back to disk before
transaction commit, so it'll be too early to do anything.

...Robert

#21Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#16)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#18)
#23Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#23)
#25Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#22)
#26Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#24)
#27Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#24)
#28Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#20)
#29Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#25)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#26)
#31Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#14)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#27)
#34Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#33)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Jeff Davis (#34)
#36Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#35)
#37Peter Eisentraut
peter_e@gmx.net
In reply to: Jeff Davis (#36)
#38Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#27)