Idea for getting rid of VACUUM FREEZE on cold pages

Started by Josh Berkusover 15 years ago94 messages
#1Josh Berkus
josh@agliodbs.com

From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:

Problem: currently, if your database has a large amount of "cold" data,
such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
needs to touch it thanks to the visibility map. However, every
freeze_age transactions, very old pages need to be sucked into memory
and rewritten just in order to freeze those pages. This can have a huge
impact on system performance, and seems unjustified because the pages
are not actually being used.

Suggested resolution: we would add a 4-byte field to the *page* header
which would track the XID wraparound count. Any page whose wraparound
count was not equal to the current one would be considered to have all
frozen tuples. This would remove the necessity to read and write old
pages just to freeze them, a humongous gain for databases with long data
retention horizons, let alone data warehouses.

All xids on the page would, necessarily, need to belong to the same
wraparound; if a page gets updated and its wraparound count (hereafter
WCID) is lower than current, all tuples on the page would be frozen
before any data is written to it. XIDs which were before the max_freeze
horizon on a page which was being written anyway would be frozen as they
are now.

Obvious issues:

(1) In a case of rows written close to the wraparound point, this would
cause a set of tuples to be frozen sooner than they would be in the
current system.

(2) It's not clear what to do with a page where there are XIDs which are
just before wraparound (like XID # 2.4b) which are still visible and
receives a write with a new cycle xid (#1).

(3) This will require changing the page structure, with all that
entails. So it should probably be done when we're making another change
(like adding CRCs).

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#2Florian Pflug
fgp@phlo.org
In reply to: Josh Berkus (#1)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On May 21, 2010, at 23:57 , Josh Berkus wrote:

From a discussion at dinner at pgcon, I wanted to send this to the list for people to poke holes in it:

Problem: currently, if your database has a large amount of "cold" data, such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer needs to touch it thanks to the visibility map. However, every freeze_age transactions, very old pages need to be sucked into memory and rewritten just in order to freeze those pages. This can have a huge impact on system performance, and seems unjustified because the pages are not actually being used.

Suggested resolution: we would add a 4-byte field to the *page* header which would track the XID wraparound count. Any page whose wraparound count was not equal to the current one would be considered to have all frozen tuples. This would remove the necessity to read and write old pages just to freeze them, a humongous gain for databases with long data retention horizons, let alone data warehouses.

If I understand this correctly, VACUUM usually only frees old tuples, but never increases the oldest xid in the pg_class record. Once that value becomes older than freeze_age, VACUUM needs to scan the whole relation to freeze old tuples. That results in most of the pages being marked dirty and subsequently being written out, causing an IO storm. If, OTOH, the wraparound count was stored in the page header, VACUUM would still need to read those pages, but wouldn't need to write them out.

Alternatively, VACUUM could freeze a few pages on each run, even if the xids are below freeze_age. It could pick those pages randomly, or maybe even prefer pages whose tuples have older xmin/xmas values. That would spread the load out more evenly, much like we try to spread checkpoints out over the whole checkpoint interval.

best regards,
Florian Pflugi

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#1)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Josh Berkus <josh@agliodbs.com> writes:

From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak. Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG. So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

regards, tom lane

#4Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#3)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak. Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG. So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

Yeah, someone pointed that out to me too and suggested that a freeze map
was the better solution. I still think there's something we can do with
pages on the visibility map but I'll have to think about it some more.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#5Jan Wieck
JanWieck@Yahoo.com
In reply to: Josh Berkus (#1)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 5/22/2010 9:16 PM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak. Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG. So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

Yeah, someone pointed that out to me too and suggested that a freeze map
was the better solution. I still think there's something we can do with
pages on the visibility map but I'll have to think about it some more.

It occurred to me on the flight home that maybe we could salvage
something from this if there were some mechanism that caused hint bits
to get set before the page got written out from shared buffers the first
time. This assumes that you have enough slack in shared-buffer space
that the transactions that touched a particular page all commit or abort
before the page first gets flushed to disk.

At least the background writer should have a few spare cycles to look
over a "to be flushed" page before writing it.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#4)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Josh Berkus <josh@agliodbs.com> writes:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak. Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG. So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

Yeah, someone pointed that out to me too and suggested that a freeze map
was the better solution. I still think there's something we can do with
pages on the visibility map but I'll have to think about it some more.

It occurred to me on the flight home that maybe we could salvage
something from this if there were some mechanism that caused hint bits
to get set before the page got written out from shared buffers the first
time. This assumes that you have enough slack in shared-buffer space
that the transactions that touched a particular page all commit or abort
before the page first gets flushed to disk.

regards, tom lane

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#3)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 22/05/10 16:35, Tom Lane wrote:

Josh Berkus<josh@agliodbs.com> writes:

From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak.

Me.

Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG.

Hmm, we don't rely on setting hint bits to truncate CLOG anymore
(http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php).
It's the replacement of xids with FrozenXid that matters, the hint bits
are really just hints.

Doesn't change the conclusion, though: you still need to replace XIDs
with FrozenXids to truncate the clog. Conceivably we could keep around
more than 2^32 transactions in clog with this scheme, but then you need
a lot more space for the clog. But perhaps it would be better to do that
than to launch anti-wraparound vacuums, or to refuse more updates in the
extreme cases.

So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

(As also discussed in the Royal Oak) I think we should simply not dirty
a page when a hint bit is updated. Reading a page from disk is
expensive, setting hint bits on the access is generally cheap compared
to that. But that is orthogonal to the idea of a per-page XID epoch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

(As also discussed in the Royal Oak) I think we should simply not dirty
a page when a hint bit is updated. Reading a page from disk is
expensive, setting hint bits on the access is generally cheap compared
to that. But that is orthogonal to the idea of a per-page XID epoch.

I'm not sure it's cheap. What you suggest would result in a substantial
increase in clog accesses, which means (1) more I/O and (2) more
contention. Certainly it's worth experimenting with, but it's no
guaranteed win.

regards, tom lane

#9Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#8)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

I'm not sure it's cheap. What you suggest would result in a substantial
increase in clog accesses, which means (1) more I/O and (2) more
contention. Certainly it's worth experimenting with, but it's no
guaranteed win.

It seems like there's a number of issues we could fix by making the CLOG
more efficient somehow -- from the elimination of hint bits to the
ability to freeze pages without writing them.

Not, of course, that I have any idea how to do that.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#10Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Josh Berkus (#1)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Excerpts from Josh Berkus's message of vie may 21 17:57:35 -0400 2010:

Problem: currently, if your database has a large amount of "cold" data,
such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
needs to touch it thanks to the visibility map. However, every
freeze_age transactions, very old pages need to be sucked into memory
and rewritten just in order to freeze those pages. This can have a huge
impact on system performance, and seems unjustified because the pages
are not actually being used.

I think this is nonsense. If you have 3-years-old sales transactions,
and your database has any interesting churn, tuples those pages have
been frozen for a very long time *already*. The problem is vacuum
reading them in so that it can verify there's nothing to do. If we want
to avoid *reading* those pages, this solution is useless:

Suggested resolution: we would add a 4-byte field to the *page* header
which would track the XID wraparound count.

because you still have to read the page.

I think what you're looking for is for this Xid wraparound count to be
stored elsewhere, not inside the page. That way vacuum can read it and
skip the page without reading it altogether. I think a "freeze map" has
been mentioned downthread.

I remember mentioning some time ago that we could declare some tables as
frozen, i.e. "not needing vacuum". This strikes me as similar, except
at the page level rather than table level.

--
Álvaro Herrera <alvherre@alvh.no-ip.org>

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alvaro Herrera (#10)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 24/05/10 22:49, Alvaro Herrera wrote:

Excerpts from Josh Berkus's message of vie may 21 17:57:35 -0400 2010:

Problem: currently, if your database has a large amount of "cold" data,
such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
needs to touch it thanks to the visibility map. However, every
freeze_age transactions, very old pages need to be sucked into memory
and rewritten just in order to freeze those pages. This can have a huge
impact on system performance, and seems unjustified because the pages
are not actually being used.

I think this is nonsense. If you have 3-years-old sales transactions,
and your database has any interesting churn, tuples those pages have
been frozen for a very long time *already*. The problem is vacuum
reading them in so that it can verify there's nothing to do. If we want
to avoid *reading* those pages, this solution is useless:

Suggested resolution: we would add a 4-byte field to the *page* header
which would track the XID wraparound count.

because you still have to read the page.

What's missing from the suggestion is that relfrozenxid and datfrozenxid
also need to be expanded to 8-bytes. That way you effectively have
8-byte XIDs, which means that you never need to vacuum to avoid XID
wraparound.

You still need to freeze to truncate clog, though, but if you have the
disk space, you can now do that every 100 billion transactions for
example if you wish.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Heikki Linnakangas (#11)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Excerpts from Heikki Linnakangas's message of mar may 25 04:41:30 -0400 2010:

On 24/05/10 22:49, Alvaro Herrera wrote:

I think this is nonsense. If you have 3-years-old sales transactions,
and your database has any interesting churn, tuples those pages have
been frozen for a very long time *already*.

What's missing from the suggestion is that relfrozenxid and datfrozenxid
also need to be expanded to 8-bytes. That way you effectively have
8-byte XIDs, which means that you never need to vacuum to avoid XID
wraparound.

Hmm, so are we going to use the "xid epoch" more officially? That's
entirely a new line of development, perhaps it opens new possibilities.

This sounds like extending Xid to 64 bits, without having to store the
high bits everywhere. Was this discussed in the PGCon devs meeting?

--
Álvaro Herrera <alvherre@alvh.no-ip.org>

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

This sounds like extending Xid to 64 bits, without having to store the
high bits everywhere. Was this discussed in the PGCon devs meeting?

Yeah, that's what it would amount to. It was not discussed at the dev
meeting --- it was an idea that came up one evening at PGCon.

I'm not sure whether this would imply having to widen xid to 64 bits
internally. That could be a bit unpleasant as far as CPU and shared
memory space go, although every year that goes by makes 32-bit machines
less interesting as DB servers.

regards, tom lane

#14Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#13)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Alvaro,

This sounds like extending Xid to 64 bits, without having to store the
high bits everywhere. Was this discussed in the PGCon devs meeting?

Essentially, yes.

One of the main objections to raising XID to 64-bit has been the per-row
overhead. But adding 4 bytes per page wouldn't be much of an impact.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#15Jan Wieck
JanWieck@Yahoo.com
In reply to: Heikki Linnakangas (#7)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 5/24/2010 9:30 AM, Heikki Linnakangas wrote:

On 22/05/10 16:35, Tom Lane wrote:

Josh Berkus<josh@agliodbs.com> writes:

From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak.

Me.

Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG.

Hmm, we don't rely on setting hint bits to truncate CLOG anymore
(http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php).
It's the replacement of xids with FrozenXid that matters, the hint bits
are really just hints.

Doesn't change the conclusion, though: you still need to replace XIDs
with FrozenXids to truncate the clog. Conceivably we could keep around
more than 2^32 transactions in clog with this scheme, but then you need
a lot more space for the clog. But perhaps it would be better to do that
than to launch anti-wraparound vacuums, or to refuse more updates in the
extreme cases.

Correct. The problem actually are aborted transactions. Just because an
XID is really old doesn't mean it was committed.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

#16Josh Berkus
josh@agliodbs.com
In reply to: Jan Wieck (#15)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Correct. The problem actually are aborted transactions. Just because an
XID is really old doesn't mean it was committed.

Yes, that's the main issue with my idea; XIDs which fell off the CLOG
would become visible even if they'd aborted.

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Josh Berkus (#16)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 25/05/10 23:56, Josh Berkus wrote:

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

If there's a tuple with an aborted xmin on a page, the bit in the
visibility map is not set. A tuple with aborted xmax doesn't matter.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#17)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 5/25/10 10:04 PM, Heikki Linnakangas wrote:

On 25/05/10 23:56, Josh Berkus wrote:

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

If there's a tuple with an aborted xmin on a page, the bit in the
visibility map is not set. A tuple with aborted xmax doesn't matter.

Then it seems like pages in the visibility map, at least, would not need
to be vacuumed or frozen. Do pages persist in the visibility map
indefinitely?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Josh Berkus (#18)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 26/05/10 21:35, Josh Berkus wrote:

On 5/25/10 10:04 PM, Heikki Linnakangas wrote:

On 25/05/10 23:56, Josh Berkus wrote:

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

If there's a tuple with an aborted xmin on a page, the bit in the
visibility map is not set. A tuple with aborted xmax doesn't matter.

Then it seems like pages in the visibility map, at least, would not need
to be vacuumed or frozen. Do pages persist in the visibility map
indefinitely?

In theory, until any tuple on the page is inserted/updated/deleted
again. However, we've been operating on the assumption that it's always
safe to clear any bit in the visibility map, without affecting
correctness. I would not like to give up that assumption, it makes life
easier.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#19)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

In theory, until any tuple on the page is inserted/updated/deleted
again. However, we've been operating on the assumption that it's always
safe to clear any bit in the visibility map, without affecting
correctness. I would not like to give up that assumption, it makes life
easier.

It wouldn't affect correctness, it would just force that page to be
vacuumed-and-frozen. I think I can make this work, let me just hammer
it out.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#21Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#19)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, May 26, 2010 at 2:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 26/05/10 21:35, Josh Berkus wrote:

On 5/25/10 10:04 PM, Heikki Linnakangas wrote:

On 25/05/10 23:56, Josh Berkus wrote:

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

If there's a tuple with an aborted xmin on a page, the bit in the
visibility map is not set. A tuple with aborted xmax doesn't matter.

Then it seems like pages in the visibility map, at least, would not need
to be vacuumed or frozen.  Do pages persist in the visibility map
indefinitely?

In theory, until any tuple on the page is inserted/updated/deleted again.
However, we've been operating on the assumption that it's always safe to
clear any bit in the visibility map, without affecting correctness. I would
not like to give up that assumption, it makes life easier.

What if we drove it off of the PD_ALL_VISIBLE bit on the page itself,
rather than the visibility map bit? It would be safe to clear the
visibility map bit without touching the page, but if you clear the
PD_ALL_VISIBLE bit on the page itself then you set all the hint bits
and freeze all the tuples. In the case where the visibility map bit
gets cleared but the page-level bit is still set, a future vacuum can
notice and reset the visibility map bit. But whenever the visibility
map bit is set, you know that the page-level bit MUST be set, so you
needn't vacuum those pages, even for anti-wraparound: you know they'll
be frozen when and if they ever get written again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#22Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#21)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

What if we drove it off of the PD_ALL_VISIBLE bit on the page itself,
rather than the visibility map bit? It would be safe to clear the
visibility map bit without touching the page, but if you clear the
PD_ALL_VISIBLE bit on the page itself then you set all the hint bits
and freeze all the tuples. In the case where the visibility map bit
gets cleared but the page-level bit is still set, a future vacuum can
notice and reset the visibility map bit. But whenever the visibility
map bit is set, you know that the page-level bit MUST be set, so you
needn't vacuum those pages, even for anti-wraparound: you know they'll
be frozen when and if they ever get written again.

How does that get us out of reading and writing old pages, though? If
we're going to set a bit on them, we might as well freeze them.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#22)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Josh Berkus <josh@agliodbs.com> writes:

How does that get us out of reading and writing old pages, though?

Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to solve
your problem, because they cannot become set without having visited the
page.

regards, tom lane

#24Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:

How does that get us out of reading and writing old pages, though?

Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to solve
your problem, because they cannot become set without having visited the
page.

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high. Consider a table that is
regularly written but append-only. Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.

If, however, we decree that you can't write a new tuple into a
PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
still have the small, incremental vacuums but those are pretty cheap,
and in any event, I don't see any way to get rid of them unless
someone can devise a scheme to do away with vacuum entirely. But you
won't need the full-table vacuum to freeze tuples, because you can
freeze them opportunistically the next time those pages are written
(at which point freezing will be very cheap because the page has to be
written to disk at that point no matter what).

Maybe I'm confused.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#24)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. �Neither PD_ALL_VISIBLE nor the visibility map are going to solve
your problem, because they cannot become set without having visited the
page.

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.

Well, that was the use-case that Josh was on about when this idea came
up: high-volume append-only log tables that in most cases will never be
read, so his client wants to get rid of the extra I/O for maintenance
visits to once-written pages.

If you're willing to allow one visit and rewrite of each page, then
we can do that today with maybe a bit of rejiggering of vacuum's
when-to-freeze heuristics.

regards, tom lane

#26Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#25)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, May 26, 2010 at 8:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to solve
your problem, because they cannot become set without having visited the
page.

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.

Well, that was the use-case that Josh was on about when this idea came
up: high-volume append-only log tables that in most cases will never be
read, so his client wants to get rid of the extra I/O for maintenance
visits to once-written pages.

Well, I'll just note that using PD_ALL_VISIBLE as I'm proposing is
basically equivalent to Josh's original proposal of using an XID epoch
except that it addresses all three of the "obvious issues" which he
noted in his original email; plus it doesn't prevent truncating CLOG
(on the assumption that we rejigger things not to consult clog when
the page is marked PD_ALL_VISIBLE).

If you're willing to allow one visit and rewrite of each page, then
we can do that today with maybe a bit of rejiggering of vacuum's
when-to-freeze heuristics.

Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that
might be just as good, and simpler. Assuming the visibility map is
sufficiently crash-safe/non-buggy, we could then teach VACUUM that
it's OK to advance relfrozenxid even when doing just a partial vacuum
- because any pages that were skipped must contain only frozen tuples.
Previously you've objected to proposals in this direction because
they might destroy forensic information, but maybe we should do it
anyway.

Either way, I think if we do this it *basically* gets rid of
anti-wraparound vacuum. Yeah, we'll still do routine partial vacuums,
but what you won't have is... write the table, vacuum, vacuum, vacuum,
vacuum, OK, everything's visible to everyone, don't need to vacuum any
more... months pass... boom, unexpected full-table vacuum. The
beginning part is the same, but you get rid of the boom at the end.
The only way I see to cut down vacuum activity even further is to
freeze them (and set the visibility map bit) before evicting them from
shared_buffers. That's really the only way to get "write once and
only once", but it's pretty hit or miss, because the xmin horizon
might not advance fast enough to make it actually work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#27Jesper Krogh
jesper@krogh.cc
In reply to: Robert Haas (#24)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 27/05/2010, at 02.48, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:

How does that get us out of reading and writing old pages, though?

Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to
solve
your problem, because they cannot become set without having visited
the
page.

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high. Consider a table that is
regularly written but append-only. Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.

Just a thought. Wouldn't a All-visible bit also enable index only
scans to some degree?

Jesper

#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jesper Krogh (#27)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 27/05/10 08:56, Jesper Krogh wrote:

Just a thought. Wouldn't a All-visible bit also enable index only scans
to some degree?

Yes. In fact, that's one reason I implemented the visibility map in the
first place. I started working on index-only scans based on that last
year, if you search the archives for index-only scans you'll find those
discussions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#29Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#24)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.

That is the use case, though. What I've encountered so far at 3 client
sites is tables which are largely append-only, with a few selects and
very few updates (< 2%) on recent data. In general, once data gets
flushed out of memory, it goes to disk and never gets recalled, and
certainly not written. Thinks are hunky-dory until we reach
max_freeze_age, at which point the server has to chew through hundreds
of gigabytes of old data just to freeze them, sometimes bringing the
application to a halt in the process.

The user's perspective on this is quite reasonable: if I haven't
selected these pages, and I haven't written to them, why does autovacuum
need to visit them and screw up my server performance?

Consider a table that is
regularly written but append-only. Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.

If, however, we decree that you can't write a new tuple into a
PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
still have the small, incremental vacuums but those are pretty cheap,

That only works if those pages were going to be autovacuumed anyway. In
the case outlined above (which I've seen at 3 different production sites
this year), they wouldn't be; a table with less than 2% updates and
deletes does not get vacuumed until max_freeze_age for any reason. For
that matter, pages which are getting autovacuumed are not a problem,
period; they're being read and written and freezing them is not an issue.

I'm not seeing a way of fixing this common issue short of overhauling
CLOG, or of creating a freeze_map. Darn.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#30Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#26)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 5/26/10 6:32 PM, Robert Haas wrote:

Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that
might be just as good, and simpler. Assuming the visibility map is
sufficiently crash-safe/non-buggy, we could then teach VACUUM that
it's OK to advance relfrozenxid even when doing just a partial vacuum
- because any pages that were skipped must contain only frozen tuples.
Previously you've objected to proposals in this direction because
they might destroy forensic information, but maybe we should do it
anyway.

It would be an improvement, and easier than the various ways of never
having to visit the pages, which are all fairly intensive. Given the
destruction of rollback information, though, we'd probably want a way to
switch this behaviour on and off as an autovacuum setting.

Does this send us down the wrong path, though? I thought we wanted to
think about removing hint bits so that we could implement things like
CRCs. No?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#31Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#30)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Thu, May 27, 2010 at 2:17 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 5/26/10 6:32 PM, Robert Haas wrote:

Hmm, yeah.  Maybe we should freeze when we set PD_ALL_VISIBLE; that
might be just as good, and simpler.  Assuming the visibility map is
sufficiently crash-safe/non-buggy, we could then teach VACUUM that
it's OK to advance relfrozenxid even when doing just a partial vacuum
- because any pages that were skipped must contain only frozen tuples.
 Previously you've objected to proposals in this direction because
they might destroy forensic information, but maybe we should do it
anyway.

It would be an improvement, and easier than the various ways of never
having to visit the pages, which are all fairly intensive.  Given the
destruction of rollback information, though, we'd probably want a way to
switch this behaviour on and off as an autovacuum setting.

It's not going to destroy anything that is needed for rollback unless
there's a bug - PD_ALL_VISIBLE only gets set when all tuples on the
page are visible to all backends. That can't happen until all
transactions that wrote the page, and all others that have a lower
xmin, have committed. That having been said, if making it a GUC makes
people less nervous about doing it, then +1 from me.

Does this send us down the wrong path, though?  I thought we wanted to
think about removing hint bits so that we could implement things like
CRCs.  No?

PD_ALL_VISIBLE is a page-level bit, not a tuple-level bit, and I
strongly suspect it's not going anywhere. It's critical
infrastructure for index-only scans, among other things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#32Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#29)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Thu, May 27, 2010 at 2:00 PM, Josh Berkus <josh@agliodbs.com> wrote:

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.

That is the use case, though.  What I've encountered so far at 3 client
sites is tables which are largely append-only, with a few selects and
very few updates (< 2%) on recent data.   In general, once data gets
flushed out of memory, it goes to disk and never gets recalled, and
certainly not written.

We might be able to optimize this case if the transactions are small,
such that they commit before dirtying too large a fraction of
shared_buffers. We could - at least in theory - teach the bgwriter or
some other process to freeze them before writing them to disk the
first time. But if the blocks have to be written to disk before
transaction commit it seems to me we're DOA, unless we're willing to
retain arbitrarily large amounts of CLOG.

What might be more practical is to try to find ways to spread out the
I/O so that it doesn't happen all at once in a huge ornery spike.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#33Jesper Krogh
jesper@krogh.cc
In reply to: Josh Berkus (#29)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 27/05/2010, at 20.00, Josh Berkus <josh@agliodbs.com> wrote:

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.

That is the use case, though. What I've encountered so far at 3
client
sites is tables which are largely append-only, with a few selects and
very few updates (< 2%) on recent data. In general, once data gets
flushed out of memory, it goes to disk and never gets recalled, and
certainly not written. Thinks are hunky-dory until we reach
max_freeze_age, at which point the server has to chew through hundreds
of gigabytes of old data just to freeze them, sometimes bringing the
application to a halt in the process.

The data doesn't get in there in " no time" if autovacuum was aware of
inserts too it would incrementally freeze the table as it grows.

It would still cause it to be read in again but not in a big chunck.

Couldn't pages that are totally filled by the same transaction, be
frozen on the initial write?

Jesper - given my limited knowledge about how it works.

#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jesper Krogh (#33)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Jesper Krogh <jesper@krogh.cc> wrote:

Couldn't pages that are totally filled by the same transaction, be

frozen on the initial write?

As far as I'm aware, that can only be done if:

(a) The tuples were written within the same transaction which
created or truncated the table.

*or*

(b) The writing transaction and all transactions concurrent to it
have completed by the time the page is about to be written.

-Kevin

#35Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#34)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Jesper Krogh <jesper@krogh.cc> wrote:

Couldn't pages that are totally filled by the same transaction, be

frozen on the initial write?

As far as I'm aware, that can only be done if:

(a)  The tuples were written within the same transaction which
created or truncated the table.

*or*

(b)  The writing transaction and all transactions concurrent to it
have completed by the time the page is about to be written.

Actually, I think this is true only in case (b). In case (a), you
mess up visibility with respect to other command-IDs within the
transaction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#36Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#35)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner

(a) The tuples were written within the same transaction which
created or truncated the table.

In case (a), you mess up visibility with respect to other
command-IDs within the transaction.

Surely that problem is surmountable?

-Kevin

#37Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#36)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner

(a)  The tuples were written within the same transaction which
created or truncated the table.

In case (a), you mess up visibility with respect to other
command-IDs within the transaction.

Surely that problem is surmountable?

I proposed an idea at PGCon, but I believe Tom and Heikki thought it
was far too grotty to consider.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#38Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#37)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Robert Haas <robertmhaas@gmail.com> wrote:

I proposed an idea at PGCon, but I believe Tom and Heikki thought
it was far too grotty to consider.

Well, as an alternative -- don't we have some information about the
relation pinned which could hold the xid of its creator? If the
tuple is frozen check to see if your transaction is the creator and
behave like you created the tuple (which, in fact, you did)?

-Kevin

#39Joe Conway
mail@joeconway.com
In reply to: Robert Haas (#35)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 05/27/2010 12:39 PM, Robert Haas wrote:

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Jesper Krogh <jesper@krogh.cc> wrote:

Couldn't pages that are totally filled by the same transaction, be

frozen on the initial write?

As far as I'm aware, that can only be done if:

(a) The tuples were written within the same transaction which
created or truncated the table.

*or*

(b) The writing transaction and all transactions concurrent to it
have completed by the time the page is about to be written.

Actually, I think this is true only in case (b). In case (a), you
mess up visibility with respect to other command-IDs within the
transaction.

(a) can work if it is all in one command, CREATE TABLE AS SELECT...

Additionally we were discussing COPY in the FROM clause, which means you
could CREATE TABLE AS SELECT ... FROM (COPY ...). That would allow bulk
loading with hint bits already set (and tuples frozen?).

Joe

#40Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Joe Conway (#39)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Joe Conway <mail@joeconway.com> wrote:

(a) can work if it is all in one command, CREATE TABLE AS
SELECT...

Additionally we were discussing COPY in the FROM clause, which
means you could CREATE TABLE AS SELECT ... FROM (COPY ...). That
would allow bulk loading with hint bits already set (and tuples
frozen?).

As long as it's hinted and frozen after a pg_dump -1 | psql I'll be
happy.

-Kevin

#41Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#37)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 27/05/10 22:56, Robert Haas wrote:

On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas<robertmhaas@gmail.com> wrote:

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner

(a) The tuples were written within the same transaction which
created or truncated the table.

In case (a), you mess up visibility with respect to other
command-IDs within the transaction.

Surely that problem is surmountable?

I proposed an idea at PGCon, but I believe Tom and Heikki thought it
was far too grotty to consider.

No, I think it's surmountable too. We discussed hacks to teach the MVCC
checks that all frozen tuples on a table that was created in the same
transaction (i.e. the same cases where we skip WAL logging) were
actually created by the running transaction, and check commandid
accordingly.

Or detect simple DML commands where we know that the command doesn't
read the table. COPY would usually fall into that category, though
non-immutable input functions make that a bit iffy.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#42Russell Smith
mr-russ@pws.com.au
In reply to: Josh Berkus (#29)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 28/05/10 04:00, Josh Berkus wrote:

Consider a table that is
regularly written but append-only. Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.

If, however, we decree that you can't write a new tuple into a
PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
still have the small, incremental vacuums but those are pretty cheap,

That only works if those pages were going to be autovacuumed anyway. In
the case outlined above (which I've seen at 3 different production sites
this year), they wouldn't be; a table with less than 2% updates and
deletes does not get vacuumed until max_freeze_age for any reason. For
that matter, pages which are getting autovacuumed are not a problem,
period; they're being read and written and freezing them is not an issue.

I'm not seeing a way of fixing this common issue short of overhauling
CLOG, or of creating a freeze_map. Darn.

Don't you not get a positive enough effect by adjusting the table's
autovacuum_min_freeze_age and autovacuum_max_freeze_age. If you set
those numbers small, it appears to me that you would get very quickly to
a state where the vacuum would example only the most recent part of the
table rather than the whole thing. Does that give you enough of a win
that it stops the scanning and writing of the whole table which reduces
the performance problem being experienced. It's not a complete
solution, but does it go someway?

Regards

Russell

#43Alvaro Herrera
alvherre@commandprompt.com
In reply to: Russell Smith (#42)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Excerpts from Russell Smith's message of mié jun 02 06:38:35 -0400 2010:

Don't you not get a positive enough effect by adjusting the table's
autovacuum_min_freeze_age and autovacuum_max_freeze_age. If you set
those numbers small, it appears to me that you would get very quickly to
a state where the vacuum would example only the most recent part of the
table rather than the whole thing.

The problem is that vacuum doesn't know that a certain part of the table
is already frozen. It needs to scan it completely anyways. If we had a
"frozen" map, we could mark pages that are completely frozen and thus do
not need any vacuuming; but we don't (I don't recall the reasons for
this. Maybe it's just that no one has gotten around to it, or maybe
there's something else).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#43)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Alvaro Herrera <alvherre@commandprompt.com> writes:

The problem is that vacuum doesn't know that a certain part of the table
is already frozen. It needs to scan it completely anyways. If we had a
"frozen" map, we could mark pages that are completely frozen and thus do
not need any vacuuming; but we don't (I don't recall the reasons for
this. Maybe it's just that no one has gotten around to it, or maybe
there's something else).

Offhand I think the reason is that you'd have to trust the frozen bit
to be 100% correct (or at least never set to 1 in error). Currently,
both the FSM and visibility forks are just hints, and we won't suffer
data corruption if they're wrong; so we don't get too tense about WAL
logging or fsync'ing updates. I believe Heikki is looking into what
it'd take to make the visibility map 100% reliable, in connection with
the desire for index-only scans. If we get that and the overhead isn't
too terrible maybe we could build a frozen-status map the same way.

regards, tom lane

#45Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#44)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, Jun 2, 2010 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

The problem is that vacuum doesn't know that a certain part of the table
is already frozen.  It needs to scan it completely anyways.  If we had a
"frozen" map, we could mark pages that are completely frozen and thus do
not need any vacuuming; but we don't (I don't recall the reasons for
this.  Maybe it's just that no one has gotten around to it, or maybe
there's something else).

Offhand I think the reason is that you'd have to trust the frozen bit
to be 100% correct (or at least never set to 1 in error).  Currently,
both the FSM and visibility forks are just hints, and we won't suffer
data corruption if they're wrong; so we don't get too tense about WAL
logging or fsync'ing updates.  I believe Heikki is looking into what
it'd take to make the visibility map 100% reliable, in connection with
the desire for index-only scans.  If we get that and the overhead isn't
too terrible maybe we could build a frozen-status map the same way.

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes. Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#46Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#45)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Excerpts from Robert Haas's message of mié jun 02 14:16:33 -0400 2010:

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes. Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
all the way forward to OldestXmin is a bit much, methinks.

Besides, there's another argument for not freezing tuples immediately:
they may be updated shortly thereafter, causing extra churn for no gain.

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

(BTW maybe instead of separate visibility and freeze maps we could have
two bits in the visibility map?)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#47Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#46)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Alvaro Herrera wrote:

Excerpts from Robert Haas's message of mi�� jun 02 14:16:33 -0400 2010:

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes. Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
all the way forward to OldestXmin is a bit much, methinks.

Besides, there's another argument for not freezing tuples immediately:
they may be updated shortly thereafter, causing extra churn for no gain.

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

(BTW maybe instead of separate visibility and freeze maps we could have
two bits in the visibility map?)

Yeah, the two-bits idea was suggested during the conversation core had
about the issue.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

#48Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#46)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, Jun 2, 2010 at 3:10 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of mié jun 02 14:16:33 -0400 2010:

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes.  Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
all the way forward to OldestXmin is a bit much, methinks.

Why? If it's just for forensics, those are some pretty expensive
forensics - it eventually costs you an additional complete rewrite of
every page.

Besides, there's another argument for not freezing tuples immediately:
they may be updated shortly thereafter, causing extra churn for no gain.

But if you were going to update PD_ALL_VISIBLE, then you were going to
write the page anyway. You might as well freeze everything at the
same time so you don't have to come back.

Alternatively, you could do what I suggested upthread and just believe
PD_ALL_VISIBLE over the individual tuple xmins. Then you don't have
to freeze the page until it's next written, but you still get to keep
your forensic info.

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this?  (was it Jeff Davis?)

Not me. I don't think that's going to help a whole lot, though. In
many of the painful scenarios, every tuple on the page will have the
same XID, and therefore they'll all be frozen at the same time anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#49Greg Stark
gsstark@mit.edu
In reply to: Robert Haas (#48)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

So I think the scheme in the original post of this thread is workable.
Not as described but could be made to work. In which case I think it's
preferable to a freeze map -- which I had previously assumed we would
need eventually.

The problem with the scheme originally described is that it assumed
you could have an cycle counter and then arrange that all the xids on
the page are within that cycle. That doesn't work because you could
easily have two live xids on the page that belong to two cycles -- one
FirstNormalTransactionId and one MaxTransactionId.

I think to make it work you need to store a whole 64-bit reference
transaction id consisting of both a cycle counter and a transaction
id. The invariant for the page is that every xid on the page can be
compared to that reference transaction id using normal transactionid
semantics. Actually I think the easiest way to do that is to set it to
the oldest xid on the page. The first thing to do before comparing any
transaction id on the page with a real transaction id would be to
figure out whether the reference xid is comparable to the live xid,
which if it's the oldest xid on the page implies they'll all be
comparable.

The way to maintain that invariant would be that any xid insertion on
the page must advance the reference xid if it's not comparable to the
newly inserted xid. It has to be advanced to the oldest xid that's
still comparable with the newly inserted xid. Any xids on the page
that are older than the new refernce xid have to be frozen or removed.
I'm not sure how to do that without keeping clog forever though.

#50Jan Wieck
JanWieck@Yahoo.com
In reply to: Robert Haas (#45)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 6/2/2010 2:16 PM, Robert Haas wrote:

On Wed, Jun 2, 2010 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

The problem is that vacuum doesn't know that a certain part of the table
is already frozen. It needs to scan it completely anyways. If we had a
"frozen" map, we could mark pages that are completely frozen and thus do
not need any vacuuming; but we don't (I don't recall the reasons for
this. Maybe it's just that no one has gotten around to it, or maybe
there's something else).

Offhand I think the reason is that you'd have to trust the frozen bit
to be 100% correct (or at least never set to 1 in error). Currently,
both the FSM and visibility forks are just hints, and we won't suffer
data corruption if they're wrong; so we don't get too tense about WAL
logging or fsync'ing updates. I believe Heikki is looking into what
it'd take to make the visibility map 100% reliable, in connection with
the desire for index-only scans. If we get that and the overhead isn't
too terrible maybe we could build a frozen-status map the same way.

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes. Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

Agreed.

The whole business of minimum freeze age always struck me as leaving
bread crumbs behind. Other than forensics, what is the actual value of
that overhead?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

#51Jan Wieck
JanWieck@Yahoo.com
In reply to: Alvaro Herrera (#46)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:

Excerpts from Robert Haas's message of mié jun 02 14:16:33 -0400 2010:

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes. Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
all the way forward to OldestXmin is a bit much, methinks.

Why?

Besides, there's another argument for not freezing tuples immediately:
they may be updated shortly thereafter, causing extra churn for no gain.

What extra churn does it create if the tuple can be frozen before the
bgwriter ever writes the page in the first place?

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#51)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Jan Wieck <JanWieck@Yahoo.com> writes:

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan, you don't know what you're talking about. I have repeatedly had
cases where being able to look at xmin was critical to understanding
a bug. I *will not* hold still for a solution that effectively reduces
min_freeze_age to zero.

regards, tom lane

#53Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#52)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Fri, Jun 4, 2010 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this?  (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan, you don't know what you're talking about.  I have repeatedly had
cases where being able to look at xmin was critical to understanding
a bug.  I *will not* hold still for a solution that effectively reduces
min_freeze_age to zero.

So, we're talking in circles here. I've already proposed a method
that would avoid the need to wipe out the xmins:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01485.php

And you said that if we were going to do that we might as well just
freeze sooner:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01548.php

If you don't want to freeze sooner, let's go back to the method
described in the first email.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#54Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#52)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

I just see a lot of cost caused by this "safety range". I yet
have to see its real value, other than "feel good".

Jan, you don't know what you're talking about. I have repeatedly
had cases where being able to look at xmin was critical to
understanding a bug. I *will not* hold still for a solution that
effectively reduces min_freeze_age to zero.

In my experience with my own environment, I can honestly say that
it's clear that not freezing tuples quickly adds more cost than
running with cassert on. If we had to run in production with one or
the other, I would definitely choose cassert from a performance
perspective; which one would do more to find bugs? Why do we view
them so differently?

-Kevin

#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#54)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

In my experience with my own environment, I can honestly say that
it's clear that not freezing tuples quickly adds more cost than
running with cassert on. If we had to run in production with one or
the other, I would definitely choose cassert from a performance
perspective; which one would do more to find bugs? Why do we view
them so differently?

The reason for not recommending cassert in production builds is not
cost but stability. Per the fine manual:

Also, having the tests turned on won't necessarily enhance the
stability of your server! The assertion checks are not categorized
for severity, and so what might be a relatively harmless bug will
still lead to server restarts if it triggers an assertion
failure. This option is not recommended for production use, but
you should have it on for development work or when running a beta
version.

regards, tom lane

#56Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#55)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Fri, Jun 4, 2010 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

In my experience with my own environment, I can honestly say that
it's clear that not freezing tuples quickly adds more cost than
running with cassert on.  If we had to run in production with one or
the other, I would definitely choose cassert from a performance
perspective; which one would do more to find bugs?  Why do we view
them so differently?

The reason for not recommending cassert in production builds is not
cost but stability.  Per the fine manual:

        Also, having the tests turned on won't necessarily enhance the
        stability of your server!  The assertion checks are not categorized
        for severity, and so what might be a relatively harmless bug will
        still lead to server restarts if it triggers an assertion
        failure.  This option is not recommended for production use, but
        you should have it on for development work or when running a beta
        version.

We routinely castigate people for benchmarking done with cassert
turned on, and tell them their numbers are meaningless.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#56)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 4, 2010 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The reason for not recommending cassert in production builds is not
cost but stability.

We routinely castigate people for benchmarking done with cassert
turned on, and tell them their numbers are meaningless.

I didn't say it wasn't expensive ;-). But Kevin's question seemed to
be based on the assumption that runtime cost was the only negative.
It wouldn't be terribly hard to make a variant of cassert that skips
two or three of the most expensive things (particularly memory context
checking and CLOBBER_FREED_MEMORY), and from a cost perspective that
would be totally reasonable to run in production. We haven't done it
because of the stability issue.

regards, tom lane

#58Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#57)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane <tgl@sss.pgh.pa.us> wrote:

But Kevin's question seemed to be based on the assumption that
runtime cost was the only negative. It wouldn't be terribly hard
to make a variant of cassert that skips two or three of the most
expensive things (particularly memory context checking and
CLOBBER_FREED_MEMORY), and from a cost perspective that would be
totally reasonable to run in production. We haven't done it
because of the stability issue.

Fair enough. I was thinking of them both as debugging features,
which had various ideas roiling around in my head. Having run
hundreds of databases 24/7 for years without ever needing this
information, but paying the cost for it one way or another every
day, my perspective is that it would be A Good Thing if it could
just be turned on when needed. If you have recurring bug that can
be arranged, but in those cases you have other options; so I'm
assuming you want this kept because it is primarily of forensic
value after a non-repeatable bug has munged something?

Another thought bouncing around was that these breadcrumbs are
expensive; I was trying to think of some other way to capture the
information which would be cheaper, but I haven't thought of
anything, and I'm far from certain that cheaper breadcrumbs to
answer the need can be developed. The best thought I've had so far
is that if someone kept WAL files long enough the evidence might be
in there somewhere....

-Kevin

#59Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#58)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Kevin Grittner wrote:

Fair enough. I was thinking of them both as debugging features,
which had various ideas roiling around in my head. Having run
hundreds of databases 24/7 for years without ever needing this
information, but paying the cost for it one way or another every
day, my perspective is that it would be A Good Thing if it could
just be turned on when needed. If you have recurring bug that can
be arranged, but in those cases you have other options; so I'm
assuming you want this kept because it is primarily of forensic
value after a non-repeatable bug has munged something?

Another thought bouncing around was that these breadcrumbs are
expensive; I was trying to think of some other way to capture the
information which would be cheaper, but I haven't thought of
anything, and I'm far from certain that cheaper breadcrumbs to
answer the need can be developed. The best thought I've had so far
is that if someone kept WAL files long enough the evidence might be
in there somewhere....

The idea that thousands of Postgres installations are slower just so we
can occasionally debug xmin/xmax issues seems way off balance to me.
If people want debugging, let them modify the freeze age settings; the
defaults should not favor debugging when there is a measurable cost
involved. How many times in the past five years have we even needed
such debugging information, and also are cases where we could not have
told the user to change freeze settings to get us that info?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

#60Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#58)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

... my perspective is that it would be A Good Thing if it could
just be turned on when needed. If you have recurring bug that can
be arranged, but in those cases you have other options; so I'm
assuming you want this kept because it is primarily of forensic
value after a non-repeatable bug has munged something?

Yeah, that's exactly the problem. When you realize you need it,
it's too late.

The best thought I've had so far
is that if someone kept WAL files long enough the evidence might be
in there somewhere....

Hm, that is an excellent point. The WAL trace would actually be a lot
superior in terms of being able to figure out what went wrong. But
I don't quite see how we tell people "either keep xmin or keep your
old WAL". Also, for production sites the amount of WAL you'd have to
hang onto seems a bit daunting. Other problems are the cost of shipping
it to a developer, and the impracticality of sanitizing private data in
it before you show it to somebody.

regards, tom lane

#61Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#60)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

The best thought I've had so far is that if someone kept WAL
files long enough the evidence might be in there somewhere....

Hm, that is an excellent point. The WAL trace would actually be a
lot superior in terms of being able to figure out what went wrong.
But I don't quite see how we tell people "either keep xmin or keep
your old WAL". Also, for production sites the amount of WAL you'd
have to hang onto seems a bit daunting.

Any thoughts on how far back the WAL would need to go to deal with
the issues where such information has been useful? (For example, we
always have at least two weeks worth, but I don't know if that's a
useful range or not.)

Other problems are the cost of shipping it to a developer, and the
impracticality of sanitizing private data in it before you show it
to somebody.

Yeah, this wouldn't be a practical answer to the need unless
PostgreSQL shipped with a tool which could scan WAL and extract the
relevant information (probably under direction of someone from the
list or a private support organization). Is the required
information predictable enough to make developing such a tool a
tractable problem?

-Kevin

#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#61)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm, that is an excellent point. The WAL trace would actually be a
lot superior in terms of being able to figure out what went wrong.
But I don't quite see how we tell people "either keep xmin or keep
your old WAL". Also, for production sites the amount of WAL you'd
have to hang onto seems a bit daunting.

Any thoughts on how far back the WAL would need to go to deal with
the issues where such information has been useful? (For example, we
always have at least two weeks worth, but I don't know if that's a
useful range or not.)

Well, it's a "how long does it take you to notice data corruption"
kind of issue. The most recent case I can think of where xmin was
helpful was in trying to sort out a problem with an index being
inconsistent with the heap, which manifested as wrong query answers
for the user. I don't know how long it took him to recognize and
report the problem. (We never did locate the bug-if-any, IIRC...
it would have been much more helpful to have the WAL trace. xmin
did let me rule out some theories, though.)

Other problems are the cost of shipping it to a developer, and the
impracticality of sanitizing private data in it before you show it
to somebody.

Yeah, this wouldn't be a practical answer to the need unless
PostgreSQL shipped with a tool which could scan WAL and extract the
relevant information (probably under direction of someone from the
list or a private support organization). Is the required
information predictable enough to make developing such a tool a
tractable problem?

Hard to tell. If we were actually going in this direction we'd
want to write a much better WAL-text-dump tool than we have, and then
in principle somebody could sanitize the text output before shipping
it off. But going through a large volume of data that way could be
pretty impractical. Also, we (or at least I) have nearly zip experience
with trying to debug problems by examining WAL, so it's not real clear
to me which details might be important.

regards, tom lane

#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#59)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Bruce Momjian <bruce@momjian.us> writes:

The idea that thousands of Postgres installations are slower just so we
can occasionally debug xmin/xmax issues seems way off balance to me.

There's no evidence whatsoever that the scope of the problem is that large.

If people want debugging, let them modify the freeze age settings; the
defaults should not favor debugging when there is a measurable cost
involved. How many times in the past five years have we even needed
such debugging information, and also are cases where we could not have
told the user to change freeze settings to get us that info?

You're missing the point here: this is something we need when trying
to make sense of cases that are hard or impossible to reproduce.
Retroactively changing the freeze policy isn't possible.

regards, tom lane

#64Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#63)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

The idea that thousands of Postgres installations are slower just
so we can occasionally debug xmin/xmax issues seems way off
balance to me.

There's no evidence whatsoever that the scope of the problem is
that large.

Well, are we agreed that the current approach means that insertion
of a heap tuple normally requires it to be written to disk three
times, with two of those WAL-logged? And that deletion of a tuple
generally requires the same? I'd say that constitutes prima facie
evidence that any PostgreSQL installation doing any significant
number of writes is slower because of this. Are you suggesting
there aren't thousands of such installations, or that the repeated
disk writes are generally free?

-Kevin

#65Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#62)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we were actually going in this direction we'd want to write a
much better WAL-text-dump tool than we have, and then in principle
somebody could sanitize the text output before shipping it off.

I wouldn't think this would be practical unless there was a way to
scan the WAL files and dump only the bits related to the affected
pages, and probably with at least an option (possibly default) to
just mention the data type and length, rather than showing the
actual values -- how often are the values relevant, anyway? (Not
rhetorical; I really don't know.)

-Kevin

#66Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#63)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

The idea that thousands of Postgres installations are slower just so we
can occasionally debug xmin/xmax issues seems way off balance to me.

There's no evidence whatsoever that the scope of the problem is that large.

If people want debugging, let them modify the freeze age settings; the
defaults should not favor debugging when there is a measurable cost
involved. How many times in the past five years have we even needed
such debugging information, and also are cases where we could not have
told the user to change freeze settings to get us that info?

You're missing the point here: this is something we need when trying
to make sense of cases that are hard or impossible to reproduce.
Retroactively changing the freeze policy isn't possible.

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#66)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Bruce Momjian <bruce@momjian.us> writes:

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

You know perfectly well that no one could answer that question.
(Or at least not answer it on the basis of facts available today.)

regards, tom lane

#68Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#67)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

You know perfectly well that no one could answer that question.
(Or at least not answer it on the basis of facts available today.)

Well, guess then. In the past, how many forensic cases were needed for
in-place VACUUM FULL bugs, vs. other cases?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

#69Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#68)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

You know perfectly well that no one could answer that question.
(Or at least not answer it on the basis of facts available today.)

Well, guess then.

I already told you my opinion on this matter. Since you're prepared
to discount that, I don't see why you'd put any credence in my
evidence-free guesses.

regards, tom lane

#70Alvaro Herrera
alvherre@commandprompt.com
In reply to: Bruce Momjian (#68)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Excerpts from Bruce Momjian's message of vie jun 04 15:39:07 -0400 2010:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

You know perfectly well that no one could answer that question.
(Or at least not answer it on the basis of facts available today.)

Well, guess then. In the past, how many forensic cases were needed for
in-place VACUUM FULL bugs, vs. other cases?

I don't understand the question. I know I have debugged a bunch of
cases of data corruption, and having xmin/xmax around has been truly
useful. VACUUM FULL has never been involved (that I know of -- most of
our customers tend not to run it AFAIK), so why would I care about
whether it's gone in 9.0? Note that it's not always about PG bugs; but
in the cases where xmin=FrozenXid for all/most involved tuples, the
problems are more difficult to track down.

Yes, VACUUM FULL had bugs too -- I, for one, welcome our new
not-in-place VACUUM FULL overlord.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#71Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#70)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Alvaro Herrera wrote:

Excerpts from Bruce Momjian's message of vie jun 04 15:39:07 -0400 2010:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

You know perfectly well that no one could answer that question.
(Or at least not answer it on the basis of facts available today.)

Well, guess then. In the past, how many forensic cases were needed for
in-place VACUUM FULL bugs, vs. other cases?

I don't understand the question. I know I have debugged a bunch of
cases of data corruption, and having xmin/xmax around has been truly
useful. VACUUM FULL has never been involved (that I know of -- most of
our customers tend not to run it AFAIK), so why would I care about
whether it's gone in 9.0? Note that it's not always about PG bugs; but
in the cases where xmin=FrozenXid for all/most involved tuples, the
problems are more difficult to track down.

Yes, VACUUM FULL had bugs too -- I, for one, welcome our new
not-in-place VACUUM FULL overlord.

OK, so we had lots of forensics the didn't involve VACUUM FULL. That's
what I wanted to know.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

#72Greg Smith
greg@2ndquadrant.com
In reply to: Tom Lane (#62)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Tom Lane wrote:

If we were actually going in this direction we'd
want to write a much better WAL-text-dump tool than we have, and then
in principle somebody could sanitize the text output before shipping
it off. But going through a large volume of data that way could be
pretty impractical. Also, we (or at least I) have nearly zip experience
with trying to debug problems by examining WAL, so it's not real clear
to me which details might be important.

There's another interesting thing about moving in this direction too.
Systems that have WAL archiving setup that run into a problem could end
up being a much richer source for historical analysis of how the system
got into the bad state than is available right now. Typically those can
have longer histories available than you'll find on a primary that's
recycling segments all the time.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us

#73Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#62)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

If we were actually going in this direction we'd
want to write a much better WAL-text-dump tool than we have, and then
in principle somebody could sanitize the text output before shipping
it off. But going through a large volume of data that way could be
pretty impractical. Also, we (or at least I) have nearly zip experience
with trying to debug problems by examining WAL, so it's not real clear
to me which details might be important.

Just an off-the-wall thought, but, would it be possible to have a tool
which read WAL backwards and compared entries in the WAL against entries
on disk? I realize that you'd only see one version of a particular
block and then have to skip any updates which are earlier than it, but
it seems like you could cover a pretty large chunk of the recent changes
to the database using this approach..

Thanks,

Stephen

#74Greg Stark
gsstark@mit.edu
In reply to: Stephen Frost (#73)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, Jun 8, 2010 at 2:26 AM, Stephen Frost <sfrost@snowman.net> wrote:

Just an off-the-wall thought, but, would it be possible to have a tool
which read WAL backwards and compared entries in the WAL against entries
on disk?  I realize that you'd only see one version of a particular
block and then have to skip any updates which are earlier than it, but
it seems like you could cover a pretty large chunk of the recent changes
to the database using this approach..

I assume you mean back out the changes incrementally until you find a
full_page_write and see if it matches? And continue comparing with
full_page_writes once per checkpoint? I don't think the WAL has enough
information to replay backwards though. For example vacuum cleanup
records just list the tids to remove. They don't have the contents to
replace there.

--
greg

#75Stephen Frost
sfrost@snowman.net
In reply to: Greg Stark (#74)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

* Greg Stark (gsstark@mit.edu) wrote:

I assume you mean back out the changes incrementally until you find a
full_page_write and see if it matches?

To be honest, you're already assuming I know more about how this all
works than I do. :) The gist of my thought was simply- we write out
block changes to the WAL, including data in many cases. If we were to
look at the very end of the WAL, at the last piece of data written
there, and the data files have supposedly been flushed, then what's in
the WAL at that point should match what's in the data files, right? If
it doesn't, that'd be bad.

And continue comparing with
full_page_writes once per checkpoint?

If we could only do it when there's a full page write, then perhaps that
would work as well, but I thought we tracked them at a lower level. In
any case, the idea is the same- compare what's in WAL to what's supposed
to be on disk, and alarm whenever there's a clear error.

I don't think the WAL has enough
information to replay backwards though. For example vacuum cleanup
records just list the tids to remove. They don't have the contents to
replace there.

Right, you couldn't actually move the database backwards in time using
this tool (because we only write out new data, we don't write out what
was in that block/page before the write)- that isn't the idea or intent.
It would just be a tool that someone could run against a database where
they've detected corruption (or, I dunno, more frequently, to perhaps
catch corruption faster?), to see if the problem is a PG bug or a
hardware/kernel/etc issue. In fact, if you could somehow do this
against a backup that's been taken using pg_start_backup/pg_stop_backup,
that would be pretty awesome.

I know that if such a tool existed, I'd be happy to run it as part of my
regular backup routines- I *always* have all the WALs from my last
backup to my next backup (and typically farther back than that, eg: if I
run full backups weekly, I'll have 4 full backups + all 4 weeks of WALs,
to be able to replay back to any point in the month..).

The big question that I have is- would this actually be productive?
Would it actually be able to catch hardware corruption or help at all
with PG bugs? Those are the things I'm not really sure about.

Thanks,

Stephen

#76Jan Wieck
JanWieck@Yahoo.com
In reply to: Greg Stark (#74)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On 6/8/2010 8:27 AM, Greg Stark wrote:

On Tue, Jun 8, 2010 at 2:26 AM, Stephen Frost <sfrost@snowman.net> wrote:

Just an off-the-wall thought, but, would it be possible to have a tool
which read WAL backwards and compared entries in the WAL against entries
on disk? I realize that you'd only see one version of a particular
block and then have to skip any updates which are earlier than it, but
it seems like you could cover a pretty large chunk of the recent changes
to the database using this approach..

I assume you mean back out the changes incrementally until you find a
full_page_write and see if it matches? And continue comparing with
full_page_writes once per checkpoint? I don't think the WAL has enough
information to replay backwards though. For example vacuum cleanup
records just list the tids to remove. They don't have the contents to
replace there.

You can't back out changes. WAL does not contain before images.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

#77Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#52)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Fri, 2010-06-04 at 10:18 -0400, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan, you don't know what you're talking about. I have repeatedly had
cases where being able to look at xmin was critical to understanding
a bug. I *will not* hold still for a solution that effectively reduces
min_freeze_age to zero.

Recent history shows Tom's view to be the most useful one: its useful to
keep seeing the xmin. The last time we altered the way we set hint bits
we caused multiple data loss bugs doing it. We will need to debug things
and the WAL is always long gone (great idea though).

Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
keep the xmin but also can see it is frozen?

We already WAL-log certain flag settings, so why not this one also?

--
Simon Riggs www.2ndQuadrant.com

#78Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#77)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, Jun 8, 2010 at 4:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2010-06-04 at 10:18 -0400, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this?  (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan, you don't know what you're talking about.  I have repeatedly had
cases where being able to look at xmin was critical to understanding
a bug.  I *will not* hold still for a solution that effectively reduces
min_freeze_age to zero.

Recent history shows Tom's view to be the most useful one: its useful to
keep seeing the xmin. The last time we altered the way we set hint bits
we caused multiple data loss bugs doing it. We will need to debug things
and the WAL is always long gone (great idea though).

Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
keep the xmin but also can see it is frozen?

We could do that, but I think the point of this exercise is to reduce
I/O - specifically, I/O caused by anti-wraparound vacuums - and I'm
not clear how such a flag would help with that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#79Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#78)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, 2010-06-08 at 16:58 -0400, Robert Haas wrote:

On Tue, Jun 8, 2010 at 4:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2010-06-04 at 10:18 -0400, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan, you don't know what you're talking about. I have repeatedly had
cases where being able to look at xmin was critical to understanding
a bug. I *will not* hold still for a solution that effectively reduces
min_freeze_age to zero.

Recent history shows Tom's view to be the most useful one: its useful to
keep seeing the xmin. The last time we altered the way we set hint bits
we caused multiple data loss bugs doing it. We will need to debug things
and the WAL is always long gone (great idea though).

Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
keep the xmin but also can see it is frozen?

We could do that, but I think the point of this exercise is to reduce
I/O - specifically, I/O caused by anti-wraparound vacuums - and I'm
not clear how such a flag would help with that.

Hmmm: You suggested a variant of this idea, so whatever reasoning was
behind your suggestion would be shared here, surely?

Tom has been saying we cannot freeze early because we need to keep
xmins. I agree with that. This suggestion shows it is possible to freeze
a tuple AND keep its xmin. So that removes the argument that we should
freeze more aggressively (whenever we write the block) and can thus
reduce longer term I/O costs.

--
Simon Riggs www.2ndQuadrant.com

#80Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#79)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, Jun 8, 2010 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
keep the xmin but also can see it is frozen?

We could do that, but I think the point of this exercise is to reduce
I/O - specifically, I/O caused by anti-wraparound vacuums - and I'm
not clear how such a flag would help with that.

Hmmm: You suggested a variant of this idea, so whatever reasoning was
behind your suggestion would be shared here, surely?

Tom has been saying we cannot freeze early because we need to keep
xmins. I agree with that. This suggestion shows it is possible to freeze
a tuple AND keep its xmin. So that removes the argument that we should
freeze more aggressively (whenever we write the block) and can thus
reduce longer term I/O costs.

OK, yes, I see what you're getting at now. There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
not sure which way is better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#81Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#80)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:

OK, yes, I see what you're getting at now. There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
not sure which way is better.

Doing it at tuple level is more flexible and allows more aggressive
freezing. It also works better with existing tuple visibility code.

--
Simon Riggs www.2ndQuadrant.com

#82Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#81)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Simon Riggs <simon@2ndQuadrant.com> writes:

On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:

OK, yes, I see what you're getting at now. There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
not sure which way is better.

Doing it at tuple level is more flexible and allows more aggressive
freezing. It also works better with existing tuple visibility code.

I agree, relying on a page-level bit (or field) is unpleasant in a
number of ways.

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit). Setting a bit is no cheaper from that standpoint than changing
the xmin field.

regards, tom lane

#83Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#82)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Excerpts from Tom Lane's message of mar jun 08 18:35:00 -0400 2010:

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit). Setting a bit is no cheaper from that standpoint than changing
the xmin field.

... unless the bit is outside the page itself -- so we get back to the
idea of a freeze map.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#84Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#82)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, Jun 8, 2010 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:

OK, yes, I see what you're getting at now.  There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here.  I am
not sure which way is better.

Doing it at tuple level is more flexible and allows more aggressive
freezing. It also works better with existing tuple visibility code.

I agree, relying on a page-level bit (or field) is unpleasant in a
number of ways.

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit).  Setting a bit is no cheaper from that standpoint than changing
the xmin field.

Except for insert-only tables, I don't believe this is true. If you
freeze all tuples by the time the pages are marked all-visible,
perhaps via the xmin-preserving mechanism Simon suggested, then you
can use the visibility map to skip anti-wraparound vacuum as well as
regular vacuum. That sounds to me like it's accomplishing something.
Is it a complete solution? No. Is it better than what we have now?
Yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#85Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#82)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Tue, 2010-06-08 at 18:35 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:

OK, yes, I see what you're getting at now. There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
not sure which way is better.

Doing it at tuple level is more flexible and allows more aggressive
freezing. It also works better with existing tuple visibility code.

I agree, relying on a page-level bit (or field) is unpleasant in a
number of ways.

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit). Setting a bit is no cheaper from that standpoint than changing
the xmin field.

No, it doesn't of itself, but if you raise a complaint then we must
first address the complaint as a sub-topic before we continue the main
discussion around $TOPIC. My proposal removes the barrier that early
freezing would overwrite xmin values, so early freezing need not have
any negative effects.

The general idea is to hide the "third write" (freezing) on a tuple by
making it happen at the same time as another tuple's "second
write" (hint bit setting).

I'm happy to let that continue by the OPs.

--
Simon Riggs www.2ndQuadrant.com

#86marcin mank
marcin.mank@gmail.com
In reply to: Tom Lane (#82)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, Jun 9, 2010 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:

OK, yes, I see what you're getting at now.  There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here.  I am
not sure which way is better.

Doing it at tuple level is more flexible and allows more aggressive
freezing. It also works better with existing tuple visibility code.

I agree, relying on a page-level bit (or field) is unpleasant in a
number of ways.

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit).  Setting a bit is no cheaper from that standpoint than changing
the xmin field.

Could a tuple wih the bit set be considered frozen already? Would we
actually ever need to rewrite the xmin, even for anti-wraparound
reasons?

Greetings
Marcin

#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#84)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 8, 2010 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit). �Setting a bit is no cheaper from that standpoint than changing
the xmin field.

Except for insert-only tables, I don't believe this is true.

But insert-only tables are exactly the case that Josh complained about
to start with.

If you
freeze all tuples by the time the pages are marked all-visible,
perhaps via the xmin-preserving mechanism Simon suggested, then you
can use the visibility map to skip anti-wraparound vacuum as well as
regular vacuum. That sounds to me like it's accomplishing something.
Is it a complete solution? No. Is it better than what we have now?
Yes.

I do like the idea of using a status bit rather than FrozenXid to mark a
frozen tuple, because that eliminates the conflict between wanting to
freeze aggressively for performance reasons and wanting to preserve Xids
for forensic reasons. But it doesn't seem to do much for Josh's
original problem.

regards, tom lane

#88Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#87)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Wed, Jun 9, 2010 at 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you
freeze all tuples by the time the pages are marked all-visible,
perhaps via the xmin-preserving mechanism Simon suggested, then you
can use the visibility map to skip anti-wraparound vacuum as well as
regular vacuum.  That sounds to me like it's accomplishing something.
Is it a complete solution? No.  Is it better than what we have now?
Yes.

I do like the idea of using a status bit rather than FrozenXid to mark a
frozen tuple, because that eliminates the conflict between wanting to
freeze aggressively for performance reasons and wanting to preserve Xids
for forensic reasons.  But it doesn't seem to do much for Josh's
original problem.

OK, I see. So maybe we add a Todo to implement that, and then keep
thinking about how to fix Josh's problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: marcin mank (#86)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

marcin mank <marcin.mank@gmail.com> writes:

Could a tuple wih the bit set be considered frozen already? Would we
actually ever need to rewrite the xmin, even for anti-wraparound
reasons?

That's exactly what Simon is suggesting: if we had a tuple status flag
with the semantics of "this xmin is known visible to all current and
future transactions", we could consider setting that bit to be the moral
equivalent of freezing the tuple. The tuple visibility tests would
never actually consult clog for such an xmin and thus we'd never have to
replace it with FrozenXid.

But this doesn't in itself save us any work: we'd still need to treat
setting that bit as a WAL-logged operation, and we'd still need to have
VACUUM track the oldest not-thus-hinted xmins. What it does do is
eliminate the conflict between wanting to freeze tuples aggressively for
various performance reasons and wanting to preserve original xmin values
for forensic reasons.

I wonder how this might play into Heikki's ideas about making the
visibility map trustworthy. If we WAL-logged the operation of "set all
the per-tuple VISIBLE-TO-ALL bits on this page, as well as the page's
bit in the visibility map", then that end of things would be
trustworthy. And all the operations that have to unset the map bit
are already WAL-logged.

regards, tom lane

#90Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#87)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Seems I underestimated the importance of forensic breadcrumbs.

On 6/9/2010 12:09 PM, Tom Lane wrote:

I do like the idea of using a status bit rather than FrozenXid to mark a
frozen tuple, because that eliminates the conflict between wanting to
freeze aggressively for performance reasons and wanting to preserve Xids
for forensic reasons. But it doesn't seem to do much for Josh's
original problem.

Depends. Specifically on transaction profiles and how long the blocks
linger around before being written. If you can set the all visible bit
by the time, the page is written the first time, what bit including the
is-frozen one cannot be set at that time too?

Maybe some analysis on the typical behavior of such system is in order.
Especially the case Josh was mentioning seems to be a typical single
insert logging style application, with little else going on on that
particular database. I can't reveal specifics about that particular
case, but think of something like taking frequent sensor readings, that
need to be kept for years for forensics in case there is a product
recall some day.

And even if some cases still required another page write because those
frozen bits cannot be set on first write, this seems to be a win-win. We
would get rid of the FrozenXid completely and shift to a bit, so we can
effectively have a min_ freeze_age of zero while keeping the xid's forever.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

#91Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#90)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Jan Wieck <JanWieck@Yahoo.com> writes:

Depends. Specifically on transaction profiles and how long the blocks
linger around before being written. If you can set the all visible bit
by the time, the page is written the first time, what bit including the
is-frozen one cannot be set at that time too?

All-visible and is-frozen would be the same bit ...

And even if some cases still required another page write because those
frozen bits cannot be set on first write, this seems to be a win-win. We
would get rid of the FrozenXid completely and shift to a bit, so we can
effectively have a min_ freeze_age of zero while keeping the xid's forever.

Right. I don't see any downside, other than eating another status bit
per tuple, which we can afford.

regards, tom lane

#92Greg Stark
gsstark@mit.edu
In reply to: Greg Stark (#49)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark <gsstark@mit.edu> wrote:

I think to make it work you need to store a whole 64-bit reference
transaction id consisting of both a cycle counter and a transaction
id. The invariant for the page is that every xid on the page can be
compared to that reference transaction id using normal transactionid
semantics. Actually I think the easiest way to do that is to set it to
the oldest xid on the page. The first thing to do before comparing any
transaction id on the page with a real transaction id would be to
figure out whether the reference xid is comparable to the live xid,
which if it's the oldest xid on the page implies they'll all be
comparable.

The way to maintain that invariant would be that any xid insertion on
the page must advance the reference xid if it's not comparable to the
newly inserted xid. It has to be advanced to the oldest xid that's
still comparable with the newly inserted xid. Any xids on the page
that are older than the new refernce xid have to be frozen or removed.
I'm not sure how to do that without keeping clog forever though.

So the more I think about this the more I think it's unavoidable that
we would need to retain clog forever.

I think the goal here is to be able to load data into the database and
then never write the data ever again. Even if you visit the page years
later after the transaction ids have wrapped around several times. In
that case there's no avoiding that you'll need to know whether that
transaction committed or aborted.

Now we could make a bet that most transactions commit and therefore we
could keep a list of aborted transactions only which we might be able
to keep "forever" in very little space if very few transactions abort.
Presumably we would only use this form once the transaction was about
to be truncated out of clog. I'm not too happy with the assumption
that there aren't many aborts though. Someone could come along with a
use case where they have lots of aborts and run into strange
limitations and performance characteristics.

Alternatively we could do something like keeping a list of tables
touched by any transaction. Then vacuum could look for any
non-committed transactions old enough to be in danger of aging out of
clog and ensure those tables are frozen. But any tables which have
never been touched by any such old transaction could be left alone.
when we read in the page we'll be able to recognize the old
transactions as committed if they're beyond the end of the clog
horizon.

I don't really like that idea either because it leaves performance
really quite unpredictable. I could have a large table that goes
unvacuumed for a long time -- then when I come along with some tiny
query where I hit C-c and cause an abort I suddenly set a trap which
causes a huge vacuum freeze to fire off.

--
greg

#93Bruce Momjian
bruce@momjian.us
In reply to: Greg Stark (#92)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Greg Stark wrote:

On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark <gsstark@mit.edu> wrote:

I think to make it work you need to store a whole 64-bit reference
transaction id consisting of both a cycle counter and a transaction
id. The invariant for the page is that every xid on the page can be
compared to that reference transaction id using normal transactionid
semantics. Actually I think the easiest way to do that is to set it to
the oldest xid on the page. The first thing to do before comparing any
transaction id on the page with a real transaction id would be to
figure out whether the reference xid is comparable to the live xid,
which if it's the oldest xid on the page implies they'll all be
comparable.

The way to maintain that invariant would be that any xid insertion on
the page must advance the reference xid if it's not comparable to the
newly inserted xid. It has to be advanced to the oldest xid that's
still comparable with the newly inserted xid. Any xids on the page
that are older than the new refernce xid have to be frozen or removed.
I'm not sure how to do that without keeping clog forever though.

So the more I think about this the more I think it's unavoidable that
we would need to retain clog forever.

I think the goal here is to be able to load data into the database and
then never write the data ever again. Even if you visit the page years
later after the transaction ids have wrapped around several times. In
that case there's no avoiding that you'll need to know whether that
transaction committed or aborted.

I think we might need two bits, one commited and all visible, and
another aborted and all vislble.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#93)
Re: Idea for getting rid of VACUUM FREEZE on cold pages

Bruce Momjian <bruce@momjian.us> writes:

I think we might need two bits, one commited and all visible, and
another aborted and all vislble.

Huh? The latter means "vacuumable".

regards, tom lane