Per-table freeze limit proposal

Started by Alvaro Herrera · over 20 years ago · 14 messages · pgsql-hackers
#1 Alvaro Herrera
alvherre@2ndquadrant.com

Hackers,

As you've probably heard too many times already, I'm thinking of
improving vacuum, so we can keep track of the freeze Xid at the table
level, rather than the database level. Hopefully this will eliminate the
need for database-wide vacuums.

In fact this seems pretty easy to do. Add a field to pg_class, tell
VACUUM to update it using the determined freezeLimit, and that's it.
(Note that if we ever implement partial vacuum, it won't be able to
update the freeze point. But that was true before anyway.)

We also need to teach autovacuum to update pg_database.datfreezexid,
using the minimum from pg_class. (I don't think it's a good idea to
seqscan pg_class to find out the minimum on each VACUUM call.) So, an
autovacuum iteration would issue all needed VACUUM/ANALYZE calls, then
get the minimum freezexid from pg_class to update pg_database. This
way, GetNewTransactionId can continue checking pg_database.datfreezexid
as the hard limit for issuing warnings for Xid wraparound.
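[Editorial note: the wraparound check sketched above can be modeled in a few lines. This is a hypothetical Python illustration of the 32-bit circular XID arithmetic (in the spirit of the server's TransactionIdPrecedes), not the actual PostgreSQL C code; the names and the warning margin are illustrative only.]

```python
# Hypothetical model of 32-bit transaction-ID arithmetic, NOT the real
# PostgreSQL source: XIDs live on a circle of 2^32 values, and "a precedes b"
# means the signed 32-bit difference a - b is negative.
XID_MODULO = 2**32

def xid_precedes(a: int, b: int) -> bool:
    """True if XID a logically precedes XID b on the modulo-2^32 circle."""
    diff = (a - b) % XID_MODULO
    if diff >= XID_MODULO // 2:      # interpret as a negative int32
        diff -= XID_MODULO
    return diff < 0

def wraparound_warning(next_xid: int, datfreezexid: int,
                       margin: int = 10_000_000) -> bool:
    """Warn when next_xid has consumed most of the ~2^31 XIDs that are
    'in the future' relative to the database's recorded freeze horizon."""
    limit = (datfreezexid + 2**31 - margin) % XID_MODULO
    return not xid_precedes(next_xid, limit)
```

For example, with datfreezexid = 100, any next_xid short of 100 + 2^31 minus the margin is considered safe, and the first XID at or past that limit triggers the warning.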

Does anyone see a need for anything other than the autovacuum process to
be updating pg_database.datfreezexid? Of course, if autovacuum is not
in use, things would continue as now, that is, manual database-wide
VACUUM calls updating pg_database.datfreezexid. But note that you can
mark all tables as disabled in pg_autovacuum, issue your manual VACUUM
calls as needed (from cron or whatever), and use autovacuum to set
pg_database.datfreezexid -- so autovacuum would in fact do nothing
except set the freeze limit.

The problem is, this seems so awfully simple that I fear I am missing
something ... Otherwise, does this sound like a plan?

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
The easiest way to resolve [trivial code guidelines disputes] is to fire
one or both of the people involved. (Damian Conway)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: Per-table freeze limit proposal

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

In fact this seems pretty easy to do. Add a field to pg_class, tell
VACUUM to update it using the determined freezeLimit, and that's it.

I think that it'd be worth fixing things so that the recorded value
is not the freeze cutoff value (as now), but the actual lowest
not-frozen XID present anywhere in the table. The present code does not
do that because it's painful to track across multiple tables, but on a
per-table basis it seems easy. In particular this rule allows you to
set a sane value for the pg_class field when the table is created (ie,
current transaction's XMIN, rather than a billion less).

(Note that if we ever implement partial vacuum, it won't be able to
update the freeze point. But that was true before anyway.)

Sure.

We also need to teach autovacuum to update pg_database.datfreezexid,
using the minimum from pg_class.

No, no, no. autovacuum is not a required part of the system and it's
not going to become so any time soon. Updating the pg_database entry
will have to be the responsibility of VACUUM itself. It's not that
terrible: you don't have to scan pg_class unless you see that the
pg_class.relfreezexid value you are replacing is equal to
pg_database.datfreezexid, and with the exact computation suggested
above, that won't be a common occurrence.
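[Editorial note: the rule Tom describes can be sketched as follows. This is an illustrative Python model with hypothetical names and plain integer XIDs (no wraparound), not the patch's actual C code.]

```python
# Sketch of the "rescan only when necessary" rule: a VACUUM that replaces a
# table's recorded minimum XID only needs to rescan pg_class when the value
# it is replacing was itself the database-wide minimum.
def vacuum_one_table(relid, new_relminxid, pg_class, datminxid):
    """pg_class maps relid -> relminxid; returns the updated datminxid."""
    old = pg_class[relid]
    pg_class[relid] = new_relminxid
    if old == datminxid:
        # This table may have been holding the database minimum back:
        # recompute from scratch (the rare case).
        datminxid = min(pg_class.values())
    return datminxid
```

In the common case the replaced value is newer than the database minimum and no scan of pg_class happens at all.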

regards, tom lane

#3 Jim Buttafuoco
jim@contactbda.com
In reply to: Alvaro Herrera (#1)
Re: Per-table freeze limit proposal

While you are at it, can you put in some audit timestamps as to when the vacuum occurred (full vs. not full)?

---------- Original Message -----------
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
To: Hackers <pgsql-hackers@postgresql.org>
Sent: Wed, 14 Sep 2005 22:14:23 -0400
Subject: [HACKERS] Per-table freeze limit proposal

Hackers,

As you've probably heard too many times already, I'm thinking of
improving vacuum, so we can keep track of the freeze Xid at the table
level, rather than the database level. Hopefully this will eliminate the
need for database-wide vacuums.

------- End of Original Message -------

#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Per-table freeze limit proposal

On Wed, Sep 14, 2005 at 11:30:52PM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

In fact this seems pretty easy to do. Add a field to pg_class, tell
VACUUM to update it using the determined freezeLimit, and that's it.

I think that it'd be worth fixing things so that the recorded value
is not the freeze cutoff value (as now), but the actual lowest
not-frozen XID present anywhere in the table.

Cool. I wonder if the exact figure should be

min(lowest non-frozen Xid in table, GetOldestXmin(false))

just in case a long-running transaction inserts a new tuple after the
vacuum is done. Also GetOldestXmin should be the value used for empty
tables. For shared relations, we'd use GetOldestXmin(true).

Also, in light of this, it seems a bad idea to use the name "freezexid"
for the pg_class column; I would name it relminxid or something like
that (suggestions welcome). Not sure about renaming the pg_database
column -- I don't see why not.

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Spring has come. Nobody knows how it happened." (A. Machado)

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: Per-table freeze limit proposal

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Cool. I wonder if the exact figure should be
min(lowest non-frozen Xid in table, GetOldestXmin(false))

Actually just min(lowest Xid in table, RecentXmin). You only need to be
sure there are no running transactions older than what you put into the
field; their xmins are not at issue.
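[Editorial note: the computation this subthread settles on can be sketched as below. A hypothetical Python illustration using plain integer XIDs and made-up names, not actual server code.]

```python
# Record min(lowest unfrozen XID seen during the scan, RecentXmin).
# An empty or fully frozen table records RecentXmin itself, so the stored
# value is always a normal XID that no already-running transaction precedes.
def compute_relminxid(tuple_xids, recent_xmin):
    """tuple_xids: XIDs seen in the table; None stands for a frozen tuple."""
    unfrozen = [x for x in tuple_xids if x is not None]
    return min(min(unfrozen), recent_xmin) if unfrozen else recent_xmin
```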

Also, in light of this, it seems a bad idea to use the name "freezexid"
for the pg_class column; I would name it relminxid or something like
that (suggestions welcome).

Works for me.

regards, tom lane

#6 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Buttafuoco (#3)
Re: Per-table freeze limit proposal

It would also be very handy to be able to tell how many transactions (or
inserts/updates/deletes) have occurred since the last vacuum. Presumably
autovacuum needs to know this already, but is it exposed?

On Thu, Sep 15, 2005 at 07:46:26AM -0400, Jim Buttafuoco wrote:

while you are at it, can you put in some audit timestamps as to when the vacuum occurred (full vs not full).


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: [HACKERS] Per-table freeze limit proposal

On Wed, Sep 14, 2005 at 11:30:52PM -0400, Tom Lane wrote:

Updating the pg_database entry
will have to be the responsibility of VACUUM itself. It's not that
terrible: you don't have to scan pg_class unless you see that the
pg_class.relfreezexid value you are replacing is equal to
pg_database.datfreezexid, and with the exact computation suggested
above, that won't be a common occurrence.

Ok, this patch is a first cut at that. I made it slightly smarter,
because we want to do the update only once if we issue a database-wide
vacuum, instead of once per table. However, I also had to cope with the
possibility that the table with the minimum Xid value is dropped, so I
made DROP TABLE set the datminxid to InvalidXid. If after a VACUUM the
datminxid is found to be invalid, pg_class is scanned unconditionally.

There's the usual gotcha with shared relations: vacuuming them on one
database is not going to update their pg_class entries in other
databases, so they will be vacuumed more frequently than really
necessary. I don't see how to fix that -- one way would be storing
their stats in one database only, but it seems dangerous.

Note that I use LockSharedObject() to lock the database while we are
updating the pg_database row. This means that more than one database
can be updated concurrently (this is important because we have to keep
the lock while we seqscan pg_class). This may be a bad idea from the
point of view of the buffer manager; maybe we need an additional
LockBuffer() just before we are going to modify the tuple.

With this in place, it's no longer necessary to issue database-wide
vacuums. Note that I haven't tested the part where the clog grows
enough to be truncated, nor really anything more complicated than a
single backend doing database-wide or single-table vacuums (with an
optional parallel backend holding an open transaction). Still, it shows
what the basics of the patch are.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"What is essential is invisible to the eyes" (A. de Saint-Exupéry)

Attachments:

vacuum-min-xid.patch (text/plain; charset=us-ascii) +627/-513
#8 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: [HACKERS] Per-table freeze limit proposal

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Cool. I wonder if the exact figure should be
min(lowest non-frozen Xid in table, GetOldestXmin(false))

Actually just min(lowest Xid in table, RecentXmin). You only need to be
sure there are no running transactions older than what you put into the
field; their xmins are not at issue.

Ok, patch attached. Two new columns in pg_class store two Xids: the
"relminxid" is the one in the equation above. The relvacuumxid is the
OldestXmin. Two columns in pg_database replace the previous two,
datminxid is the minimum of all relminxid in the database, and
datvacuumxid is the minimum of all relvacuumxid. (datfreezexid is no
more.)

So we can check the Xid wrap horizon using datminxid, and truncate clog
using datvacuumxid. (Actually I was going to post the patch yesterday
without the datvacuumxid/relvacuumxid part, when I noticed that I was
truncating clog on datminxid which seemed a bad idea.)

Additionally I made DROP TABLE invalidate datminxid and datvacuumxid
when the table with the minimum is dropped. New values for the
pg_database columns are calculated by scanning pg_class at the next
vacuum when any of them is invalid, or when the table which was holding
the minimum back is vacuumed. New values for the pg_class columns are
updated on every vacuum, as appropriate.
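[Editorial note: the two-column bookkeeping the patch describes can be sketched as follows. This is a hypothetical Python model with illustrative names, not the patch itself.]

```python
# Each table records relminxid (the wraparound horizon) and relvacuumxid
# (the OldestXmin of its last vacuum, which bounds clog truncation); the
# pg_database values are simply the per-table minima of each column.
def recompute_database_xids(pg_class_rows):
    """pg_class_rows: dict of relid -> (relminxid, relvacuumxid)."""
    datminxid = min(row[0] for row in pg_class_rows.values())
    datvacuumxid = min(row[1] for row in pg_class_rows.values())
    return datminxid, datvacuumxid
```

Note that the two minima can come from different tables, which is why they are tracked separately.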

The whole thing is pretty fragile if somebody manually updates a
catalog. But we tell people not to do that, so it should be their
fault, right?

I discovered one problem with the whole approach. Per this patch, we
only store normal Xids in relminxid/relvacuumxid. So if a table is
completely frozen, we will store RecentXmin. We do this because it
would actually be unsafe to store, say, FrozenXid: if another
transaction stores/changes a tuple while we are vacuuming it, the Frozen
mark wouldn't be correct and thus the table could be corrupted if a Xid
wraparound happens (which is why we use RecentXmin in the first place:
to cope with the possibility of someone else using the table during the
vacuum.)

The problem comes when this is done to template1, and it is copied to
another database after some millions of transactions have come and gone --
it will seem like the database has suffered wraparound. We would need
to vacuum it completely after it is copied for the stats to be accurate.

I'm not sure what to do about that. I think storing FrozenXid may not
actually be a totally bad idea. Comments?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachments:

vacuum-minxid-2.patch (text/plain; charset=us-ascii) +731/-595
#9 Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#8)
Re: [HACKERS] Per-table freeze limit proposal

On Mon, 2005-11-14 at 23:40 -0300, Alvaro Herrera wrote:

The whole thing is pretty fragile if somebody manually updates a
catalog. But we tell people not to do that, so it should be their
fault, right?

Hmmmm...sounds scary. Cool ideas in the patch though.

I discovered one problem with the whole approach. Per this patch, we
only store normal Xids in relminxid/relvacuumxid. So if a table is
completely frozen, we will store RecentXmin. We do this because it
would actually be unsafe to store, say, FrozenXid: if another
transaction stores/changes a tuple while we are vacuuming it, the Frozen
mark wouldn't be correct and thus the table could be corrupted if a Xid
wraparound happens (which is why we use RecentXmin in the first place:
to cope with the possibility of someone else using the table during the
vacuum.)

Yep. And because VACUUM FULL FREEZE is no longer possible.

The problem comes when this is done to template1, and it is copied to
another database after some millions of transactions have come and gone --
it will seem like the database has suffered wraparound. We would need
to vacuum it completely after it is copied for the stats to be accurate.

I don't understand the issue, can you explain more? I see no problem. If
an identical copy gives a problem then surely template1 should also.

I'm not sure what to do about that. I think storing FrozenXid may not
actually be a totally bad idea. Comments?

Its not a totally bad idea, but it has some risk, which where
transactions are concerned is not really acceptable.

Perhaps we should reinstate VACUUM FULL FREEZE to do just a FREEZE with
a table lock and skip all that moving data around.

Best Regards, Simon Riggs

#10 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#9)
Re: [HACKERS] Per-table freeze limit proposal

Simon Riggs wrote:

On Mon, 2005-11-14 at 23:40 -0300, Alvaro Herrera wrote:

The whole thing is pretty fragile if somebody manually updates a
catalog. But we tell people not to do that, so it should be their
fault, right?

Hmmmm...sounds scary. Cool ideas in the patch though.

Yeah, well, actually the problem is solved very easily by setting the
pg_database tuple manually, either to InvalidTransactionId or to the
minimum computed from pg_class.

The problem comes when this is done to template1, and it is copied to
another database after some millions of transactions have come and gone --
it will seem like the database has suffered wraparound. We would need
to vacuum it completely after it is copied for the stats to be accurate.

I don't understand the issue, can you explain more? I see no problem. If
an identical copy gives a problem then surely template1 should also.

Actually, template1 has the problem too. The scenario is this:

- template1 is frozen. datminxid <- X
- a long time passes, say INT_MAX * 0.75 transactions
- a new database D is created, which coming from template1 has datminxid=X
- the Xid counter is past the vacuum horizon for D.datminxid, so the
system determines that the Xid counter could be wrapped already.
- The system automatically decides to stop accepting new transactions.

In fact there's no problem because in D, just like in template1, all
tuples are frozen. How should we mark this on the catalogs? I don't
see any way.
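[Editorial note: the scenario above can be illustrated numerically. This is a hypothetical Python model of the 32-bit circular comparison, not PostgreSQL code; the concrete numbers are made up.]

```python
# With 32-bit circular XIDs, once the counter advances more than 2^31 past
# the datminxid recorded at freeze time, the recorded value no longer
# "precedes" the current XID and the database looks wrapped around, even
# though every tuple in it is actually frozen.
XID_MODULO = 2**32

def xid_precedes(a: int, b: int) -> bool:
    diff = (a - b) % XID_MODULO
    return (diff - XID_MODULO if diff >= XID_MODULO // 2 else diff) < 0

freeze_time_xid = 1_000                          # X, stored as datminxid
later = freeze_time_xid + int(2**31 * 0.75)      # many transactions later
assert xid_precedes(freeze_time_xid, later)      # comparison still sane here
much_later = freeze_time_xid + 2**31 + 1
assert not xid_precedes(freeze_time_xid, much_later)  # now it looks wrapped
```

The warning limit sits well before the hard 2^31 boundary, which is why the copied database trips the check long before actual wraparound.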

Note that setting relminxid = FrozenTransactionId is bogus in any case,
because even if we correctly lock and freeze the table, the next
transaction after the vacuum could insert a new tuple into the table.
But we don't want INSERT to be checking pg_class.relminxid! (Or do we?)

Now, restating the problem, certainly template1 has the problem too. In
fact we have a bigger problem: we are forcing all tables to be vacuumed
every so often, even if they have been completely frozen before! This
is because setting relminxid = Frozen is really bogus.

I'm not sure what to do about that. I think storing FrozenXid may not
actually be a totally bad idea. Comments?

Its not a totally bad idea, but it has some risk, which where
transactions are concerned is not really acceptable.

Perhaps we should reinstate VACUUM FULL FREEZE to do just a FREEZE with
a table lock and skip all that moving data around.

Doesn't work either because of the argument above.

What about assuming that if somebody executes a database-wide FREEZE, he
knows what he is doing and thus we can mark datminxid as
FrozenTransactionId?

Sadly, I see all this as proof that the whole idea doesn't work. It
seems better than the current state of the system, where we rely on the
user to do certain things, or on pgstat which is inherently inexact.
But there is a big hole in the whole reasoning which hasn't been filled
yet.

Any ideas welcome. The idea of any insert/delete/update operation
checking a bit in the Relation and resetting relminxid to
TopTransactionId if it's marked Frozen is the only one I have right now.
What do people think about it?
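[Editorial note: the unfreeze-on-write idea floated here can be sketched as below. A hypothetical Python illustration with made-up names, not a proposed implementation.]

```python
# Writers check a per-relation "frozen" marker and, on the first
# modification, reset relminxid to their own top-level transaction ID,
# "unfreezing" the table; later writers see a normal XID and do nothing.
FROZEN = "frozen"  # sentinel stored in place of a numeric relminxid

def note_modification(rel, top_xid):
    """Called by any insert/update/delete before touching the table."""
    if rel["relminxid"] == FROZEN:
        rel["relminxid"] = top_xid  # oldest XID now possibly in the table
    return rel
```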

--
Alvaro Herrera http://www.PlanetPostgreSQL.org
"There is no possible heaven without sinking our roots
into the depths of the earth" (Malucha Pinto)

#11 Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#10)
Re: [HACKERS] Per-table freeze limit proposal

On Tue, 2005-11-15 at 21:58 -0300, Alvaro Herrera wrote:

In fact there's no problem because in D, just like in template1, all
tuples are frozen. How should we mark this on the catalogs? I don't
see any way.

All tuples might be frozen or might not be, the point is you don't know.
That's why you can't use FrozenTransactionId.

Perhaps we should reinstate VACUUM FULL FREEZE to do just a FREEZE with
a table lock and skip all that moving data around.

Doesn't work either because of the argument above.

What about assuming that if somebody executes a database-wide FREEZE, he
knows what he is doing and thus we can mark datminxid as
FrozenTransactionId?

If you lock the table before FREEZE then you will guarantee that all
rows will be frozen and you really can then set FrozenTransactionId.

Making VACUUM FREEZE take full table locks seems like a very useful
thing to me, and it would solve your problems also.

Best Regards, Simon Riggs

#12 Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#11)
Re: [HACKERS] Per-table freeze limit proposal

On Wed, 2005-11-16 at 08:31 +0000, Simon Riggs wrote:

On Tue, 2005-11-15 at 21:58 -0300, Alvaro Herrera wrote:

In fact there's no problem because in D, just like in template1, all
tuples are frozen. How should we mark this on the catalogs? I don't
see any way.

All tuples might be frozen or might not be, the point is you don't know.
That's why you can't use FrozenTransactionId.

Perhaps we should reinstate VACUUM FULL FREEZE to do just a FREEZE with
a table lock and skip all that moving data around.

Doesn't work either because of the argument above.

What about assuming that if somebody executes a database-wide FREEZE, he
knows what he is doing and thus we can mark datminxid as
FrozenTransactionId?

If you lock the table before FREEZE then you will guarantee that all
rows will be frozen and you really can then set FrozenTransactionId.

Making VACUUM FREEZE take full table locks seems like a very useful
thing to me, and it would solve your problems also.

Thinking some more, when initdb issues VACUUM FREEZE we know for certain
that nobody else is issuing commands against the database at that point,
which is equivalent to a table lock. So we should be able to have a
VACUUM FREEZE detect that and if so, set FrozenTransactionId.

In normal concurrent running, I would like VACUUM FREEZE to issue a full
table SHARE lock to ensure that we can set FrozenTransactionId for that
also. Otherwise we will not be able to move frozen tables to read only
media.

Best Regards, Simon Riggs

#13 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#12)
Re: [HACKERS] Per-table freeze limit proposal

Simon Riggs wrote:

On Wed, 2005-11-16 at 08:31 +0000, Simon Riggs wrote:

All tuples might be frozen or might not be, the point is you don't know.
That's why you can't use FrozenTransactionId.

Thinking some more, when initdb issues VACUUM FREEZE we know for certain
that nobody else is issuing commands against the database at that point,
which is equivalent to a table lock. So we should be able to have a
VACUUM FREEZE detect that and if so, set FrozenTransactionId.

In normal concurrent running, I would like VACUUM FREEZE to issue a full
table SHARE lock to ensure that we can set FrozenTransactionId for that
also. Otherwise we will not be able to move frozen tables to read only
media.

You missed one point however. Even if VACUUM FREEZE freezes all tuples,
any transaction following that one is able to insert non-frozen tuples
into the table. At that instant, having marked the table with Frozen is
bogus, no matter what amount of locks you took on it.

We can do that (mark the table Frozen) only if all following
transactions are forced to mark the table "unfrozen" as soon as they
change it. Are we going to do that?

Note that trying to unfreeze a table might be difficult -- we might need
to obtain a lock on pg_class after we have opened and locked the target
relation, leading to possible deadlock on INSERT. Yikes.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39° 49' 17.7", W 73° 14' 26.8"
"Political science is the science of understanding why
politicians act the way they do" (netfunny.com)

#14 Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#13)
Re: [HACKERS] Per-table freeze limit proposal

On Wed, 2005-11-16 at 07:52 -0300, Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2005-11-16 at 08:31 +0000, Simon Riggs wrote:

All tuples might be frozen or might not be, the point is you don't know.
That's why you can't use FrozenTransactionId.

Thinking some more, when initdb issues VACUUM FREEZE we know for certain
that nobody else is issuing commands against the database at that point,
which is equivalent to a table lock. So we should be able to have a
VACUUM FREEZE detect that and if so, set FrozenTransactionId.

In normal concurrent running, I would like VACUUM FREEZE to issue a full
table SHARE lock to ensure that we can set FrozenTransactionId for that
also. Otherwise we will not be able to move frozen tables to read only
media.

You missed one point however. Even if VACUUM FREEZE freezes all tuples,
any transaction following that one is able to insert non-frozen tuples
into the table. At that instant, having marked the table with Frozen is
bogus, no matter what amount of locks you took on it.

(OK I think we are getting there now, after my usual comms errors.)

We need something stronger than VACUUM FREEZE then. Perhaps an ALTER
TABLE READONLY. That would do a FREEZE and place a permanent table share
lock, so we wouldn't need to set/unset the Frozen state. We'd do that as
a permissions thing, rather than an actual lock. That way copies of the
data could still be taken with ease and the copies would not themselves
be READONLY.

Not sure what you'd call it to make a whole database readonly all at
once...but whatever we call it we know initdb wants to run it on
template1.

Then your original thought becomes fully viable.

This is particularly important because I see the need to be able to
freeze older data partitions and migrate them to readonly media as part
of very high volume data applications.

Best Regards, Simon Riggs