Min Xid problem proposal
Anybody remembers my patch to allow tracking the minimum Xid present in
a table, allowing to update the freeze xid on a per-table basis? The
motivation behind it was to remove the requirement of database-wide
vacuums.
The problem I found with it was that it required all tables to be
vacuumed at least once every billion transactions, even frozen tables,
because there was the danger that somebody may insert new tuples into
the table without marking that fact in the catalogs (thus minxid would
remain FrozenTransactionId but reality would be different.)
My proposal to solve that problem, is to make any transaction that
inserts or modifies tuples in a table that is marked as frozen, unfreeze
it first. The problem I had last time was finding a good spot in the
code for doing so. I'm now proposing to do it in the parser, in
setTargetTable(). This routine currently opens the target relation and
acquires RowExclusiveLock on it. At this point we can check its
relminxid, and if it's FreezeTransactionId, open pg_class and change the
value there.
The problem with this is that it seems to turn a possibly innocuous
insert into an operation that needs to open pg_class. But in the case
of a relation not in the relcache, it will happen anyway, so it's not
really all that serious. (The assumption here is that an unfreeze event
is at least as unlikely as a cache miss or a cache invalidation.)
The attached patch implements this idea. (Of course it doesn't work
standalone; it requires the rest of my min-xid patch.) I attach it here
separately because it's small and the proposal can be reviewed
independently of the rest of the patch, which is quite bigger.
Note that there's a FIXME on heap_unfreeze() saying that the shared
invalidation would not occur if the transaction aborts. This comment
comes verbatim from vacuum.c's vac_update_relstats(); however I made a
small experiment and it seems to be false. I'm not sure about it, but
it seems to me to be critical to send a invalidation message so that
other backends will notice immediately when we unfreeze a relation.
Comments?
--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La victoria es para quien se atreve a estar solo"
Attachments:
unfreeze.patchtext/plain; charset=us-asciiDownload
Index: catalog/heap.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/heap.c,v
retrieving revision 1.293
diff -c -r1.293 heap.c
*** catalog/heap.c 22 Nov 2005 18:17:08 -0000 1.293
--- catalog/heap.c 9 Dec 2005 16:21:09 -0000
***************
*** 2074,2076 ****
--- 2137,2202 ----
systable_endscan(fkeyScan);
heap_close(fkeyRel, AccessShareLock);
}
+
+ /*
+ * Mark a relation as no longer frozen in pg_class. We violate no-overwrite
+ * semantics here by storing the new minxid value directly into the pg_class
+ * tuple that's already in the page. The reason for this is that our change
+ * must persist even if our transaction aborts.
+ */
+ void
+ heap_unfreeze(Relation rel)
+ {
+ Relation classRel;
+ Form_pg_class classForm;
+ HeapTuple tuple;
+ HeapTupleData rtup;
+ Oid relid = RelationGetRelid(rel);
+ Buffer buffer;
+
+ /*
+ * Throw a warning due to deadlock risk. This is not an error so we have
+ * a chance to correct the bogus state and continue with the original
+ * operation. It's possible to get a deadlock below however.
+ */
+ if (IsSystemRelation(rel))
+ ereport(WARNING,
+ (errcode(ERRCODE_WARNING),
+ errmsg("system catalogs must not be marked as frozen")));
+
+ classRel = heap_open(RelationRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache(RELOID,
+ ObjectIdGetDatum(relid),
+ 0, 0, 0);
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+
+ rtup.t_self = tuple->t_self;
+ ReleaseSysCache(tuple);
+ if (!heap_fetch(classRel, SnapshotNow, &rtup, &buffer, false, NULL))
+ elog(ERROR, "pg_class entry for relid %u vanished during unfreezing",
+ relid);
+
+ /* Ensure no one does this at the same time */
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ classForm = (Form_pg_class) GETSTRUCT(&rtup);
+ classForm->relminxid = RecentXmin;
+
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * Invalidate the tuple in the catcaches; this also arranges to flush
+ * the relation's relcache entry.
+ *
+ * FIXME -- If we fail to commit for some reason, no flush will occur.
+ * This is a bug.
+ */
+ CacheInvalidateHeapTuple(classRel, &rtup);
+
+ /* Write the buffer */
+ WriteBuffer(buffer);
+
+ heap_close(classRel, RowExclusiveLock);
+ }
Index: parser/parse_clause.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/parser/parse_clause.c,v
retrieving revision 1.144
diff -c -r1.144 parse_clause.c
*** parser/parse_clause.c 22 Nov 2005 18:17:16 -0000 1.144
--- parser/parse_clause.c 9 Dec 2005 15:37:02 -0000
***************
*** 157,162 ****
--- 157,168 ----
pstate->p_target_relation = heap_openrv(relation, RowExclusiveLock);
/*
+ * If the relation is currently frozen, need to unfreeze it before use.
+ */
+ if (pstate->p_target_relation->rd_rel->relminxid == FrozenTransactionId)
+ heap_unfreeze(pstate->p_target_relation);
+
+ /*
* Now build an RTE.
*/
rte = addRangeTableEntryForRelation(pstate, pstate->p_target_relation,
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
My proposal to solve that problem, is to make any transaction that
inserts or modifies tuples in a table that is marked as frozen, unfreeze
it first. The problem I had last time was finding a good spot in the
code for doing so. I'm now proposing to do it in the parser, in
setTargetTable().
My god, no. Do you have any idea how many paths for updates you've missed?
(Think about prepared plans for starters.)
Furthermore, you can't do this in the way you propose (non-WAL-logged
update to pg_class). What if the system crashes without ever having
written this update to disk? The inserted tuples might have made it ---
whether they're committed or not doesn't matter, you've still blown it.
I don't see any very good argument for allowing this mechanism to set
minxid = FrozenXid in the first place. If there are only frozenXid in
the table, set minxid = current XID. That eliminates the entire problem
at a stroke.
(Yes, I know what you are going to say. The idea of freezing a table
and then never having to vacuum it at all is NOT worth the cost of
putting in a mechanism that would guarantee its safety.)
regards, tom lane
Tom Lane wrote:
I don't see any very good argument for allowing this mechanism to set
minxid = FrozenXid in the first place. If there are only frozenXid in
the table, set minxid = current XID. That eliminates the entire problem
at a stroke.
Ok, so I shall go back to the original patch, which did exactly this.
Is it OK for applying?
(I'm using RecentXmin instead of current XID though, because a
currently-running transaction could insert tuples in the table I just
vacuumed.)
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
Ok, so I shall go back to the original patch, which did exactly this.
Is it OK for applying?
I haven't looked at it ... when did you post it exactly?
(I'm using RecentXmin instead of current XID though, because a
currently-running transaction could insert tuples in the table I just
vacuumed.)
Duh, right.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Ok, so I shall go back to the original patch, which did exactly this.
Is it OK for applying?I haven't looked at it ... when did you post it exactly?
From: Alvaro Herrera <alvherre@commandprompt.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Patches <pgsql-patches@postgresql.org>
Date: Mon, 14 Nov 2005 23:40:52 -0300
Subject: Re: [PATCHES] [HACKERS] Per-table freeze limit proposal
I have a version that applies cleanly to current CVS tip. Do I post it
again?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
I have a version that applies cleanly to current CVS tip. Do I post it
again?
No need unless you think the changes are significant. I'll try to look
over the patch soon.
regards, tom lane
On Fri, 2005-12-09 at 12:32 -0500, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
My proposal to solve that problem, is to make any transaction that
inserts or modifies tuples in a table that is marked as frozen, unfreeze
it first. The problem I had last time was finding a good spot in the
code for doing so. I'm now proposing to do it in the parser, in
setTargetTable().My god, no. Do you have any idea how many paths for updates you've missed?
(Think about prepared plans for starters.)Furthermore, you can't do this in the way you propose (non-WAL-logged
update to pg_class). What if the system crashes without ever having
written this update to disk? The inserted tuples might have made it ---
whether they're committed or not doesn't matter, you've still blown it.I don't see any very good argument for allowing this mechanism to set
minxid = FrozenXid in the first place. If there are only frozenXid in
the table, set minxid = current XID. That eliminates the entire problem
at a stroke.(Yes, I know what you are going to say. The idea of freezing a table
and then never having to vacuum it at all is NOT worth the cost of
putting in a mechanism that would guarantee its safety.)
From what's been said VACUUM FREEZE will not alter the fact that a
frozen table will need vacuuming again in the future and so cannot ever
be read-only. I can't really see any reason to run VACUUM FREEZE...
If you want to make a table read-only forever, we need a separate
command to do that, ISTM.
ALTER TABLE ... READONLY
could set minXid = FrozenTransactionId, indicating no further VACUUMs
required, ever. We can then disallow INSERT/UPDATE/DELETE against the
table in the permissions layer.
Best Regards, Simon Riggs
Simon Riggs wrote:
From what's been said VACUUM FREEZE will not alter the fact that a
frozen table will need vacuuming again in the future and so cannot ever
be read-only. I can't really see any reason to run VACUUM FREEZE...
Yeah.
If you want to make a table read-only forever, we need a separate
command to do that, ISTM.
Let's get this goose cooked and then we can improve it. This patch has
been waiting on my queue for too long.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.