Non-transactional pg_class, try 2

Started by Alvaro Herreraover 19 years ago36 messages

alvherre@alvh.no-ip.org

over 19 years ago

1 attachment(s)

Here I repost the patch to implement non-transactional catalogs, the
first of which is pg_ntclass, intended to hold the non-transactional
info about pg_class (reltuples, relpages).

pg_ntclass is a relation of a new relkind, RELKIND_NON_TRANSACTIONAL
(ideas for shorter names welcome). In pg_class, we store a TID to the
corresponding tuple. The tuples are not cached; they are obtained by
heap_fetch() each time they are requested. This may be worth
reconsideration.

heap_update refuses to operate on a non-transactional catalog, because
there's no (easy) way to update pg_class accordingly. This normally
shouldn't be a problem. vac_update_relstats updates the tuple by using
the new heap_inplace_update call.

VACUUM FULL also refuses to operate on these tables, and ANALYZE
silently skips them. Only plain VACUUM cleans them.

Note that you can DELETE from pg_ntclass. Not sure if we should
disallow it somehow, because it's not easy to get out from that if you
do. (But it's possible -- just insert enough tuples until you reach the
needed TID, and then delete the ones that are not pointed by any
pg_class row).

Regression test pass; I updated the stats test because it was accessing
pg_class.relpages. So there's already a test to verify that it's
working.

There is one caveat that I'm worried about. I had to add a "typedef" to
pg_class.h to put ItemPointerData in FormData_pg_class, because the C
struct doesn't recognize the "tid" type, but the bootstrap type system
does not recognize ItemPointerData as a valid type. I find this mighty
ugly because it will have side effects whenever we #include pg_class.h
(which is virtually anywhere, since that header is #included in htup.h
which in turn is included almost everywhere). Suggestions welcome.
Maybe this is not a problem.

Other two caveats are:
1. During bootstrap, RelationBuildLocalRelation creates nailed relations
with hardcoded TID=(0,1). This is because we don't have access to
pg_class yet, so we can't find the real pointer; and furthermore, we are
going to fix the entries later in the bootstrapping process.

2. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty
ugly as well; and autovacuum is skipping pg_ntclass (really all
non-transactional catalogs) altogether. We could improve the situation
by introducing some sort of struct like {relid, relkind}, so that
vacuum_rel could know what relkind to expect, and it could skip
non-transactional catalogs cleanly in vacuum full and analyze.

I intend to apply this patch by tuesday or wednesday, unless an
objection is raised prior to that.

Attachments:

fixclass-3.patchtext/plain; charset=us-asciiDownload

Index: doc/src/sgml/catalogs.sgml
===================================================================
RCS file: /home/alvherre/cvs/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.124
diff -c -r2.124 catalogs.sgml
*** doc/src/sgml/catalogs.sgml	3 Jun 2006 02:53:04 -0000	2.124
--- doc/src/sgml/catalogs.sgml	9 Jun 2006 22:52:50 -0000
***************
*** 104,109 ****
--- 104,114 ----
       </row>
  
       <row>
+       <entry><link linkend="catalog-pg-ntclass"><structname>pg_ntclass</structname></link></entry>
+       <entry>non-transactional columns for <structname>pg_class</structname></entry>
+      </row>
+ 
+      <row>
        <entry><link linkend="catalog-pg-constraint"><structname>pg_constraint</structname></link></entry>
        <entry>check constraints, unique constraints, primary key constraints, foreign key constraints</entry>
       </row>
***************
*** 1465,1497 ****
       </row>
  
       <row>
-       <entry><structfield>relpages</structfield></entry>
-       <entry><type>int4</type></entry>
-       <entry></entry>
-       <entry>
-        Size of the on-disk representation of this table in pages (of size
-        <symbol>BLCKSZ</symbol>).
-        This is only an estimate used by the planner.
-        It is updated by <command>VACUUM</command>,
-        <command>ANALYZE</command>, and a few DDL commands
-        such as <command>CREATE INDEX</command>.
-       </entry>
-      </row>
- 
-      <row>
-       <entry><structfield>reltuples</structfield></entry>
-       <entry><type>float4</type></entry>
-       <entry></entry>
-       <entry>
-        Number of rows in the table.
-        This is only an estimate used by the planner.
-        It is updated by <command>VACUUM</command>,
-        <command>ANALYZE</command>, and a few DDL commands
-        such as <command>CREATE INDEX</command>.
-       </entry>
-      </row>
- 
-      <row>
        <entry><structfield>reltoastrelid</structfield></entry>
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
--- 1470,1475 ----
***************
*** 1512,1517 ****
--- 1490,1504 ----
       </row>
  
       <row>
+       <entry><structfield>relntrans</structfield></entry>
+       <entry><type>tid</type></entry>
+       <entry><literal><link linkend="catalog-pg-ntclass"><structname>pg_ntclass</structname></link>.ctid</literal></entry>
+       <entry>
+        CTID of this relation's pg_ntclass row.
+       </entry>
+      </row>
+ 
+      <row>
        <entry><structfield>relhasindex</structfield></entry>
        <entry><type>bool</type></entry>
        <entry></entry>
***************
*** 1538,1547 ****
        <entry><type>char</type></entry>
        <entry></entry>
        <entry>
!        <literal>r</> = ordinary table, <literal>i</> = index,
!        <literal>S</> = sequence, <literal>v</> = view, <literal>c</> =
!        composite type, <literal>t</> = TOAST
!        table
        </entry>
       </row>
  
--- 1525,1533 ----
        <entry><type>char</type></entry>
        <entry></entry>
        <entry>
!        <literal>r</> = ordinary table, <literal>i</> = index, <literal>S</> =
!        sequence, <literal>v</> = view, <literal>c</> = composite type,
!        <literal>t</> = TOAST table, <literal>n</> = non-transactional catalog
        </entry>
       </row>
  
***************
*** 1648,1653 ****
--- 1634,1695 ----
    </table>
   </sect1>
  
+  <sect1 id="catalog-pg-ntclass">
+   <title><structname>pg_ntclass</structname></title>
+ 
+   <indexterm zone="catalog-pg-class">
+    <primary>pg_ntclass</primary>
+   </indexterm>
+ 
+   <para>
+    The catalog <structname>pg_ntclass</structname> stores the non-transactional
+    information about <structname>pg_class</structname> entries, that is, the
+    information that is independent of the commit status of any particular
+    transaction.
+   </para>
+ 
+   <table>
+    <title><structname>pg_ntclass</> Columns</title>
+ 
+    <tgroup cols=4>
+     <thead>
+      <row>
+       <entry>Name</entry>
+       <entry>Type</entry>
+       <entry>References</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+ 
+     <tbody>
+      <row>
+       <entry><structfield>relpages</structfield></entry>
+       <entry><type>int4</type></entry>
+       <entry></entry>
+       <entry>
+        Size of the on-disk representation of this table in pages (of size
+        <symbol>BLCKSZ</symbol>).  This is only an estimate used by the planner.
+        It is updated by <command>VACUUM</command>, <command>ANALYZE</command>,
+        and a few DDL commands such as <command>CREATE INDEX</command>.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>reltuples</structfield></entry>
+       <entry><type>float4</type></entry>
+       <entry></entry>
+       <entry>
+        Number of rows in the table.  This is only an estimate used by the
+        planner.  It is updated by <command>VACUUM</command>,
+        <command>ANALYZE</command>, and a few DDL commands such as
+        <command>CREATE INDEX</command>.
+       </entry>
+      </row>
+     </tbody>
+   </tgroup>
+  </sect1>
+ 
+ 
   <sect1 id="catalog-pg-constraint">
    <title><structname>pg_constraint</structname></title>
  
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.213
diff -c -r1.213 heapam.c
*** src/backend/access/heap/heapam.c	28 May 2006 02:27:08 -0000	1.213
--- src/backend/access/heap/heapam.c	10 Jun 2006 23:17:47 -0000
***************
*** 1828,1833 ****
--- 1828,1839 ----
  
  	Assert(ItemPointerIsValid(otid));
  
+ 	if (relation->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("can't update non-transactional catalog %s",
+ 					   RelationGetRelationName(relation))));
+ 
  	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(otid));
  	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
  
Index: src/backend/catalog/Makefile
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/Makefile,v
retrieving revision 1.59
diff -c -r1.59 Makefile
*** src/backend/catalog/Makefile	12 Feb 2006 03:22:17 -0000	1.59
--- src/backend/catalog/Makefile	10 Jun 2006 23:17:47 -0000
***************
*** 24,33 ****
  
  # Note: there are some undocumented dependencies on the ordering in which
  # the catalog header files are assembled into postgres.bki.  In particular,
! # indexing.h had better be last.
  
  POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
! 	pg_proc.h pg_type.h pg_attribute.h pg_class.h pg_autovacuum.h \
  	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h \
  	pg_operator.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
  	pg_language.h pg_largeobject.h pg_aggregate.h pg_statistic.h \
--- 24,34 ----
  
  # Note: there are some undocumented dependencies on the ordering in which
  # the catalog header files are assembled into postgres.bki.  In particular,
! # indexing.h had better be last, and it's better if the bootstrapped catalogs
! # appear in front.
  
  POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
! 	pg_proc.h pg_type.h pg_attribute.h pg_class.h pg_ntclass.h pg_autovacuum.h \
  	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h \
  	pg_operator.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
  	pg_language.h pg_largeobject.h pg_aggregate.h pg_statistic.h \
Index: src/backend/catalog/heap.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/heap.c,v
retrieving revision 1.299
diff -c -r1.299 heap.c
*** src/backend/catalog/heap.c	10 May 2006 23:18:39 -0000	1.299
--- src/backend/catalog/heap.c	11 Jun 2006 06:46:03 -0000
***************
*** 40,45 ****
--- 40,46 ----
  #include "catalog/pg_constraint.h"
  #include "catalog/pg_inherits.h"
  #include "catalog/pg_namespace.h"
+ #include "catalog/pg_ntclass.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
  #include "commands/tablecmds.h"
***************
*** 66,71 ****
--- 67,73 ----
  					Relation new_rel_desc,
  					Oid new_rel_oid, Oid new_type_oid,
  					Oid relowner,
+ 					ItemPointerData nttid,
  					char relkind);
  static Oid AddNewRelationType(const char *typeName,
  				   Oid typeNamespace,
***************
*** 557,562 ****
--- 559,565 ----
  					Oid new_rel_oid,
  					Oid new_type_oid,
  					Oid relowner,
+ 					ItemPointerData nttid,
  					char relkind)
  {
  	Form_pg_class new_rel_reltup;
***************
*** 568,597 ****
  	 */
  	new_rel_reltup = new_rel_desc->rd_rel;
  
- 	switch (relkind)
- 	{
- 		case RELKIND_RELATION:
- 		case RELKIND_INDEX:
- 		case RELKIND_TOASTVALUE:
- 			/* The relation is real, but as yet empty */
- 			new_rel_reltup->relpages = 0;
- 			new_rel_reltup->reltuples = 0;
- 			break;
- 		case RELKIND_SEQUENCE:
- 			/* Sequences always have a known size */
- 			new_rel_reltup->relpages = 1;
- 			new_rel_reltup->reltuples = 1;
- 			break;
- 		default:
- 			/* Views, etc, have no disk storage */
- 			new_rel_reltup->relpages = 0;
- 			new_rel_reltup->reltuples = 0;
- 			break;
- 	}
- 
  	new_rel_reltup->relowner = relowner;
  	new_rel_reltup->reltype = new_type_oid;
  	new_rel_reltup->relkind = relkind;
  
  	new_rel_desc->rd_att->tdtypeid = new_type_oid;
  
--- 571,580 ----
  	 */
  	new_rel_reltup = new_rel_desc->rd_rel;
  
  	new_rel_reltup->relowner = relowner;
  	new_rel_reltup->reltype = new_type_oid;
  	new_rel_reltup->relkind = relkind;
+ 	new_rel_reltup->relntrans = nttid;
  
  	new_rel_desc->rd_att->tdtypeid = new_type_oid;
  
***************
*** 618,623 ****
--- 601,684 ----
  	heap_freetuple(tup);
  }
  
+ /*
+  * AddNewNtclassTuple
+  *
+  * Insert a pg_ntclass tuple for the given relation, and return its
+  * TID.
+  */
+ ItemPointerData
+ AddNewNtclassTuple(Relation new_rel_desc, char relkind)
+ {
+ 	ItemPointerData	tid;
+ 
+ 	/*
+ 	 * We don't create pg_ntclass entries for sequences, views or composite
+ 	 * types, because they don't have variable-size storage and thus they
+ 	 * don't need it.
+ 	 */
+ 	if (relkind == RELKIND_VIEW ||
+ 		relkind == RELKIND_COMPOSITE_TYPE ||
+ 		relkind == RELKIND_SEQUENCE)
+ 	{
+ 		ItemPointerSetInvalid(&tid);
+ 	}
+ 	else
+ 	{
+ 		Relation	ntRel;
+ 		Form_pg_ntclass new_rel_nttup;
+ 		HeapTuple	tup;
+ 
+ 		ntRel = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 		new_rel_nttup = (Form_pg_ntclass) palloc(sizeof(FormData_pg_ntclass));
+ 
+ 		/* choose the appropiate new values we want to insert */
+ 		switch (relkind)
+ 		{
+ 			case RELKIND_RELATION:
+ 			case RELKIND_INDEX:
+ 			case RELKIND_TOASTVALUE:
+ 			case RELKIND_NON_TRANSACTIONAL:
+ 				/* The relation is real, but as yet empty */
+ 				new_rel_nttup->relpages = 0;
+ 				new_rel_nttup->reltuples = 0;
+ 				break;
+ 			default:
+ 				elog(ERROR, "can't create pg_ntclass entries for relkind '%c'",
+ 					 relkind);
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * now form a tuple to add to pg_ntclass
+ 		 */
+ 		tup = heap_addheader(Natts_pg_ntclass,
+ 							 false,
+ 							 NTCLASS_TUPLE_SIZE,
+ 							 (void *) new_rel_nttup);
+ 
+ 		/*
+ 		 * finally insert the new tuple, update the indexes, and clean up.
+ 		 */
+ 		simple_heap_insert(ntRel, tup);
+ 
+ 		/* Grab the newly inserted tuple's TID for saving it into pg_class */
+ 		tid = tup->t_self;
+ 
+ #ifdef NOT_USED
+ 		/* pg_ntclass does not have indexes */
+ 		CatalogUpdateIndexes(ntRel, tup);
+ #endif
+ 
+ 		heap_close(ntRel, RowExclusiveLock);
+ 
+ 		heap_freetuple(tup);
+ 		pfree(new_rel_nttup);
+ 	}
+ 
+ 	return tid;
+ }
  
  /* --------------------------------
   *		AddNewRelationType -
***************
*** 679,684 ****
--- 740,747 ----
  	Relation	pg_class_desc;
  	Relation	new_rel_desc;
  	Oid			new_type_oid;
+ 	ItemPointerData nttid;
+ 
  
  	pg_class_desc = heap_open(RelationRelationId, RowExclusiveLock);
  
***************
*** 733,738 ****
--- 796,806 ----
  									  relkind);
  
  	/*
+ 	 * Create a pg_ntclass entry for the relation.
+ 	 */
+ 	nttid = AddNewNtclassTuple(new_rel_desc, relkind);
+ 
+ 	/*
  	 * now create an entry in pg_class for the relation.
  	 *
  	 * NOTE: we could get a unique-index failure here, in case someone else is
***************
*** 744,749 ****
--- 812,818 ----
  						relid,
  						new_type_oid,
  						ownerid,
+ 						nttid,
  						relkind);
  
  	/*
***************
*** 872,877 ****
--- 941,962 ----
  	heap_close(pg_class_desc, RowExclusiveLock);
  }
  
+ void
+ DeleteNtclassTuple(ItemPointer tid)
+ {
+ 	Relation	pg_ntclass;
+ 
+ 	/* some relations don't have pg_ntclass tuples */
+ 	if (!ItemPointerIsValid(tid))
+ 		return;
+ 
+ 	pg_ntclass = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 	simple_heap_delete(pg_ntclass, tid);
+ 
+ 	heap_close(pg_ntclass, RowExclusiveLock);
+ }
+ 
  /*
   *		DeleteAttributeTuples
   *
***************
*** 1160,1165 ****
--- 1245,1255 ----
  	}
  
  	/*
+ 	 * Delete the pg_ntclass tuple
+ 	 */
+ 	DeleteNtclassTuple(&(rel->rd_rel->relntrans));
+ 
+ 	/*
  	 * Close relcache entry, but *keep* AccessExclusiveLock on the relation
  	 * until transaction commit.  This ensures no one else will try to do
  	 * something with the doomed relation.
***************
*** 2133,2135 ****
--- 2223,2284 ----
  
  	return result;
  }
+ 
+ /*
+  * Return the number of tuples and pages for a given relation, as recorded in
+  * its pg_ntclass entry.  ntuples and npages are output parameters, and can be
+  * NULL if the caller does not need them.
+  */
+ void
+ relation_get_ntups_npages(Relation relation, float4 *ntuples, int4 *npages)
+ {
+ 	HeapTuple	tup;
+ 	Relation	pg_ntclass;
+ 	Form_pg_ntclass ntForm;
+ 
+ 	pg_ntclass = heap_open(NtClassRelationId, AccessShareLock);
+ 
+ 	tup = get_ntclass_entry(pg_ntclass, relation);
+ 	ntForm = (Form_pg_ntclass) GETSTRUCT(tup);
+ 
+ 	if (ntuples != NULL)
+ 		*ntuples = ntForm->reltuples;
+ 	if (npages != NULL)
+ 		*npages = ntForm->relpages;
+ 
+ 	heap_freetuple(tup);
+ 	heap_close(pg_ntclass, AccessShareLock);
+ }
+ 
+ /*
+  * Return a copy of the pg_ntclass tuple for the given target relation.
+  */
+ HeapTuple
+ get_ntclass_entry(Relation pg_ntclass, Relation target)
+ {
+ 	HeapTupleData tuple;
+ 	HeapTuple	retval;
+ 	Buffer		buffer;
+ 
+ 	Assert(ItemPointerIsValid(&(target->rd_rel->relntrans)));
+ 
+ 	tuple.t_self = target->rd_rel->relntrans;
+ 
+ 	if (!heap_fetch(pg_ntclass, SnapshotNow, &tuple, &buffer, false, NULL))
+ 		elog(ERROR, "lookup failed for pg_ntclass entry for relation %s",
+ 			 RelationGetRelationName(target));
+ 
+ 	retval = heap_copytuple(&tuple);
+ 	ReleaseBuffer(buffer);
+ 
+ 	return retval;
+ }
+ 
+ void
+ update_ntclass_entry(Relation pg_ntclass, HeapTuple tuple)
+ {
+ 	elog(DEBUG2, "updating pg_ntclass entry (%u/%u)",
+ 		 ItemPointerGetBlockNumber(&(tuple->t_self)),
+ 		 ItemPointerGetOffsetNumber(&(tuple->t_self)));
+ 	heap_inplace_update(pg_ntclass, tuple);
+ }
Index: src/backend/catalog/index.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.266
diff -c -r1.266 index.c
*** src/backend/catalog/index.c	10 May 2006 23:18:39 -0000	1.266
--- src/backend/catalog/index.c	10 Jun 2006 23:17:47 -0000
***************
*** 32,37 ****
--- 32,38 ----
  #include "catalog/index.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_constraint.h"
+ #include "catalog/pg_ntclass.h"
  #include "catalog/pg_opclass.h"
  #include "catalog/pg_type.h"
  #include "executor/executor.h"
***************
*** 569,575 ****
  
  	/*
  	 * Fill in fields of the index's pg_class entry that are not set correctly
! 	 * by heap_create.
  	 *
  	 * XXX should have a cleaner way to create cataloged indexes
  	 */
--- 570,576 ----
  
  	/*
  	 * Fill in fields of the index's pg_class entry that are not set correctly
! 	 * by heap_create, and create the pg_ntclass entry.
  	 *
  	 * XXX should have a cleaner way to create cataloged indexes
  	 */
***************
*** 577,582 ****
--- 578,585 ----
  	indexRelation->rd_rel->relam = accessMethodObjectId;
  	indexRelation->rd_rel->relkind = RELKIND_INDEX;
  	indexRelation->rd_rel->relhasoids = false;
+ 	indexRelation->rd_rel->relntrans =
+ 		AddNewNtclassTuple(indexRelation, RELKIND_INDEX);
  
  	/*
  	 * store index's pg_class entry
***************
*** 756,761 ****
--- 759,768 ----
  	}
  	else if (skip_build)
  	{
+ 		float4 ntuples;
+ 
+ 		relation_get_ntups_npages(heapRelation, &ntuples, NULL);
+ 
  		/*
  		 * Caller is responsible for filling the index later on.  However,
  		 * we'd better make sure that the heap relation is correctly marked
***************
*** 765,771 ****
  						   true,
  						   isprimary,
  						   InvalidOid,
! 						   heapRelation->rd_rel->reltuples);
  		/* Make the above update visible */
  		CommandCounterIncrement();
  	}
--- 772,778 ----
  						   true,
  						   isprimary,
  						   InvalidOid,
! 						   ntuples);
  		/* Make the above update visible */
  		CommandCounterIncrement();
  	}
***************
*** 826,831 ****
--- 833,841 ----
  	smgrscheduleunlink(userIndexRelation->rd_smgr,
  					   userIndexRelation->rd_istemp);
  
+ 	/* delete pg_ntclass tuple */
+ 	DeleteNtclassTuple(&(userIndexRelation->rd_rel->relntrans));
+ 
  	/*
  	 * Close and flush the index's relcache entry, to ensure relcache doesn't
  	 * try to rebuild it while we're deleting catalog entries. We keep the
***************
*** 1038,1045 ****
--- 1048,1057 ----
  	BlockNumber	relpages = RelationGetNumberOfBlocks(rel);
  	Oid			relid = RelationGetRelid(rel);
  	Relation	pg_class;
+ 	Relation	pg_ntclass;
  	HeapTuple	tuple;
  	Form_pg_class rd_rel;
+ 	Form_pg_ntclass ntForm;
  	bool		in_place_upd;
  	bool		dirty;
  
***************
*** 1112,1127 ****
  			dirty = true;
  		}
  	}
- 	if (rd_rel->reltuples != (float4) reltuples)
- 	{
- 		rd_rel->reltuples = (float4) reltuples;
- 		dirty = true;
- 	}
- 	if (rd_rel->relpages != (int32) relpages)
- 	{
- 		rd_rel->relpages = (int32) relpages;
- 		dirty = true;
- 	}
  
  	/*
  	 * If anything changed, write out the tuple
--- 1124,1129 ----
***************
*** 1185,1190 ****
--- 1187,1216 ----
  	heap_freetuple(tuple);
  
  	heap_close(pg_class, RowExclusiveLock);
+ 
+ 	/* update the non-transactional (pg_ntclass) entry */
+ 	pg_ntclass = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 	tuple = get_ntclass_entry(pg_ntclass, rel);
+ 
+ 	ntForm = (Form_pg_ntclass) GETSTRUCT(tuple);
+ 
+ 	dirty = false;
+ 	if (ntForm->reltuples != (float4) reltuples)
+ 	{
+ 		ntForm->reltuples = (float4) reltuples;
+ 		dirty = true;
+ 	}
+ 	if (ntForm->relpages != (int32) relpages)
+ 	{
+ 		ntForm->relpages = (int32) relpages;
+ 		dirty = true;
+ 	}
+ 	if (dirty)
+ 		update_ntclass_entry(pg_ntclass, tuple);
+ 	heap_freetuple(tuple);
+ 
+ 	heap_close(pg_ntclass, RowExclusiveLock);
  }
  
  /*
***************
*** 1240,1249 ****
  	RelationOpenSmgr(relation);
  	smgrscheduleunlink(relation->rd_smgr, relation->rd_istemp);
  
! 	/* update the pg_class row */
  	rd_rel->relfilenode = newrelfilenode;
- 	rd_rel->relpages = 0;		/* it's empty until further notice */
- 	rd_rel->reltuples = 0;
  	simple_heap_update(pg_class, &tuple->t_self, tuple);
  	CatalogUpdateIndexes(pg_class, tuple);
  
--- 1266,1278 ----
  	RelationOpenSmgr(relation);
  	smgrscheduleunlink(relation->rd_smgr, relation->rd_istemp);
  
! 	/*
! 	 * Update the pg_class row.  The caller is responsible for ensuring the
! 	 * pg_ntclass entry is correct.  While we could update it here, it's just a
! 	 * waste of time to set the values to 0, because currently all callers set
! 	 * them to the correct values sometime after calling us.
! 	 */
  	rd_rel->relfilenode = newrelfilenode;
  	simple_heap_update(pg_class, &tuple->t_self, tuple);
  	CatalogUpdateIndexes(pg_class, tuple);
  
Index: src/backend/commands/analyze.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.93
diff -c -r1.93 analyze.c
*** src/backend/commands/analyze.c	23 Mar 2006 00:19:28 -0000	1.93
--- src/backend/commands/analyze.c	10 Jun 2006 23:17:47 -0000
***************
*** 164,171 ****
  	 */
  	if (onerel->rd_rel->relkind != RELKIND_RELATION)
  	{
! 		/* No need for a WARNING if we already complained during VACUUM */
! 		if (!vacstmt->vacuum)
  			ereport(WARNING,
  					(errmsg("skipping \"%s\" --- cannot analyze indexes, views, or special system tables",
  							RelationGetRelationName(onerel))));
--- 164,176 ----
  	 */
  	if (onerel->rd_rel->relkind != RELKIND_RELATION)
  	{
! 		/*
! 		 * No need for a WARNING if we already complained during VACUUM.  Also
! 		 * don't complain if this is a non-transactional catalog, because
! 		 * VACUUM processes it but we don't.
! 		 */
! 		if (!vacstmt->vacuum &&
! 			(onerel->rd_rel->relkind != RELKIND_NON_TRANSACTIONAL))
  			ereport(WARNING,
  					(errmsg("skipping \"%s\" --- cannot analyze indexes, views, or special system tables",
  							RelationGetRelationName(onerel))));
***************
*** 422,428 ****
  	 */
  	if (!vacstmt->vacuum)
  	{
! 		vac_update_relstats(RelationGetRelid(onerel),
  							RelationGetNumberOfBlocks(onerel),
  							totalrows,
  							hasindex);
--- 427,433 ----
  	 */
  	if (!vacstmt->vacuum)
  	{
! 		vac_update_relstats(onerel,
  							RelationGetNumberOfBlocks(onerel),
  							totalrows,
  							hasindex);
***************
*** 432,438 ****
  			double		totalindexrows;
  
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
! 			vac_update_relstats(RelationGetRelid(Irel[ind]),
  								RelationGetNumberOfBlocks(Irel[ind]),
  								totalindexrows,
  								false);
--- 437,443 ----
  			double		totalindexrows;
  
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
! 			vac_update_relstats(Irel[ind],
  								RelationGetNumberOfBlocks(Irel[ind]),
  								totalindexrows,
  								false);
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.147
diff -c -r1.147 cluster.c
*** src/backend/commands/cluster.c	2 May 2006 22:25:10 -0000	1.147
--- src/backend/commands/cluster.c	10 Jun 2006 23:17:47 -0000
***************
*** 728,733 ****
--- 728,734 ----
  				relform2;
  	Oid			swaptemp;
  	CatalogIndexState indstate;
+ 	ItemPointerData swap_tid;
  
  	/* We need writable copies of both pg_class tuples. */
  	relRelation = heap_open(RelationRelationId, RowExclusiveLock);
***************
*** 763,781 ****
  
  	/* we should not swap reltoastidxid */
  
! 	/* swap size statistics too, since new rel has freshly-updated stats */
! 	{
! 		int4		swap_pages;
! 		float4		swap_tuples;
! 
! 		swap_pages = relform1->relpages;
! 		relform1->relpages = relform2->relpages;
! 		relform2->relpages = swap_pages;
! 
! 		swap_tuples = relform1->reltuples;
! 		relform1->reltuples = relform2->reltuples;
! 		relform2->reltuples = swap_tuples;
! 	}
  
  	/* Update the tuples in pg_class */
  	simple_heap_update(relRelation, &reltup1->t_self, reltup1);
--- 764,779 ----
  
  	/* we should not swap reltoastidxid */
  
! 	/*
! 	 * Swap pg_ntclass rows too.  This automatically updates the
! 	 * size statistics.
! 	 *
! 	 * XXX make sure this is still correct when adding more rows to
! 	 * pg_ntclass.
! 	 */
! 	swap_tid = relform1->relntrans;
! 	relform1->relntrans = relform2->relntrans;
! 	relform2->relntrans = swap_tid;
  
  	/* Update the tuples in pg_class */
  	simple_heap_update(relRelation, &reltup1->t_self, reltup1);
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.330
diff -c -r1.330 vacuum.c
*** src/backend/commands/vacuum.c	10 May 2006 23:18:39 -0000	1.330
--- src/backend/commands/vacuum.c	10 Jun 2006 23:17:47 -0000
***************
*** 28,37 ****
--- 28,39 ----
  #include "access/subtrans.h"
  #include "access/xlog.h"
  #include "catalog/catalog.h"
+ #include "catalog/heap.h"
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_index.h"
+ #include "catalog/pg_ntclass.h"
  #include "commands/dbcommands.h"
  #include "commands/vacuum.h"
  #include "executor/executor.h"
***************
*** 200,212 ****
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
! 			 const char *stmttype);
  static void vac_update_dbstats(Oid dbid,
  				   TransactionId vacuumXID,
  				   TransactionId frozenXID);
  static void vac_truncate_clog(TransactionId vacuumXID,
  				  TransactionId frozenXID);
! static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
  		  VacPageList vacuum_pages, VacPageList fraged_pages);
--- 202,214 ----
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
! 			 const char *stmttype, bool full);
  static void vac_update_dbstats(Oid dbid,
  				   TransactionId vacuumXID,
  				   TransactionId frozenXID);
  static void vac_truncate_clog(TransactionId vacuumXID,
  				  TransactionId frozenXID);
! static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
  		  VacPageList vacuum_pages, VacPageList fraged_pages);
***************
*** 350,356 ****
  	 * Build list of relations to process, unless caller gave us one. (If we
  	 * build one, we put it in vac_context for safekeeping.)
  	 */
! 	relations = get_rel_oids(relids, vacstmt->relation, stmttype);
  
  	if (vacstmt->vacuum && all_rels)
  	{
--- 352,359 ----
  	 * Build list of relations to process, unless caller gave us one. (If we
  	 * build one, we put it in vac_context for safekeeping.)
  	 */
! 	relations = get_rel_oids(relids, vacstmt->relation, stmttype,
! 		   					 vacstmt->full);
  
  	if (vacstmt->vacuum && all_rels)
  	{
***************
*** 446,452 ****
  
  			if (vacstmt->vacuum)
  			{
! 				if (!vacuum_rel(relid, vacstmt, RELKIND_RELATION))
  					all_rels = false;	/* forget about updating dbstats */
  			}
  			if (vacstmt->analyze)
--- 449,455 ----
  
  			if (vacstmt->vacuum)
  			{
! 				if (!vacuum_rel(relid, vacstmt, false))
  					all_rels = false;	/* forget about updating dbstats */
  			}
  			if (vacstmt->analyze)
***************
*** 563,569 ****
   * per-relation transactions.
   */
  static List *
! get_rel_oids(List *relids, const RangeVar *vacrel, const char *stmttype)
  {
  	List	   *oid_list = NIL;
  	MemoryContext oldcontext;
--- 566,573 ----
   * per-relation transactions.
   */
  static List *
! get_rel_oids(List *relids, const RangeVar *vacrel, const char *stmttype,
! 			 bool full)
  {
  	List	   *oid_list = NIL;
  	MemoryContext oldcontext;
***************
*** 590,608 ****
  		Relation	pgclass;
  		HeapScanDesc scan;
  		HeapTuple	tuple;
- 		ScanKeyData key;
- 
- 		ScanKeyInit(&key,
- 					Anum_pg_class_relkind,
- 					BTEqualStrategyNumber, F_CHAREQ,
- 					CharGetDatum(RELKIND_RELATION));
  
  		pgclass = heap_open(RelationRelationId, AccessShareLock);
  
! 		scan = heap_beginscan(pgclass, SnapshotNow, 1, &key);
  
  		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
  		{
  			/* Make a relation list entry for this guy */
  			oldcontext = MemoryContextSwitchTo(vac_context);
  			oid_list = lappend_oid(oid_list, HeapTupleGetOid(tuple));
--- 594,620 ----
  		Relation	pgclass;
  		HeapScanDesc scan;
  		HeapTuple	tuple;
  
  		pgclass = heap_open(RelationRelationId, AccessShareLock);
  
! 		scan = heap_beginscan(pgclass, SnapshotNow, 0, NULL);
  
  		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
  		{
+ 			Form_pg_class classForm;
+ 			char		relkind;
+ 
+ 			classForm = (Form_pg_class) GETSTRUCT(tuple);
+ 			relkind = classForm->relkind;
+ 
+ 			/* we only process plain and non-transactional tables */
+ 			if (relkind != RELKIND_RELATION &&
+ 				relkind != RELKIND_NON_TRANSACTIONAL)
+ 				continue;
+ 			/* we only process non-transactional catalogs on lazy vacuum */
+ 			if (relkind == RELKIND_NON_TRANSACTIONAL && full)
+ 				continue;
+ 
  			/* Make a relation list entry for this guy */
  			oldcontext = MemoryContextSwitchTo(vac_context);
  			oid_list = lappend_oid(oid_list, HeapTupleGetOid(tuple));
***************
*** 669,677 ****
   *	vac_update_relstats() -- update statistics for one relation
   *
   *		Update the whole-relation statistics that are kept in its pg_class
!  *		row.  There are additional stats that will be updated if we are
!  *		doing ANALYZE, but we always update these stats.  This routine works
!  *		for both index and heap relation entries in pg_class.
   *
   *		We violate transaction semantics here by overwriting the rel's
   *		existing pg_class tuple with the new values.  This is reasonably
--- 681,689 ----
   *	vac_update_relstats() -- update statistics for one relation
   *
   *		Update the whole-relation statistics that are kept in its pg_class
!  *		and pg_ntclass rows.  There are additional stats that will be updated
!  *		if we are doing ANALYZE, but we always update these stats.  This
!  *		routine works for both index and heap relation entries in pg_class.
   *
   *		We violate transaction semantics here by overwriting the rel's
   *		existing pg_class tuple with the new values.  This is reasonably
***************
*** 679,723 ****
   *		commits.  The reason for this is that if we updated these tuples in
   *		the usual way, vacuuming pg_class itself wouldn't work very well ---
   *		by the time we got done with a vacuum cycle, most of the tuples in
!  *		pg_class would've been obsoleted.  Of course, this only works for
!  *		fixed-size never-null columns, but these are.
   *
   *		This routine is shared by full VACUUM, lazy VACUUM, and stand-alone
   *		ANALYZE.
   */
  void
! vac_update_relstats(Oid relid, BlockNumber num_pages, double num_tuples,
  					bool hasindex)
  {
  	Relation	rd;
  	HeapTuple	ctup;
  	Form_pg_class pgcform;
  	bool		dirty;
  
  	rd = heap_open(RelationRelationId, RowExclusiveLock);
  
  	/* Fetch a copy of the tuple to scribble on */
  	ctup = SearchSysCacheCopy(RELOID,
! 							  ObjectIdGetDatum(relid),
  							  0, 0, 0);
  	if (!HeapTupleIsValid(ctup))
  		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
! 			 relid);
  	pgcform = (Form_pg_class) GETSTRUCT(ctup);
  
  	/* Apply required updates, if any, to copied tuple */
  
  	dirty = false;
- 	if (pgcform->relpages != (int32) num_pages)
- 	{
- 		pgcform->relpages = (int32) num_pages;
- 		dirty = true;
- 	}
- 	if (pgcform->reltuples != (float4) num_tuples)
- 	{
- 		pgcform->reltuples = (float4) num_tuples;
- 		dirty = true;
- 	}
  	if (pgcform->relhasindex != hasindex)
  	{
  		pgcform->relhasindex = hasindex;
--- 691,758 ----
   *		commits.  The reason for this is that if we updated these tuples in
   *		the usual way, vacuuming pg_class itself wouldn't work very well ---
   *		by the time we got done with a vacuum cycle, most of the tuples in
!  *		them would've been obsoleted.  Of course, this only works for
!  *		fixed-size never-null columns, but these are.  Also, note this doesn't
!  *		apply to pg_ntclass, because by definition of non-transactional
!  *		catalog we must always update it in-place.
   *
   *		This routine is shared by full VACUUM, lazy VACUUM, and stand-alone
   *		ANALYZE.
   */
  void
! vac_update_relstats(Relation rel, BlockNumber num_pages, double num_tuples,
  					bool hasindex)
  {
  	Relation	rd;
  	HeapTuple	ctup;
+ 	Form_pg_ntclass pgntform;
  	Form_pg_class pgcform;
  	bool		dirty;
  
+ 	/* Process pg_ntclass first */
+ 	rd = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 	/* Fetch a copy of the pg_ntclass tuple to scribble on */
+ 	ctup = get_ntclass_entry(rd, rel);
+ 
+ 	pgntform = (Form_pg_ntclass) GETSTRUCT(ctup);
+ 
+ 	/* Apply required updates, if any, to copied tuple */
+ 
+ 	dirty = false;
+ 	if (pgntform->relpages != (int32) num_pages)
+ 	{
+ 		pgntform->relpages = (int32) num_pages;
+ 		dirty = true;
+ 	}
+ 	if (pgntform->reltuples != (float4) num_tuples)
+ 	{
+ 		pgntform->reltuples = (float4) num_tuples;
+ 		dirty = true;
+ 	}
+ 
+ 	/* If anything changed, write out the tuple */
+ 	if (dirty)
+ 		update_ntclass_entry(rd, ctup);
+ 	heap_freetuple(ctup);
+ 
+ 	heap_close(rd, RowExclusiveLock);
+ 
+ 	/* Now go for the pg_class tuple */
  	rd = heap_open(RelationRelationId, RowExclusiveLock);
  
  	/* Fetch a copy of the tuple to scribble on */
  	ctup = SearchSysCacheCopy(RELOID,
! 							  ObjectIdGetDatum(RelationGetRelid(rel)),
  							  0, 0, 0);
  	if (!HeapTupleIsValid(ctup))
  		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
! 			 RelationGetRelid(rel));
  	pgcform = (Form_pg_class) GETSTRUCT(ctup);
  
  	/* Apply required updates, if any, to copied tuple */
  
  	dirty = false;
  	if (pgcform->relhasindex != hasindex)
  	{
  		pgcform->relhasindex = hasindex;
***************
*** 934,940 ****
   *		At entry and exit, we are not inside a transaction.
   */
  static bool
! vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
--- 969,975 ----
   *		At entry and exit, we are not inside a transaction.
   */
  static bool
! vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
***************
*** 1005,1013 ****
  
  	/*
  	 * Check that it's a plain table; we used to do this in get_rel_oids() but
! 	 * seems safer to check after we've locked the relation.
  	 */
! 	if (onerel->rd_rel->relkind != expected_relkind)
  	{
  		ereport(WARNING,
  				(errmsg("skipping \"%s\" --- cannot vacuum indexes, views, or special system tables",
--- 1040,1051 ----
  
  	/*
  	 * Check that it's a plain table; we used to do this in get_rel_oids() but
! 	 * seems safer to check after we've locked the relation.  We only accept
! 	 * TOAST tables if the caller told us it's OK to do so.
  	 */
! 	if (!(onerel->rd_rel->relkind == RELKIND_RELATION ||
! 		  onerel->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL ||
! 		  (onerel->rd_rel->relkind == RELKIND_TOASTVALUE && allow_toast)))
  	{
  		ereport(WARNING,
  				(errmsg("skipping \"%s\" --- cannot vacuum indexes, views, or special system tables",
***************
*** 1019,1024 ****
--- 1057,1078 ----
  	}
  
  	/*
+ 	 * We only process non-transactional catalogs in lazy vacuum, because they
+ 	 * are referenced by TID and applying VACUUM FULL to them would destroy the
+ 	 * references.
+ 	 */
+ 	if (onerel->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL && vacstmt->full)
+ 	{
+ 		ereport(WARNING,
+ 				(errmsg("skipping \"%s\" --- cannot vacuum non-transactional catalogs in VACUUM FULL",
+ 						RelationGetRelationName(onerel))));
+ 		relation_close(onerel, lmode);
+ 		StrategyHintVacuum(false);
+ 		CommitTransactionCommand();
+ 		return false;
+ 	}
+ 
+ 	/*
  	 * Silently ignore tables that are temp tables of other backends ---
  	 * trying to vacuum these will lead to great unhappiness, since their
  	 * contents are probably not up-to-date on disk.  (We don't throw a
***************
*** 1079,1085 ****
  	 */
  	if (toast_relid != InvalidOid)
  	{
! 		if (!vacuum_rel(toast_relid, vacstmt, RELKIND_TOASTVALUE))
  			result = false;		/* failed to vacuum the TOAST table? */
  	}
  
--- 1133,1139 ----
  	 */
  	if (toast_relid != InvalidOid)
  	{
! 		if (!vacuum_rel(toast_relid, vacstmt, true))
  			result = false;		/* failed to vacuum the TOAST table? */
  	}
  
***************
*** 1179,1185 ****
  	vac_update_fsm(onerel, &fraged_pages, vacrelstats->rel_pages);
  
  	/* update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(onerel), vacrelstats->rel_pages,
  						vacrelstats->rel_tuples, vacrelstats->hasindex);
  
  	/* report results to the stats collector, too */
--- 1233,1239 ----
  	vac_update_fsm(onerel, &fraged_pages, vacrelstats->rel_pages);
  
  	/* update statistics in pg_class */
! 	vac_update_relstats(onerel, vacrelstats->rel_pages,
  						vacrelstats->rel_tuples, vacrelstats->hasindex);
  
  	/* report results to the stats collector, too */
***************
*** 2938,2944 ****
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
--- 2992,2998 ----
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
***************
*** 3007,3013 ****
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
--- 3061,3067 ----
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
Index: src/backend/commands/vacuumlazy.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.70
diff -c -r1.70 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c	2 May 2006 22:25:10 -0000	1.70
--- src/backend/commands/vacuumlazy.c	10 Jun 2006 23:17:47 -0000
***************
*** 175,181 ****
  	lazy_update_fsm(onerel, vacrelstats);
  
  	/* Update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(onerel),
  						vacrelstats->rel_pages,
  						vacrelstats->rel_tuples,
  						hasindex);
--- 175,181 ----
  	lazy_update_fsm(onerel, vacrelstats);
  
  	/* Update statistics in pg_class */
! 	vac_update_relstats(onerel,
  						vacrelstats->rel_pages,
  						vacrelstats->rel_tuples,
  						hasindex);
***************
*** 667,673 ****
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages,
  						stats->num_index_tuples,
  						false);
--- 667,673 ----
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(indrel,
  						stats->num_pages,
  						stats->num_index_tuples,
  						false);
Index: src/backend/optimizer/util/plancat.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/optimizer/util/plancat.c,v
retrieving revision 1.119
diff -c -r1.119 plancat.c
*** src/backend/optimizer/util/plancat.c	5 Mar 2006 15:58:31 -0000	1.119
--- src/backend/optimizer/util/plancat.c	10 Jun 2006 23:17:47 -0000
***************
*** 19,24 ****
--- 19,25 ----
  
  #include "access/genam.h"
  #include "access/heapam.h"
+ #include "catalog/heap.h"
  #include "catalog/pg_amop.h"
  #include "catalog/pg_inherits.h"
  #include "catalog/pg_index.h"
***************
*** 235,240 ****
--- 236,243 ----
  	BlockNumber relpages;
  	double		reltuples;
  	double		density;
+ 	int4		npages;
+ 	float4		ntuples;
  
  	switch (rel->rd_rel->relkind)
  	{
***************
*** 263,269 ****
  			 * world, and it shouldn't degrade the quality of the plan too
  			 * much anyway to err in this direction.
  			 */
! 			if (curpages < 10 && rel->rd_rel->relpages == 0)
  				curpages = 10;
  
  			/* report estimated # pages */
--- 266,273 ----
  			 * world, and it shouldn't degrade the quality of the plan too
  			 * much anyway to err in this direction.
  			 */
! 			relation_get_ntups_npages(rel, &ntuples, &npages);
! 			if (curpages < 10 && ntuples == 0)
  				curpages = 10;
  
  			/* report estimated # pages */
***************
*** 275,282 ****
  				break;
  			}
  			/* coerce values in pg_class to more desirable types */
! 			relpages = (BlockNumber) rel->rd_rel->relpages;
! 			reltuples = (double) rel->rd_rel->reltuples;
  
  			/*
  			 * If it's an index, discount the metapage.  This is a kluge
--- 279,286 ----
  				break;
  			}
  			/* coerce values in pg_class to more desirable types */
! 			relpages = (BlockNumber) npages;
! 			reltuples = (double) ntuples;
  
  			/*
  			 * If it's an index, discount the metapage.  This is a kluge
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.19
diff -c -r1.19 autovacuum.c
*** src/backend/postmaster/autovacuum.c	19 May 2006 15:15:37 -0000	1.19
--- src/backend/postmaster/autovacuum.c	10 Jun 2006 23:17:47 -0000
***************
*** 24,29 ****
--- 24,30 ----
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
+ #include "catalog/heap.h"
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_autovacuum.h"
***************
*** 615,623 ****
  		ScanKeyData entry[1];
  		Oid			relid;
  
! 		/* Consider only regular and toast tables. */
  		if (classForm->relkind != RELKIND_RELATION &&
! 			classForm->relkind != RELKIND_TOASTVALUE)
  			continue;
  
  		/*
--- 616,625 ----
  		ScanKeyData entry[1];
  		Oid			relid;
  
! 		/* Consider only regular, non-transactional and toast tables. */
  		if (classForm->relkind != RELKIND_RELATION &&
! 			classForm->relkind != RELKIND_TOASTVALUE &&
! 			classForm->relkind != RELKIND_NON_TRANSACTIONAL)
  			continue;
  
  		/*
***************
*** 734,740 ****
  					 List **toast_table_ids)
  {
  	Relation	rel;
! 	float4		reltuples;		/* pg_class.reltuples */
  
  	/* constants from pg_autovacuum or GUC variables */
  	int			vac_base_thresh,
--- 736,742 ----
  					 List **toast_table_ids)
  {
  	Relation	rel;
! 	float4		reltuples;		/* pg_ntclass.reltuples */
  
  	/* constants from pg_autovacuum or GUC variables */
  	int			vac_base_thresh,
***************
*** 773,779 ****
  	if (!PointerIsValid(rel))
  		return;
  
! 	reltuples = rel->rd_rel->reltuples;
  	vactuples = tabentry->n_dead_tuples;
  	anltuples = tabentry->n_live_tuples + tabentry->n_dead_tuples -
  		tabentry->last_anl_tuples;
--- 775,781 ----
  	if (!PointerIsValid(rel))
  		return;
  
! 	relation_get_ntups_npages(rel, &reltuples, NULL);
  	vactuples = tabentry->n_dead_tuples;
  	anltuples = tabentry->n_live_tuples + tabentry->n_dead_tuples -
  		tabentry->last_anl_tuples;
***************
*** 843,849 ****
  
  	Assert(CurrentMemoryContext == AutovacMemCxt);
  
! 	if (classForm->relkind == RELKIND_RELATION)
  	{
  		if (dovacuum || doanalyze)
  			elog(DEBUG2, "autovac: will%s%s %s",
--- 845,852 ----
  
  	Assert(CurrentMemoryContext == AutovacMemCxt);
  
! 	if (classForm->relkind == RELKIND_RELATION ||
! 		classForm->relkind == RELKIND_NON_TRANSACTIONAL)
  	{
  		if (dovacuum || doanalyze)
  			elog(DEBUG2, "autovac: will%s%s %s",
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.241
diff -c -r1.241 relcache.c
*** src/backend/utils/cache/relcache.c	6 May 2006 15:51:07 -0000	1.241
--- src/backend/utils/cache/relcache.c	11 Jun 2006 06:49:10 -0000
***************
*** 43,48 ****
--- 43,49 ----
  #include "catalog/pg_constraint.h"
  #include "catalog/pg_index.h"
  #include "catalog/pg_namespace.h"
+ #include "catalog/pg_ntclass.h"
  #include "catalog/pg_opclass.h"
  #include "catalog/pg_proc.h"
  #include "catalog/pg_rewrite.h"
***************
*** 71,77 ****
   */
  #define RELCACHE_INIT_FILENAME	"pg_internal.init"
  
! #define RELCACHE_INIT_FILEMAGIC		0x573262	/* version ID value */
  
  /*
   *		hardcoded tuple descriptors.  see include/catalog/pg_attribute.h
--- 72,78 ----
   */
  #define RELCACHE_INIT_FILENAME	"pg_internal.init"
  
! #define RELCACHE_INIT_FILEMAGIC		0x573263	/* version ID value */
  
  /*
   *		hardcoded tuple descriptors.  see include/catalog/pg_attribute.h
***************
*** 81,86 ****
--- 82,88 ----
  static FormData_pg_attribute Desc_pg_proc[Natts_pg_proc] = {Schema_pg_proc};
  static FormData_pg_attribute Desc_pg_type[Natts_pg_type] = {Schema_pg_type};
  static FormData_pg_attribute Desc_pg_index[Natts_pg_index] = {Schema_pg_index};
+ static FormData_pg_attribute Desc_pg_ntclass[Natts_pg_ntclass] = {Schema_pg_ntclass};
  
  /*
   *		Hash tables that index the relation cache
***************
*** 1163,1169 ****
   *		catalogs.
   *
   * formrdesc is currently used for: pg_class, pg_attribute, pg_proc,
!  * and pg_type (see RelationCacheInitializePhase2).
   *
   * Note that these catalogs can't have constraints (except attnotnull),
   * default values, rules, or triggers, since we don't cope with any of that.
--- 1165,1171 ----
   *		catalogs.
   *
   * formrdesc is currently used for: pg_class, pg_attribute, pg_proc,
!  * pg_ntclass and pg_type (see RelationCacheInitializePhase2).
   *
   * Note that these catalogs can't have constraints (except attnotnull),
   * default values, rules, or triggers, since we don't cope with any of that.
***************
*** 1220,1232 ****
  	 */
  	relation->rd_rel->relisshared = false;
  
- 	relation->rd_rel->relpages = 1;
- 	relation->rd_rel->reltuples = 1;
  	relation->rd_rel->relkind = RELKIND_RELATION;
  	relation->rd_rel->relhasoids = hasoids;
  	relation->rd_rel->relnatts = (int16) natts;
  
  	/*
  	 * initialize attribute tuple form
  	 *
  	 * Unlike the case with the relation tuple, this data had better be right
--- 1222,1238 ----
  	 */
  	relation->rd_rel->relisshared = false;
  
  	relation->rd_rel->relkind = RELKIND_RELATION;
  	relation->rd_rel->relhasoids = hasoids;
  	relation->rd_rel->relnatts = (int16) natts;
  
  	/*
+ 	 * We don't need relntrans at this point, so set it to invalid to make
+ 	 * sure we will correct it later before trying to access it.
+ 	 */
+ 	ItemPointerSetInvalid(&(relation->rd_rel->relntrans));
+ 
+ 	/*
  	 * initialize attribute tuple form
  	 *
  	 * Unlike the case with the relation tuple, this data had better be right
***************
*** 2028,2033 ****
--- 2034,2040 ----
  		case AttributeRelationId:
  		case ProcedureRelationId:
  		case TypeRelationId:
+ 		case NtClassRelationId:
  			nailit = true;
  			break;
  		default:
***************
*** 2120,2125 ****
--- 2127,2143 ----
  	rel->rd_rel->relfilenode = relid;
  	rel->rd_rel->reltablespace = reltablespace;
  
+ 	/* 
+ 	 * Initialize pg_ntclass tuple.  NOTE -- during bootstrap, we need to
+ 	 * insert a pseudo-valid entry for nailed relations.  Do the minimal work
+ 	 * needed to allow the system to continue operating in this case; it will
+ 	 * get fixed when we read the real pg_class entry.
+ 	 */
+ 	if (IsBootstrapProcessingMode() && nailit)
+ 		ItemPointerSet(&(rel->rd_rel->relntrans), 0, 1);
+ 	else
+ 		ItemPointerSetInvalid(&(rel->rd_rel->relntrans));
+ 
  	RelationInitLockInfo(rel);	/* see lmgr.c */
  
  	RelationInitPhysicalAddr(rel);
***************
*** 2230,2237 ****
  				  true, Natts_pg_proc, Desc_pg_proc);
  		formrdesc("pg_type", PG_TYPE_RELTYPE_OID,
  				  true, Natts_pg_type, Desc_pg_type);
  
! #define NUM_CRITICAL_RELS	4	/* fix if you change list above */
  	}
  
  	MemoryContextSwitchTo(oldcxt);
--- 2248,2257 ----
  				  true, Natts_pg_proc, Desc_pg_proc);
  		formrdesc("pg_type", PG_TYPE_RELTYPE_OID,
  				  true, Natts_pg_type, Desc_pg_type);
+ 		formrdesc("pg_ntclass", PG_NTCLASS_RELTYPE_OID,
+ 				  true, Natts_pg_ntclass, Desc_pg_ntclass);
  
! #define NUM_CRITICAL_RELS	5	/* fix if you change list above */
  	}
  
  	MemoryContextSwitchTo(oldcxt);
Index: src/include/catalog/catversion.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/catversion.h,v
retrieving revision 1.334
diff -c -r1.334 catversion.h
*** src/include/catalog/catversion.h	24 May 2006 11:01:39 -0000	1.334
--- src/include/catalog/catversion.h	10 Jun 2006 23:22:07 -0000
***************
*** 37,43 ****
   * Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
!  * $PostgreSQL: pgsql/src/include/catalog/catversion.h,v 1.334 2006/05/24 11:01:39 teodor Exp $
   *
   *-------------------------------------------------------------------------
   */
--- 37,43 ----
   * Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
!  * $PostgreSQL: pgsql/src/include/catalog/catversion.h,v 1.334 2006-05-24 11:01:39 teodor Exp $
   *
   *-------------------------------------------------------------------------
   */
***************
*** 53,58 ****
   */
  
  /*							yyyymmddN */
! #define CATALOG_VERSION_NO	200605241
  
  #endif
--- 53,58 ----
   */
  
  /*							yyyymmddN */
! #define CATALOG_VERSION_NO	200606101
  
  #endif
Index: src/include/catalog/heap.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/heap.h,v
retrieving revision 1.80
diff -c -r1.80 heap.h
*** src/include/catalog/heap.h	30 Apr 2006 01:08:07 -0000	1.80
--- src/include/catalog/heap.h	5 Jun 2006 02:36:52 -0000
***************
*** 43,48 ****
--- 43,50 ----
  			bool shared_relation,
  			bool allow_system_table_mods);
  
+ extern ItemPointerData AddNewNtclassTuple(Relation new_rel_desc, char relkind);
+ 
  extern Oid heap_create_with_catalog(const char *relname,
  						 Oid relnamespace,
  						 Oid reltablespace,
***************
*** 86,91 ****
--- 88,94 ----
  				  DropBehavior behavior, bool complain);
  extern void RemoveAttrDefaultById(Oid attrdefId);
  extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+ extern void DeleteNtclassTuple(ItemPointer tid);
  
  extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
  						  bool relhasoids);
***************
*** 97,100 ****
--- 100,112 ----
  
  extern void CheckAttributeType(const char *attname, Oid atttypid);
  
+ /*
+  * use "struct HeapTupleData *" here instead of HeapTuple to avoid including
+  * htup.h
+  */
+ extern struct HeapTupleData *get_ntclass_entry(Relation pg_ntclass, Relation target);
+ extern void relation_get_ntups_npages(Relation relation, float4 *ntuples,
+ 									  int4 *npages);
+ extern void update_ntclass_entry(Relation pg_ntclass, struct HeapTupleData *tuple);
+ 
  #endif   /* HEAP_H */
Index: src/include/catalog/pg_attribute.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_attribute.h,v
retrieving revision 1.120
diff -c -r1.120 pg_attribute.h
*** src/include/catalog/pg_attribute.h	5 Mar 2006 15:58:54 -0000	1.120
--- src/include/catalog/pg_attribute.h	9 Jun 2006 22:47:58 -0000
***************
*** 387,410 ****
  { 1259, {"relam"},		   26, -1,	4,	5, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"relfilenode"},   26, -1,	4,	6, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"reltablespace"}, 26, -1,	4,	7, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relpages"},	   23, -1,	4,	8, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltuples"},	   700, -1, 4,	9, 0, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastrelid"}, 26, -1,	4, 10, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastidxid"}, 26, -1,	4, 11, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relhasindex"},   16, -1,	1, 12, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relisshared"},   16, -1,	1, 13, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relkind"},	   18, -1,	1, 14, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relnatts"},	   21, -1,	2, 15, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relchecks"},	   21, -1,	2, 16, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"reltriggers"},   21, -1,	2, 17, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relukeys"},	   21, -1,	2, 18, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relfkeys"},	   21, -1,	2, 19, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relrefs"},	   21, -1,	2, 20, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relhasoids"},    16, -1,	1, 21, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhaspkey"},    16, -1,	1, 22, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhasrules"},   16, -1,	1, 23, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhassubclass"},16, -1,	1, 24, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relacl"},		 1034, -1, -1, 25, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  DATA(insert ( 1259 relname			19 -1 NAMEDATALEN	1 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1259 relnamespace		26 -1 4   2 0 -1 -1 t p i t f f t 0));
--- 387,409 ----
  { 1259, {"relam"},		   26, -1,	4,	5, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"relfilenode"},   26, -1,	4,	6, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"reltablespace"}, 26, -1,	4,	7, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastrelid"}, 26, -1,	4, 8, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastidxid"}, 26, -1,	4, 9, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relntrans"},	   27, -1,	6, 10, 0, -1, -1, false, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relhasindex"},   16, -1,	1, 11, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relisshared"},   16, -1,	1, 12, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relkind"},	   18, -1,	1, 13, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relnatts"},	   21, -1,	2, 14, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relchecks"},	   21, -1,	2, 15, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"reltriggers"},   21, -1,	2, 16, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relukeys"},	   21, -1,	2, 17, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relfkeys"},	   21, -1,	2, 18, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relrefs"},	   21, -1,	2, 19, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relhasoids"},    16, -1,	1, 20, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhaspkey"},    16, -1,	1, 21, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhasrules"},   16, -1,	1, 22, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhassubclass"},16, -1,	1, 23, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relacl"},		 1034, -1, -1, 24, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  DATA(insert ( 1259 relname			19 -1 NAMEDATALEN	1 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1259 relnamespace		26 -1 4   2 0 -1 -1 t p i t f f t 0));
***************
*** 413,436 ****
  DATA(insert ( 1259 relam			26 -1 4   5 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 relfilenode		26 -1 4   6 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 reltablespace	26 -1 4   7 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relpages			23 -1 4   8 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltuples	   700 -1 4   9 0 -1 -1 f p i t f f t 0));
! DATA(insert ( 1259 reltoastrelid	26 -1 4  10 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltoastidxid	26 -1 4  11 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relhasindex		16 -1 1  12 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relisshared		16 -1 1  13 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relkind			18 -1 1  14 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relnatts			21 -1 2  15 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relchecks		21 -1 2  16 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 reltriggers		21 -1 2  17 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relukeys			21 -1 2  18 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relfkeys			21 -1 2  19 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relrefs			21 -1 2  20 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relhasoids		16 -1 1  21 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhaspkey		16 -1 1  22 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhasrules		16 -1 1  23 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhassubclass	16 -1 1  24 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relacl		  1034 -1 -1 25 1 -1 -1 f x i f f f t 0));
  DATA(insert ( 1259 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  DATA(insert ( 1259 oid				26 0  4  -2 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
--- 412,434 ----
  DATA(insert ( 1259 relam			26 -1 4   5 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 relfilenode		26 -1 4   6 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 reltablespace	26 -1 4   7 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltoastrelid	26 -1 4  8 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltoastidxid	26 -1 4  9 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relntrans		27 -1 6  10 0 -1 -1 f p s t f f t 0));
! DATA(insert ( 1259 relhasindex		16 -1 1  11 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relisshared		16 -1 1  12 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relkind			18 -1 1  13 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relnatts			21 -1 2  14 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relchecks		21 -1 2  15 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 reltriggers		21 -1 2  16 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relukeys			21 -1 2  17 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relfkeys			21 -1 2  18 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relrefs			21 -1 2  19 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relhasoids		16 -1 1  20 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhaspkey		16 -1 1  21 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhasrules		16 -1 1  22 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhassubclass	16 -1 1  23 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relacl		  1034 -1 -1 24 1 -1 -1 f x i f f f t 0));
  DATA(insert ( 1259 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  DATA(insert ( 1259 oid				26 0  4  -2 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
***************
*** 440,445 ****
--- 438,461 ----
  DATA(insert ( 1259 tableoid			26 0  4  -7 0 -1 -1 t p i t f f t 0));
  
  /* ----------------
+  *		pg_ntclass
+  * ----------------
+  */
+ #define Schema_pg_ntclass \
+ { 1004, {"relpages"},	   23, -1,	4,	1, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
+ { 1004, {"reltuples"},	   700, -1, 4,	2, 0, -1, -1, false, 'p', 'i', true, false, false, true, 0 }
+ 
+ DATA(insert ( 1004 relpages			23 -1 4   1 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 reltuples	   700 -1 4   2 0 -1 -1 f p i t f f t 0));
+ DATA(insert ( 1004 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
+ /* no OIDs in pg_ntclass */
+ DATA(insert ( 1004 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 cmin				29 0  4  -4 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 xmax				28 0  4  -5 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 cmax				29 0  4  -6 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 tableoid			26 0  4  -7 0 -1 -1 t p i t f f t 0));
+ 
+ /* ----------------
   *		pg_index
   *
   * pg_index is not bootstrapped in the same way as the other relations that
Index: src/include/catalog/pg_class.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_class.h,v
retrieving revision 1.92
diff -c -r1.92 pg_class.h
*** src/include/catalog/pg_class.h	28 May 2006 02:27:08 -0000	1.92
--- src/include/catalog/pg_class.h	10 Jun 2006 22:10:25 -0000
***************
*** 19,24 ****
--- 19,26 ----
  #ifndef PG_CLASS_H
  #define PG_CLASS_H
  
+ #include "storage/itemptr.h"
+ 
  /* ----------------
   *		postgres.h contains the system type definitions and the
   *		CATALOG(), BKI_BOOTSTRAP and DATA() sugar words so this file
***************
*** 32,37 ****
--- 34,41 ----
   * ----------------
   */
  
+ typedef ItemPointerData tid;
+ 
  /* ----------------
   *		This structure is actually variable-length (the last attribute is
   *		a POSTGRES array).	Hence, sizeof(FormData_pg_class) does not
***************
*** 51,60 ****
  	Oid			relam;			/* index access method; 0 if not an index */
  	Oid			relfilenode;	/* identifier of physical storage file */
  	Oid			reltablespace;	/* identifier of table space for relation */
- 	int4		relpages;		/* # of blocks (not always up-to-date) */
- 	float4		reltuples;		/* # of tuples (not always up-to-date) */
  	Oid			reltoastrelid;	/* OID of toast table; 0 if none */
  	Oid			reltoastidxid;	/* if toast table, OID of chunk_id index */
  	bool		relhasindex;	/* T if has (or has had) any indexes */
  	bool		relisshared;	/* T if shared across databases */
  	char		relkind;		/* see RELKIND_xxx constants below */
--- 55,63 ----
  	Oid			relam;			/* index access method; 0 if not an index */
  	Oid			relfilenode;	/* identifier of physical storage file */
  	Oid			reltablespace;	/* identifier of table space for relation */
  	Oid			reltoastrelid;	/* OID of toast table; 0 if none */
  	Oid			reltoastidxid;	/* if toast table, OID of chunk_id index */
+ 	tid			relntrans;		/* CTID of pg_ntclass entry */
  	bool		relhasindex;	/* T if has (or has had) any indexes */
  	bool		relisshared;	/* T if shared across databases */
  	char		relkind;		/* see RELKIND_xxx constants below */
***************
*** 103,110 ****
   *		relacl field.  This is a kluge.
   * ----------------
   */
! #define Natts_pg_class_fixed			24
! #define Natts_pg_class					25
  #define Anum_pg_class_relname			1
  #define Anum_pg_class_relnamespace		2
  #define Anum_pg_class_reltype			3
--- 106,113 ----
   *		relacl field.  This is a kluge.
   * ----------------
   */
! #define Natts_pg_class_fixed			23
! #define Natts_pg_class					24
  #define Anum_pg_class_relname			1
  #define Anum_pg_class_relnamespace		2
  #define Anum_pg_class_reltype			3
***************
*** 112,135 ****
  #define Anum_pg_class_relam				5
  #define Anum_pg_class_relfilenode		6
  #define Anum_pg_class_reltablespace		7
! #define Anum_pg_class_relpages			8
! #define Anum_pg_class_reltuples			9
! #define Anum_pg_class_reltoastrelid		10
! #define Anum_pg_class_reltoastidxid		11
! #define Anum_pg_class_relhasindex		12
! #define Anum_pg_class_relisshared		13
! #define Anum_pg_class_relkind			14
! #define Anum_pg_class_relnatts			15
! #define Anum_pg_class_relchecks			16
! #define Anum_pg_class_reltriggers		17
! #define Anum_pg_class_relukeys			18
! #define Anum_pg_class_relfkeys			19
! #define Anum_pg_class_relrefs			20
! #define Anum_pg_class_relhasoids		21
! #define Anum_pg_class_relhaspkey		22
! #define Anum_pg_class_relhasrules		23
! #define Anum_pg_class_relhassubclass	24
! #define Anum_pg_class_relacl			25
  
  /* ----------------
   *		initial contents of pg_class
--- 115,137 ----
  #define Anum_pg_class_relam				5
  #define Anum_pg_class_relfilenode		6
  #define Anum_pg_class_reltablespace		7
! #define Anum_pg_class_reltoastrelid		8
! #define Anum_pg_class_reltoastidxid		9
! #define Anum_pg_class_relhasindex		10
! #define Anum_pg_class_relisshared		11
! #define Anum_pg_class_relkind			12
! #define Anum_pg_class_relnatts			13
! #define Anum_pg_class_relntrans			14
! #define Anum_pg_class_relchecks			15
! #define Anum_pg_class_reltriggers		16
! #define Anum_pg_class_relukeys			17
! #define Anum_pg_class_relfkeys			18
! #define Anum_pg_class_relrefs			19
! #define Anum_pg_class_relhasoids		20
! #define Anum_pg_class_relhaspkey		21
! #define Anum_pg_class_relhasrules		22
! #define Anum_pg_class_relhassubclass	23
! #define Anum_pg_class_relacl			24
  
  /* ----------------
   *		initial contents of pg_class
***************
*** 139,151 ****
   * ----------------
   */
  
! DATA(insert OID = 1247 (  pg_type		PGNSP 71 PGUID 0 1247 0 0 0 0 0 f f r 23 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 PGUID 0 1249 0 0 0 0 0 f f r 17 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 PGUID 0 1255 0 0 0 0 0 f f r 18 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 0 0 f f r 25 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
--- 141,155 ----
   * ----------------
   */
  
! DATA(insert OID = 1247 (  pg_type		PGNSP 71 PGUID 0 1247 0 0 0 "(0,1)" f f r 23 0 0 0 0 0 t f f f _null_ ));
! DESCR("");
! DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 PGUID 0 1249 0 0 0 "(0,2)" f f r 17 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 PGUID 0 1255 0 0 0 "(0,3)" f f r 18 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 "(0,4)" f f r 24 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1004 (  pg_ntclass	PGNSP 86 PGUID 0 1004 0 0 0 "(0,5)" f f n  2 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
***************
*** 155,159 ****
--- 159,164 ----
  #define		  RELKIND_TOASTVALUE	  't'		/* moved off huge values */
  #define		  RELKIND_VIEW			  'v'		/* view */
  #define		  RELKIND_COMPOSITE_TYPE  'c'		/* composite type */
+ #define		  RELKIND_NON_TRANSACTIONAL 'n'		/* non-transactional heap */
  
  #endif   /* PG_CLASS_H */
Index: src/include/catalog/pg_type.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_type.h,v
retrieving revision 1.171
diff -c -r1.171 pg_type.h
*** src/include/catalog/pg_type.h	5 Apr 2006 22:11:57 -0000	1.171
--- src/include/catalog/pg_type.h	5 Jun 2006 02:36:52 -0000
***************
*** 314,319 ****
--- 314,321 ----
  #define PG_PROC_RELTYPE_OID 81
  DATA(insert OID = 83 (	pg_class		PGNSP PGUID -1 f c t \054 1259 0 record_in record_out record_recv record_send - d x f 0 -1 0 _null_ _null_ ));
  #define PG_CLASS_RELTYPE_OID 83
+ DATA(insert OID = 86 (	pg_ntclass		PGNSP PGUID -1 f c t \054 1004 0 record_in record_out record_recv record_send - d x f 0 -1 0 _null_ _null_ ));
+ #define PG_NTCLASS_RELTYPE_OID 86
  
  /* OIDS 100 - 199 */
  
Index: src/include/commands/vacuum.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/commands/vacuum.h,v
retrieving revision 1.63
diff -c -r1.63 vacuum.h
*** src/include/commands/vacuum.h	5 Mar 2006 15:58:55 -0000	1.63
--- src/include/commands/vacuum.h	5 Jun 2006 02:36:52 -0000
***************
*** 114,120 ****
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Oid relid,
  					BlockNumber num_pages,
  					double num_tuples,
  					bool hasindex);
--- 114,120 ----
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Relation rel,
  					BlockNumber num_pages,
  					double num_tuples,
  					bool hasindex);
Index: src/test/regress/expected/stats.out
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/expected/stats.out,v
retrieving revision 1.7
diff -c -r1.7 stats.out
*** src/test/regress/expected/stats.out	11 Jan 2006 20:12:42 -0000	1.7
--- src/test/regress/expected/stats.out	5 Jun 2006 02:36:52 -0000
***************
*** 2,8 ****
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by create_misc) and indexed (by create_index).
  --
  -- conditio sine qua non
  SHOW stats_start_collector;  -- must be on
--- 2,8 ----
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by copy and create_misc) and indexed (by create_index).
  --
  -- conditio sine qua non
  SHOW stats_start_collector;  -- must be on
***************
*** 44,62 ****
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
   t        | t        | t        | t
  (1 row)
  
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? 
  ----------+----------
--- 44,66 ----
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + nt.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st,
!        pg_class AS cl JOIN pg_ntclass AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
   t        | t        | t        | t
  (1 row)
  
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + nt.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st,
!        pg_class AS cl JOIN pg_ntclass AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? 
  ----------+----------
Index: src/test/regress/expected/type_sanity.out
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/expected/type_sanity.out,v
retrieving revision 1.27
diff -c -r1.27 type_sanity.out
*** src/test/regress/expected/type_sanity.out	10 Jul 2005 21:14:00 -0000	1.27
--- src/test/regress/expected/type_sanity.out	5 Jun 2006 02:36:52 -0000
***************
*** 216,222 ****
  -- Look for illegal values in pg_class fields
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v');
   oid | relname 
  -----+---------
  (0 rows)
--- 216,222 ----
  -- Look for illegal values in pg_class fields
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v', 'n');
   oid | relname 
  -----+---------
  (0 rows)
Index: src/test/regress/expected/without_oid.out
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/expected/without_oid.out,v
retrieving revision 1.7
diff -c -r1.7 without_oid.out
*** src/test/regress/expected/without_oid.out	14 Mar 2006 22:48:25 -0000	1.7
--- src/test/regress/expected/without_oid.out	5 Jun 2006 02:36:52 -0000
***************
*** 48,54 ****
  VACUUM ANALYZE wi;
  VACUUM ANALYZE wo;
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class
   WHERE relname IN ('wi', 'wo');
   ?column? | ?column? 
  ----------+----------
--- 48,54 ----
  VACUUM ANALYZE wi;
  VACUUM ANALYZE wo;
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class JOIN pg_ntclass ON (pg_ntclass.ctid = relntrans)
   WHERE relname IN ('wi', 'wo');
   ?column? | ?column? 
  ----------+----------
Index: src/test/regress/sql/stats.sql
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/sql/stats.sql,v
retrieving revision 1.5
diff -c -r1.5 stats.sql
*** src/test/regress/sql/stats.sql	11 Jan 2006 20:12:43 -0000	1.5
--- src/test/regress/sql/stats.sql	5 Jun 2006 02:36:52 -0000
***************
*** 2,8 ****
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by create_misc) and indexed (by create_index).
  --
  
  -- conditio sine qua non
--- 2,8 ----
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by copy and create_misc) and indexed (by create_index).
  --
  
  -- conditio sine qua non
***************
*** 30,43 ****
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
  
  -- End of Stats Test
--- 30,47 ----
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + nt.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st,
!        pg_class AS cl JOIN pg_ntclass AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + nt.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st,
!        pg_class AS cl JOIN pg_ntclass AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
  
  -- End of Stats Test
Index: src/test/regress/sql/type_sanity.sql
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/sql/type_sanity.sql,v
retrieving revision 1.27
diff -c -r1.27 type_sanity.sql
*** src/test/regress/sql/type_sanity.sql	10 Jul 2005 21:14:00 -0000	1.27
--- src/test/regress/sql/type_sanity.sql	5 Jun 2006 02:36:52 -0000
***************
*** 169,175 ****
  
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v');
  
  -- Indexes should have an access method, others not.
  
--- 169,175 ----
  
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v', 'n');
  
  -- Indexes should have an access method, others not.
  
Index: src/test/regress/sql/without_oid.sql
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/sql/without_oid.sql,v
retrieving revision 1.6
diff -c -r1.6 without_oid.sql
*** src/test/regress/sql/without_oid.sql	19 Feb 2006 00:04:28 -0000	1.6
--- src/test/regress/sql/without_oid.sql	5 Jun 2006 02:36:52 -0000
***************
*** 45,51 ****
  VACUUM ANALYZE wo;
  
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class
   WHERE relname IN ('wi', 'wo');
  
  DROP TABLE wi;
--- 45,51 ----
  VACUUM ANALYZE wo;
  
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class JOIN pg_ntclass ON (pg_ntclass.ctid = relntrans)
   WHERE relname IN ('wi', 'wo');
  
  DROP TABLE wi;

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Alvaro Herrera (#1)

1 attachment(s)

Re: Non-transactional pg_class, try 2

Alvaro Herrera wrote:

Here I repost the patch to implement non-transactional catalogs, the
first of which is pg_ntclass, intended to hold the non-transactional
info about pg_class (reltuples, relpages).

I forgot to attach the new file pg_ntclass.h (src/include/catalog).
Here it is.

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Alvaro Herrera (#1)

1 attachment(s)

The corresponding relminxid patch; try 1

Hi,

This is the relminxid patch corresponding to the pg_ntclass patch I just
posted. Obviously, the relminxid and relvacuumxid fields are in
pg_ntclass (not pg_class like in the previous incarnations of this
patch). This makes the whole business much saner and now I don't need
to insert bogus CommandCounterIncrement calls. Regression tests pass.

The thing that bothers me most about this is that it turns LockRelation
into an operation that needs to heap_fetch() from pg_ntclass in some
cases, and possibly update it. I think we should consider some sort of
"non-transactional shared cache" for storing RELKIND_NON_TRANSACTIONAL
catalog entries. Eventually it may help the sequences stuff as well, if
we implement sequences using that kind of catalog.

The documentation changes may be a bit off in this patch, since I didn't
worry about merging it with the pg_ntclass patch. But it's easy to
correct and I'll do it before committing it.

My intention is to wait two or three days after committing the
pg_ntclass patch, and then commit this one, unless I hear objections
before that.

Attachments:

relminxid-ntclass-1.patchtext/plain; charset=iso-8859-1Download

diff -rc -X diff-ignore 11fixclass/doc/src/sgml/catalogs.sgml 12ntrelminxid/doc/src/sgml/catalogs.sgml
*** 11fixclass/doc/src/sgml/catalogs.sgml	2006-06-09 18:52:50.000000000 -0400
--- 12ntrelminxid/doc/src/sgml/catalogs.sgml	2006-06-10 19:38:04.000000000 -0400
***************
*** 1619,1624 ****
--- 1619,1648 ----
       </row>
  
       <row>
+       <entry><structfield>relminxid</structfield></entry>
+       <entry><type>xid</type></entry>
+       <entry></entry>
+       <entry>
+        The minimum transaction ID present in all rows in this table.  This
+        value is used to determine the database-global
+        <structname>pg_database</>.<structfield>datminxid</> value.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>relvacuumxid</structfield></entry>
+       <entry><type>xid</type></entry>
+       <entry></entry>
+       <entry>
+        The transaction ID that was used as cleaning point as of the last vacuum
+        operation.  All rows inserted, updated or deleted in this table by
+        transactions whose IDs are below this one have been marked as known good
+        or deleted.  This is used to determine the database-global
+        <structname>pg_database</>.<structfield>datvacuumxid</> value.
+       </entry>
+      </row>
+ 
+      <row>
        <entry><structfield>relacl</structfield></entry>
        <entry><type>aclitem[]</type></entry>
        <entry></entry>
***************
*** 2048,2068 ****
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        All rows inserted or deleted by transaction IDs before this one
!        have been marked as known committed or known aborted in this database.
!        This is used to determine when commit-log space can be recycled.
        </entry>
       </row>
  
       <row>
!       <entry><structfield>datfrozenxid</structfield></entry>
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
         All rows inserted by transaction IDs before this one have been
         relabeled with a permanent (<quote>frozen</>) transaction ID in this
!        database.  This is useful to check whether a database must be vacuumed
!        soon to avoid transaction ID wrap-around problems.
        </entry>
       </row>
  
--- 2072,2098 ----
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        The transaction ID that was used as cleaning point as of the last vacuum
!        operation.  All rows inserted or deleted by transaction IDs before this one
!        have been marked as known good or deleted.  This
!        is used to determine when commit-log space can be recycled.
!        If InvalidTransactionId, then the minimum is unknown and can be
!        determined by scanning <structname>pg_class</>.<structfield>relvacuumxid</>.
        </entry>
       </row>
  
       <row>
!       <entry><structfield>datminxid</structfield></entry>
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
+        The minimum transaction ID present in all tables in this database.
         All rows inserted by transaction IDs before this one have been
         relabeled with a permanent (<quote>frozen</>) transaction ID in this
!        database.  This is useful to check whether a database must be
!        vacuumed soon to avoid transaction ID wrap-around problems.
!        If InvalidTransactionId, then the minimum is unknown and can be
!        determined by scanning <structname>pg_class</>.<structfield>relminxid</>.
        </entry>
       </row>
  
diff -rc -X diff-ignore 11fixclass/src/backend/access/heap/heapam.c 12ntrelminxid/src/backend/access/heap/heapam.c
*** 11fixclass/src/backend/access/heap/heapam.c	2006-06-10 19:17:47.000000000 -0400
--- 12ntrelminxid/src/backend/access/heap/heapam.c	2006-06-11 02:41:15.000000000 -0400
***************
*** 45,61 ****
--- 45,74 ----
  #include "access/valid.h"
  #include "access/xlogutils.h"
  #include "catalog/catalog.h"
+ #include "catalog/heap.h"
  #include "catalog/namespace.h"
+ #include "catalog/pg_ntclass.h"
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/procarray.h"
  #include "utils/inval.h"
  #include "utils/relcache.h"
+ #include "utils/syscache.h"
  
  
  static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
  		   ItemPointerData from, Buffer newbuf, HeapTuple newtup, bool move);
  
+ /*
+  * There are some situations in which we want to acquire strong locks on
+  * relations, but we know there is no need to unfreeze them; for example,
+  * during VACUUM.  Any such caller sets this variable to true, which turns
+  * heap_unfreeze into a no-op.
+  *
+  * It's advisable to do this in a PG_TRY block so that it won't "forget" to
+  * reset the variable in case of error.
+  */
+ bool	disable_heap_unfreeze = false;
  
  /* ----------------------------------------------------------------
   *						 heap support routines
***************
*** 2669,2674 ****
--- 2682,2772 ----
  	return HeapTupleMayBeUpdated;
  }
  
+ /*
+  * heap_unfreeze
+  * 		Mark a table as no longer frozen in pg_ntclass.
+  *
+  * This routine updates the pg_ntclass.relminxid and relvacuumxid columns so
+  * that they no longer appear as frozen.
+  */
+ void
+ heap_unfreeze(Relation rel)
+ {
+ 	Form_pg_ntclass	ntclassForm;
+ 	HeapTuple		tuple;
+ 	bool			dirty = false;
+ 
+ 	/* early exit if somebody decided to disable us */
+ 	if (disable_heap_unfreeze)
+ 		return;
+ 
+ 	/* Exit early in bootstrap mode */
+ 	if (IsBootstrapProcessingMode())
+ 		return;
+ 
+ 	/*
+ 	 * We process only normal heaps, toast tables, and non-transactional
+ 	 * heaps.  Other kinds of heaps don't store Xids in them and thus they
+ 	 * don't need unfreezing.
+ 	 */
+ 	if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ 		rel->rd_rel->relkind != RELKIND_INDEX &&
+ 		rel->rd_rel->relkind != RELKIND_NON_TRANSACTIONAL)
+ 		return;
+ 
+ 	/*
+ 	 * Under normal conditions, we should have a snapshot already, but in some
+ 	 * cases there may not be one.  Getting a snapshot guarantees we will have
+ 	 * a valid RecentXmin to use.
+ 	 */
+ 	if (!TransactionIdIsValid(RecentXmin))
+ 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+ 	Assert(TransactionIdIsValid(RecentXmin));
+ 
+ 	PG_TRY();
+ 	{
+ 		Relation	pg_ntclass;
+ 
+ 		/* Disable ourselves so that we don't recurse on unfreezing pg_ntclass */
+ 		disable_heap_unfreeze = true;
+ 
+ 		pg_ntclass = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 		/* Fetch a copy to scribble on */
+ 		tuple = get_ntclass_entry(pg_ntclass, rel);
+ 
+ 		ntclassForm = (Form_pg_ntclass) GETSTRUCT(tuple);
+ 
+ 		/* Apply all necessary updates. */
+ 		if (TransactionIdEquals(ntclassForm->relminxid, FrozenTransactionId))
+ 		{
+ 			ntclassForm->relminxid = RecentXmin;
+ 			dirty = true;
+ 		}
+ 		if (TransactionIdEquals(ntclassForm->relvacuumxid, FrozenTransactionId))
+ 		{
+ 			ntclassForm->relvacuumxid = RecentXmin;
+ 			dirty = true;
+ 		}
+ 		/* Update tuple if necessary. */
+ 		if (dirty)
+ 			update_ntclass_entry(pg_ntclass, tuple);
+ 
+ 		/* All done, cleanup and return */
+ 		heap_freetuple(tuple);
+ 		heap_close(pg_ntclass, RowExclusiveLock);
+ 
+ 		/* reenable heap unfreezing */
+ 		disable_heap_unfreeze = false;
+ 	}
+ 	PG_CATCH();
+ 	{
+ 		/* make sure to reenable heap unfreezing in case of error */
+ 		disable_heap_unfreeze = false;
+ 		PG_RE_THROW();
+ 	}
+ 	PG_END_TRY();
+ }
  
  /*
   * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
diff -rc -X diff-ignore 11fixclass/src/backend/access/transam/varsup.c 12ntrelminxid/src/backend/access/transam/varsup.c
*** 11fixclass/src/backend/access/transam/varsup.c	2006-03-08 16:23:24.000000000 -0300
--- 12ntrelminxid/src/backend/access/transam/varsup.c	2006-06-10 19:38:04.000000000 -0400
***************
*** 168,178 ****
  
  /*
   * Determine the last safe XID to allocate given the currently oldest
!  * datfrozenxid (ie, the oldest XID that might exist in any database
   * of our cluster).
   */
  void
! SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
  					  Name oldest_datname)
  {
  	TransactionId xidWarnLimit;
--- 168,178 ----
  
  /*
   * Determine the last safe XID to allocate given the currently oldest
!  * datminxid (ie, the oldest XID that might exist in any database
   * of our cluster).
   */
  void
! SetTransactionIdLimit(TransactionId oldest_datminxid,
  					  Name oldest_datname)
  {
  	TransactionId xidWarnLimit;
***************
*** 180,195 ****
  	TransactionId xidWrapLimit;
  	TransactionId curXid;
  
! 	Assert(TransactionIdIsValid(oldest_datfrozenxid));
  
  	/*
  	 * The place where we actually get into deep trouble is halfway around
! 	 * from the oldest potentially-existing XID.  (This calculation is
! 	 * probably off by one or two counts, because the special XIDs reduce the
! 	 * size of the loop a little bit.  But we throw in plenty of slop below,
! 	 * so it doesn't matter.)
  	 */
! 	xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
  	if (xidWrapLimit < FirstNormalTransactionId)
  		xidWrapLimit += FirstNormalTransactionId;
  
--- 180,195 ----
  	TransactionId xidWrapLimit;
  	TransactionId curXid;
  
! 	Assert(TransactionIdIsValid(oldest_datminxid));
  
  	/*
  	 * The place where we actually get into deep trouble is halfway around
! 	 * from the oldest existing XID.  (This calculation is probably off by one
! 	 * or two counts, because the special XIDs reduce the size of the loop a
! 	 * little bit.  But we throw in plenty of slop below, so it doesn't
! 	 * matter.)
  	 */
! 	xidWrapLimit = oldest_datminxid + (MaxTransactionId >> 1);
  	if (xidWrapLimit < FirstNormalTransactionId)
  		xidWrapLimit += FirstNormalTransactionId;
  
diff -rc -X diff-ignore 11fixclass/src/backend/catalog/heap.c 12ntrelminxid/src/backend/catalog/heap.c
*** 11fixclass/src/backend/catalog/heap.c	2006-06-11 02:46:03.000000000 -0400
--- 12ntrelminxid/src/backend/catalog/heap.c	2006-06-11 01:03:39.000000000 -0400
***************
*** 637,648 ****
  		switch (relkind)
  		{
  			case RELKIND_RELATION:
- 			case RELKIND_INDEX:
  			case RELKIND_TOASTVALUE:
  			case RELKIND_NON_TRANSACTIONAL:
  				/* The relation is real, but as yet empty */
  				new_rel_nttup->relpages = 0;
  				new_rel_nttup->reltuples = 0;
  				break;
  			default:
  				elog(ERROR, "can't create pg_ntclass entries for relkind '%c'",
--- 637,665 ----
  		switch (relkind)
  		{
  			case RELKIND_RELATION:
  			case RELKIND_TOASTVALUE:
  			case RELKIND_NON_TRANSACTIONAL:
  				/* The relation is real, but as yet empty */
  				new_rel_nttup->relpages = 0;
  				new_rel_nttup->reltuples = 0;
+ 				/*
+ 				 * Real tables have Xids stored in them, so initialize our
+ 				 * known value to the minimum Xid that could put tuples in the
+ 				 * new table.
+ 				 */
+ 				new_rel_nttup->relminxid = RecentXmin;
+ 				new_rel_nttup->relvacuumxid = RecentXmin;
+ 				break;
+ 			case RELKIND_INDEX:
+ 				/* The relation is real, but as yet empty */
+ 				new_rel_nttup->relpages = 0;
+ 				new_rel_nttup->reltuples = 0;
+ 				/*
+ 				 * Index relations don't have Xids stored in them, so set the
+ 				 * transaction Id values to InvalidTransactionId.
+ 				 */
+ 				new_rel_nttup->relminxid = InvalidTransactionId;
+ 				new_rel_nttup->relvacuumxid = InvalidTransactionId;
  				break;
  			default:
  				elog(ERROR, "can't create pg_ntclass entries for relkind '%c'",
diff -rc -X diff-ignore 11fixclass/src/backend/commands/analyze.c 12ntrelminxid/src/backend/commands/analyze.c
*** 11fixclass/src/backend/commands/analyze.c	2006-06-10 19:17:47.000000000 -0400
--- 12ntrelminxid/src/backend/commands/analyze.c	2006-06-11 01:26:00.000000000 -0400
***************
*** 420,426 ****
  
  	/*
  	 * If we are running a standalone ANALYZE, update pages/tuples stats in
! 	 * pg_class.  We know the accurate page count from the smgr, but only an
  	 * approximate number of tuples; therefore, if we are part of VACUUM
  	 * ANALYZE do *not* overwrite the accurate count already inserted by
  	 * VACUUM.	The same consideration applies to indexes.
--- 420,426 ----
  
  	/*
  	 * If we are running a standalone ANALYZE, update pages/tuples stats in
! 	 * pg_ntclass.  We know the accurate page count from the smgr, but only an
  	 * approximate number of tuples; therefore, if we are part of VACUUM
  	 * ANALYZE do *not* overwrite the accurate count already inserted by
  	 * VACUUM.	The same consideration applies to indexes.
***************
*** 429,436 ****
  	{
  		vac_update_relstats(onerel,
  							RelationGetNumberOfBlocks(onerel),
! 							totalrows,
! 							hasindex);
  		for (ind = 0; ind < nindexes; ind++)
  		{
  			AnlIndexData *thisdata = &indexdata[ind];
--- 429,437 ----
  	{
  		vac_update_relstats(onerel,
  							RelationGetNumberOfBlocks(onerel),
! 							totalrows, hasindex,
! 							InvalidTransactionId, InvalidTransactionId);
! 
  		for (ind = 0; ind < nindexes; ind++)
  		{
  			AnlIndexData *thisdata = &indexdata[ind];
***************
*** 439,446 ****
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
  			vac_update_relstats(Irel[ind],
  								RelationGetNumberOfBlocks(Irel[ind]),
! 								totalindexrows,
! 								false);
  		}
  
  		/* report results to the stats collector, too */
--- 440,447 ----
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
  			vac_update_relstats(Irel[ind],
  								RelationGetNumberOfBlocks(Irel[ind]),
! 								totalindexrows, false,
! 								InvalidTransactionId, InvalidTransactionId);
  		}
  
  		/* report results to the stats collector, too */
diff -rc -X diff-ignore 11fixclass/src/backend/commands/dbcommands.c 12ntrelminxid/src/backend/commands/dbcommands.c
*** 11fixclass/src/backend/commands/dbcommands.c	2006-05-04 21:36:10.000000000 -0400
--- 12ntrelminxid/src/backend/commands/dbcommands.c	2006-06-10 19:38:04.000000000 -0400
***************
*** 46,51 ****
--- 46,52 ----
  #include "utils/flatfiles.h"
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
+ #include "utils/inval.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
***************
*** 55,61 ****
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbFrozenXidP,
  			Oid *dbTablespace);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
--- 56,62 ----
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbMinXidP,
  			Oid *dbTablespace);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
***************
*** 76,82 ****
  	bool		src_allowconn;
  	Oid			src_lastsysoid;
  	TransactionId src_vacuumxid;
! 	TransactionId src_frozenxid;
  	Oid			src_deftablespace;
  	volatile Oid dst_deftablespace;
  	Relation	pg_database_rel;
--- 77,83 ----
  	bool		src_allowconn;
  	Oid			src_lastsysoid;
  	TransactionId src_vacuumxid;
! 	TransactionId src_minxid;
  	Oid			src_deftablespace;
  	volatile Oid dst_deftablespace;
  	Relation	pg_database_rel;
***************
*** 228,234 ****
  	if (!get_db_info(dbtemplate, ShareLock,
  					 &src_dboid, &src_owner, &src_encoding,
  					 &src_istemplate, &src_allowconn, &src_lastsysoid,
! 					 &src_vacuumxid, &src_frozenxid, &src_deftablespace))
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_DATABASE),
  				 errmsg("template database \"%s\" does not exist",
--- 229,235 ----
  	if (!get_db_info(dbtemplate, ShareLock,
  					 &src_dboid, &src_owner, &src_encoding,
  					 &src_istemplate, &src_allowconn, &src_lastsysoid,
! 					 &src_vacuumxid, &src_minxid, &src_deftablespace))
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_DATABASE),
  				 errmsg("template database \"%s\" does not exist",
***************
*** 327,340 ****
  	}
  
  	/*
! 	 * Normally we mark the new database with the same datvacuumxid and
! 	 * datfrozenxid as the source.	However, if the source is not allowing
! 	 * connections then we assume it is fully frozen, and we can set the
! 	 * current transaction ID as the xid limits.  This avoids immediately
! 	 * starting to generate warnings after cloning template0.
! 	 */
! 	if (!src_allowconn)
! 		src_vacuumxid = src_frozenxid = GetCurrentTransactionId();
  
  	/*
  	 * Check for db name conflict.	This is just to give a more friendly
--- 328,342 ----
  	}
  
  	/*
! 	 * Normally we mark the new database with the same datminxid as the source.
! 	 * However, if the source is fully frozen, we must not mark the new
! 	 * database as frozen because of the new pg_database tuple, which will be
! 	 * marked with our transaction ID.
! 	 */
! 	if (TransactionIdEquals(src_minxid, FrozenTransactionId))
! 		src_minxid = GetCurrentTransactionId();
! 	if (TransactionIdEquals(src_vacuumxid, FrozenTransactionId))
! 		src_vacuumxid = GetCurrentTransactionId();
  
  	/*
  	 * Check for db name conflict.	This is just to give a more friendly
***************
*** 367,373 ****
  	new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
  	new_record[Anum_pg_database_datlastsysoid - 1] = ObjectIdGetDatum(src_lastsysoid);
  	new_record[Anum_pg_database_datvacuumxid - 1] = TransactionIdGetDatum(src_vacuumxid);
! 	new_record[Anum_pg_database_datfrozenxid - 1] = TransactionIdGetDatum(src_frozenxid);
  	new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_deftablespace);
  
  	/*
--- 369,375 ----
  	new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
  	new_record[Anum_pg_database_datlastsysoid - 1] = ObjectIdGetDatum(src_lastsysoid);
  	new_record[Anum_pg_database_datvacuumxid - 1] = TransactionIdGetDatum(src_vacuumxid);
! 	new_record[Anum_pg_database_datminxid - 1] = TransactionIdGetDatum(src_minxid);
  	new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_deftablespace);
  
  	/*
***************
*** 1050,1055 ****
--- 1052,1119 ----
  	 */
  }
  
+ /*
+  * UnfreezeDatabase
+  *		Unfreezes a database
+  *
+  * Unfreezing a database is to set datminxid and datvacuumxid to a current
+  * normal Xid.  Currently, we do this anytime somebody connects to a database
+  * that is currently marked as frozen (datminxid = FrozenTransactionId).
+  *
+  * NB --- this is called early during backend initialization.
+  */
+ void
+ UnfreezeDatabase(Oid dbid, TransactionId unfreezeXid)
+ {
+ 	Relation	dbRel;
+ 	HeapTuple	tuple;
+ 	Form_pg_database dbForm;
+ 	bool		dirty = false;
+ 
+ 	PG_TRY();
+ 	{
+ 		disable_heap_unfreeze = true;
+ 
+ 		dbRel = heap_open(DatabaseRelationId, RowExclusiveLock);
+ 
+ 		/* fetch a copy of the tuple to scribble on */
+ 		tuple = SearchSysCacheCopy(DATABASEOID,
+ 								   dbid,
+ 								   0, 0, 0);
+ 
+ 		if (!HeapTupleIsValid(tuple))
+ 			elog(ERROR,
+ 				 "cache lookup failed for database %u", dbid);
+ 
+ 		dbForm = (Form_pg_database) GETSTRUCT(tuple);
+ 
+ 		/* make sure no one did it while we weren't looking */
+ 		if (TransactionIdEquals(dbForm->datminxid, FrozenTransactionId))
+ 		{
+ 			dbForm->datminxid = unfreezeXid;
+ 			dirty = true;
+ 		}
+ 		if (TransactionIdEquals(dbForm->datvacuumxid, FrozenTransactionId))
+ 		{
+ 			dbForm->datvacuumxid = unfreezeXid;
+ 			dirty = true;
+ 		}
+ 
+ 		if (dirty)
+ 			heap_inplace_update(dbRel, tuple);
+ 
+ 		heap_close(dbRel, RowExclusiveLock);
+ 	}
+ 	PG_CATCH();
+ 	{
+ 		disable_heap_unfreeze = false;
+ 		PG_RE_THROW();
+ 	};
+ 	PG_END_TRY();
+ 
+ 	/* reenable heap unfreezing */
+ 	disable_heap_unfreeze = false;
+ }
  
  /*
   * Helper functions
***************
*** 1066,1072 ****
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbFrozenXidP,
  			Oid *dbTablespace)
  {
  	bool		result = false;
--- 1130,1136 ----
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbMinXidP,
  			Oid *dbTablespace)
  {
  	bool		result = false;
***************
*** 1155,1163 ****
  				/* limit of vacuumed XIDs */
  				if (dbVacuumXidP)
  					*dbVacuumXidP = dbform->datvacuumxid;
! 				/* limit of frozen XIDs */
! 				if (dbFrozenXidP)
! 					*dbFrozenXidP = dbform->datfrozenxid;
  				/* default tablespace for this database */
  				if (dbTablespace)
  					*dbTablespace = dbform->dattablespace;
--- 1219,1227 ----
  				/* limit of vacuumed XIDs */
  				if (dbVacuumXidP)
  					*dbVacuumXidP = dbform->datvacuumxid;
! 				/* limit of min XIDs */
! 				if (dbMinXidP)
! 					*dbMinXidP = dbform->datminxid;
  				/* default tablespace for this database */
  				if (dbTablespace)
  					*dbTablespace = dbform->dattablespace;
diff -rc -X diff-ignore 11fixclass/src/backend/commands/vacuum.c 12ntrelminxid/src/backend/commands/vacuum.c
*** 11fixclass/src/backend/commands/vacuum.c	2006-06-10 19:17:47.000000000 -0400
--- 12ntrelminxid/src/backend/commands/vacuum.c	2006-06-11 02:37:38.000000000 -0400
***************
*** 129,134 ****
--- 129,135 ----
  	Size		min_tlen;
  	Size		max_tlen;
  	bool		hasindex;
+ 	TransactionId minxid;	/* Minimum Xid present anywhere on table */
  	/* vtlinks array for tuple chain following - sorted by new_tid */
  	int			num_vtlinks;
  	VTupleLink	vtlinks;
***************
*** 196,220 ****
  
  static int	elevel = -1;
  
- static TransactionId OldestXmin;
- static TransactionId FreezeLimit;
- 
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
  			 const char *stmttype, bool full);
! static void vac_update_dbstats(Oid dbid,
! 				   TransactionId vacuumXID,
! 				   TransactionId frozenXID);
! static void vac_truncate_clog(TransactionId vacuumXID,
! 				  TransactionId frozenXID);
! static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages);
  static void repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel);
  static void move_chain_tuple(Relation rel,
  				 Buffer old_buf, Page old_page, HeapTuple old_tup,
  				 Buffer dst_buf, Page dst_page, VacPage dst_vacpage,
--- 197,216 ----
  
  static int	elevel = -1;
  
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
  			 const char *stmttype, bool full);
! static void vac_update_dbminxid(Oid dbid);
! static void vac_truncate_clog(void);
! static void vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages,
! 		  TransactionId FreezeLimit, TransactionId OldestXmin);
  static void repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel, TransactionId OldestXmin);
  static void move_chain_tuple(Relation rel,
  				 Buffer old_buf, Page old_page, HeapTuple old_tup,
  				 Buffer dst_buf, Page dst_page, VacPage dst_vacpage,
***************
*** 270,277 ****
  vacuum(VacuumStmt *vacstmt, List *relids)
  {
  	const char *stmttype = vacstmt->vacuum ? "VACUUM" : "ANALYZE";
- 	TransactionId initialOldestXmin = InvalidTransactionId;
- 	TransactionId initialFreezeLimit = InvalidTransactionId;
  	volatile MemoryContext anl_context = NULL;
  	volatile bool all_rels,
  				in_outer_xact,
--- 266,271 ----
***************
*** 355,386 ****
  	relations = get_rel_oids(relids, vacstmt->relation, stmttype,
  		   					 vacstmt->full);
  
- 	if (vacstmt->vacuum && all_rels)
- 	{
- 		/*
- 		 * It's a database-wide VACUUM.
- 		 *
- 		 * Compute the initially applicable OldestXmin and FreezeLimit XIDs,
- 		 * so that we can record these values at the end of the VACUUM. Note
- 		 * that individual tables may well be processed with newer values, but
- 		 * we can guarantee that no (non-shared) relations are processed with
- 		 * older ones.
- 		 *
- 		 * It is okay to record non-shared values in pg_database, even though
- 		 * we may vacuum shared relations with older cutoffs, because only the
- 		 * minimum of the values present in pg_database matters.  We can be
- 		 * sure that shared relations have at some time been vacuumed with
- 		 * cutoffs no worse than the global minimum; for, if there is a
- 		 * backend in some other DB with xmin = OLDXMIN that's determining the
- 		 * cutoff with which we vacuum shared relations, it is not possible
- 		 * for that database to have a cutoff newer than OLDXMIN recorded in
- 		 * pg_database.
- 		 */
- 		vacuum_set_xid_limits(vacstmt, false,
- 							  &initialOldestXmin,
- 							  &initialFreezeLimit);
- 	}
- 
  	/*
  	 * Decide whether we need to start/commit our own transactions.
  	 *
--- 349,354 ----
***************
*** 441,446 ****
--- 409,424 ----
  		VacuumCostBalance = 0;
  
  		/*
+ 		 * During vacuum, we are going to lock the relation with a writer's
+ 		 * lock, but if it's already frozen, we won't do any writing on it, so
+ 		 * disable the unfreezing for the duration of this VACUUM operation.
+ 		 * However, if this a FULL vacuum, we may mark some tuples with our
+ 		 * Xid even if the table is frozen, so skip this step.
+ 		 */
+ 		if (!vacstmt->full)
+ 	   		disable_heap_unfreeze = true;
+ 
+ 		/*
  		 * Loop to process each selected relation.
  		 */
  		foreach(cur, relations)
***************
*** 448,457 ****
  			Oid			relid = lfirst_oid(cur);
  
  			if (vacstmt->vacuum)
! 			{
! 				if (!vacuum_rel(relid, vacstmt, false))
! 					all_rels = false;	/* forget about updating dbstats */
! 			}
  			if (vacstmt->analyze)
  			{
  				MemoryContext old_context = NULL;
--- 426,433 ----
  			Oid			relid = lfirst_oid(cur);
  
  			if (vacstmt->vacuum)
! 				vacuum_rel(relid, vacstmt, false);
! 
  			if (vacstmt->analyze)
  			{
  				MemoryContext old_context = NULL;
***************
*** 496,507 ****
--- 472,487 ----
  	{
  		/* Make sure cost accounting is turned off after error */
  		VacuumCostActive = false;
+ 		/* reenable heap unfreezing too */
+ 		disable_heap_unfreeze = false;
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	/* Turn off vacuum cost accounting */
  	VacuumCostActive = false;
+ 	/* reenable heap unfreezing too */
+ 	disable_heap_unfreeze = false;
  
  	/*
  	 * Finish up processing.
***************
*** 534,550 ****
  		if (all_rels)
  			PrintFreeSpaceMapStatistics(elevel);
  
! 		/*
! 		 * If we completed a database-wide VACUUM without skipping any
! 		 * relations, update the database's pg_database row with info about
! 		 * the transaction IDs used, and try to truncate pg_clog.
! 		 */
! 		if (all_rels)
! 		{
! 			vac_update_dbstats(MyDatabaseId,
! 							   initialOldestXmin, initialFreezeLimit);
! 			vac_truncate_clog(initialOldestXmin, initialFreezeLimit);
! 		}
  	}
  
  	/*
--- 514,524 ----
  		if (all_rels)
  			PrintFreeSpaceMapStatistics(elevel);
  
! 		/* Update pg_database.datminxid and datvacuumxid */
! 		vac_update_dbminxid(MyDatabaseId);
! 
! 		/* Try to truncate pg_clog. */
! 		vac_truncate_clog();
  	}
  
  	/*
***************
*** 701,707 ****
   */
  void
  vac_update_relstats(Relation rel, BlockNumber num_pages, double num_tuples,
! 					bool hasindex)
  {
  	Relation	rd;
  	HeapTuple	ctup;
--- 675,682 ----
   */
  void
  vac_update_relstats(Relation rel, BlockNumber num_pages, double num_tuples,
! 					bool hasindex, TransactionId minxid,
! 					TransactionId vacuumxid)
  {
  	Relation	rd;
  	HeapTuple	ctup;
***************
*** 709,832 ****
  	Form_pg_class pgcform;
  	bool		dirty;
  
! 	/* Process pg_ntclass first */
! 	rd = heap_open(NtClassRelationId, RowExclusiveLock);
  
! 	/* Fetch a copy of the pg_ntclass tuple to scribble on */
! 	ctup = get_ntclass_entry(rd, rel);
  
! 	pgntform = (Form_pg_ntclass) GETSTRUCT(ctup);
  
! 	/* Apply required updates, if any, to copied tuple */
  
! 	dirty = false;
! 	if (pgntform->relpages != (int32) num_pages)
! 	{
! 		pgntform->relpages = (int32) num_pages;
! 		dirty = true;
! 	}
! 	if (pgntform->reltuples != (float4) num_tuples)
! 	{
! 		pgntform->reltuples = (float4) num_tuples;
! 		dirty = true;
! 	}
  
! 	/* If anything changed, write out the tuple */
! 	if (dirty)
! 		update_ntclass_entry(rd, ctup);
! 	heap_freetuple(ctup);
  
! 	heap_close(rd, RowExclusiveLock);
  
! 	/* Now go for the pg_class tuple */
! 	rd = heap_open(RelationRelationId, RowExclusiveLock);
  
! 	/* Fetch a copy of the tuple to scribble on */
! 	ctup = SearchSysCacheCopy(RELOID,
! 							  ObjectIdGetDatum(RelationGetRelid(rel)),
! 							  0, 0, 0);
! 	if (!HeapTupleIsValid(ctup))
! 		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
! 			 RelationGetRelid(rel));
! 	pgcform = (Form_pg_class) GETSTRUCT(ctup);
  
! 	/* Apply required updates, if any, to copied tuple */
  
! 	dirty = false;
! 	if (pgcform->relhasindex != hasindex)
! 	{
! 		pgcform->relhasindex = hasindex;
! 		dirty = true;
  	}
! 	/*
! 	 * If we have discovered that there are no indexes, then there's no
! 	 * primary key either.	This could be done more thoroughly...
! 	 */
! 	if (!hasindex)
  	{
! 		if (pgcform->relhaspkey)
! 		{
! 			pgcform->relhaspkey = false;
! 			dirty = true;
! 		}
  	}
  
! 	/*
! 	 * If anything changed, write out the tuple
! 	 */
! 	if (dirty)
! 		heap_inplace_update(rd, ctup);
! 
! 	heap_close(rd, RowExclusiveLock);
  }
  
- 
  /*
!  *	vac_update_dbstats() -- update statistics for one database
   *
!  *		Update the whole-database statistics that are kept in its pg_database
!  *		row, and the flat-file copy of pg_database.
   *
   *		We violate transaction semantics here by overwriting the database's
!  *		existing pg_database tuple with the new values.  This is reasonably
!  *		safe since the new values are correct whether or not this transaction
   *		commits.  As with vac_update_relstats, this avoids leaving dead tuples
   *		behind after a VACUUM.
   *
!  *		This routine is shared by full and lazy VACUUM.  Note that it is only
!  *		applied after a database-wide VACUUM operation.
   */
  static void
! vac_update_dbstats(Oid dbid,
! 				   TransactionId vacuumXID,
! 				   TransactionId frozenXID)
  {
- 	Relation	relation;
  	HeapTuple	tuple;
  	Form_pg_database dbform;
  
! 	relation = heap_open(DatabaseRelationId, RowExclusiveLock);
  
! 	/* Fetch a copy of the tuple to scribble on */
! 	tuple = SearchSysCacheCopy(DATABASEOID,
! 							   ObjectIdGetDatum(dbid),
! 							   0, 0, 0);
! 	if (!HeapTupleIsValid(tuple))
! 		elog(ERROR, "could not find tuple for database %u", dbid);
! 	dbform = (Form_pg_database) GETSTRUCT(tuple);
! 
! 	/* overwrite the existing statistics in the tuple */
! 	dbform->datvacuumxid = vacuumXID;
! 	dbform->datfrozenxid = frozenXID;
  
! 	heap_inplace_update(relation, tuple);
  
! 	heap_close(relation, RowExclusiveLock);
  
! 	/* Mark the flat-file copy of pg_database for update at commit */
! 	database_file_update_needed();
! }
  
  
  /*
   *	vac_truncate_clog() -- attempt to truncate the commit log
--- 684,944 ----
  	Form_pg_class pgcform;
  	bool		dirty;
  
! 	PG_TRY();
! 	{
! 		/* Don't unfreeze pg_ntclass nor pg_class by doing this */
! 		disable_heap_unfreeze = true;
  
! 		/* Process pg_ntclass first */
! 		rd = heap_open(NtClassRelationId, RowExclusiveLock);
  
! 		/* Fetch a copy of the pg_ntclass tuple to scribble on */
! 		ctup = get_ntclass_entry(rd, rel);
  
! 		pgntform = (Form_pg_ntclass) GETSTRUCT(ctup);
  
! 		/* Apply required updates, if any, to copied tuple */
  
! 		dirty = false;
! 		if (pgntform->relpages != (int32) num_pages)
! 		{
! 			pgntform->relpages = (int32) num_pages;
! 			dirty = true;
! 		}
! 		if (pgntform->reltuples != (float4) num_tuples)
! 		{
! 			pgntform->reltuples = (float4) num_tuples;
! 			dirty = true;
! 		}
! 		if (pgntform->relminxid != minxid)
! 		{
! 			pgntform->relminxid = minxid;
! 			dirty = true;
! 		}
! 		/*
! 		 * If relminxid is Frozen (i.e., the table is truly frozen), it's more
! 		 * useful to mark it as having vacuumxid frozen as well.  This means
! 		 * tht this table does not impose any particular limit to pg_clog
! 		 * truncation.
! 		 *
! 		 * It seems a bit of a hack to be doing this here, but it would be even
! 		 * uglier to have all the callers do it.
! 		 */
! 		if (TransactionIdEquals(minxid, FrozenTransactionId))
! 			vacuumxid = FrozenTransactionId;
! 		if (pgntform->relvacuumxid != vacuumxid)
! 		{
! 			pgntform->relvacuumxid = vacuumxid;
! 			dirty = true;
! 		}
  
! 		/* If anything changed, write out the tuple */
! 		if (dirty)
! 			update_ntclass_entry(rd, ctup);
! 		heap_freetuple(ctup);
! 
! 		heap_close(rd, RowExclusiveLock);
! 
! 		/* Now go for the pg_class tuple */
! 		rd = heap_open(RelationRelationId, RowExclusiveLock);
! 
! 		/* Fetch a copy of the tuple to scribble on */
! 		ctup = SearchSysCacheCopy(RELOID,
! 								  ObjectIdGetDatum(RelationGetRelid(rel)),
! 								  0, 0, 0);
! 		if (!HeapTupleIsValid(ctup))
! 			elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
! 				 RelationGetRelid(rel));
! 		pgcform = (Form_pg_class) GETSTRUCT(ctup);
  
! 		/* Apply required updates, if any, to copied tuple */
  
! 		dirty = false;
! 		if (pgcform->relhasindex != hasindex)
! 		{
! 			pgcform->relhasindex = hasindex;
! 			dirty = true;
! 		}
! 		/*
! 		 * If we have discovered that there are no indexes, then there's no
! 		 * primary key either.	This could be done more thoroughly...
! 		 */
! 		if (!hasindex)
! 		{
! 			if (pgcform->relhaspkey)
! 			{
! 				pgcform->relhaspkey = false;
! 				dirty = true;
! 			}
! 		}
  
! 		/*
! 		 * If anything changed, write out the tuple
! 		 */
! 		if (dirty)
! 			heap_inplace_update(rd, ctup);
  
! 		heap_close(rd, RowExclusiveLock);
! 		heap_freetuple(ctup);
  	}
! 	PG_CATCH();
  	{
! 		disable_heap_unfreeze = false;
! 		PG_RE_THROW();
  	}
+ 	PG_END_TRY();
  
! 	disable_heap_unfreeze = false;
  }
  
  /*
!  *	vac_update_dbminxid() -- update the minimum Xid present in one database
   *
!  * 		Update pg_database's datminxid and datvacuumxid, and the flat-file copy
!  * 		of it.  datminxid is updated to the minimum of all relminxid found in
!  * 		pg_ntclass.
   *
   *		We violate transaction semantics here by overwriting the database's
!  *		existing pg_database tuple with the new value.  This is reasonably
!  *		safe since the new value is correct whether or not this transaction
   *		commits.  As with vac_update_relstats, this avoids leaving dead tuples
   *		behind after a VACUUM.
   *
!  *		This routine is shared by full and lazy VACUUM.
   */
  static void
! vac_update_dbminxid(Oid dbid)
  {
  	HeapTuple	tuple;
  	Form_pg_database dbform;
+ 	Relation	relation;
+ 	SysScanDesc	scan;
+ 	HeapTuple	ntclassTup;
+ 	TransactionId	newMinXid = InvalidTransactionId;
+ 	TransactionId	newVacXid = InvalidTransactionId;
+ 	bool		dirty = false;
  
! 	PG_TRY();
! 	{
! 		/*
! 		 * Disable heap unfreezing of pg_database, since we are going
! 		 * to update the tuple in-place and won't be writing our Xid on it.
! 		 */
! 		disable_heap_unfreeze = true;
  
! 		/* We must seqscan pg_ntclass to find the minimum Xid */
! 		relation = heap_open(NtClassRelationId, AccessShareLock);
  
! 		scan = systable_beginscan(relation, InvalidOid, false,
! 								  SnapshotNow, 0, NULL);
  
! 		while ((ntclassTup = systable_getnext(scan)) != NULL)
! 		{
! 			Form_pg_ntclass ntclassForm;
  
! 			ntclassForm = (Form_pg_ntclass) GETSTRUCT(ntclassTup);
! 
! 			/*
! 			 * Compute the minimum relminxid in all the tables in the database.
! 			 * We consider only normal Xids --- this means in particular we
! 			 * avoid setting the minimum to FrozenTransactionId here.  If all
! 			 * tables turn out to be frozen, we will exit the loop with the
! 			 * value set to InvalidTransactionId.  We cannot allow newMinXid to
! 			 * be set to FrozenTransactionId --- that messes us up because of
! 			 * the semantics of TransactionIdPrecedes.
! 			 *
! 			 * Other values we are ignoring here are InvalidTransactionId
! 			 * (which is set for some bootstrap tables) and
! 			 * BootstrapTransactionId.  This doesn't cause any problems because
! 			 * for all practical purposes they behave exactly like
! 			 * FrozenTransactionId.
! 			 */
! 			if (TransactionIdIsValid(ntclassForm->relminxid) &&
! 				TransactionIdIsNormal(ntclassForm->relminxid) &&
! 				(!TransactionIdIsValid(newMinXid) ||
! 				 TransactionIdPrecedes(ntclassForm->relminxid, newMinXid)))
! 				newMinXid = ntclassForm->relminxid;
! 
! 			/* ditto, for relvacuumxid */
! 			if (TransactionIdIsValid(ntclassForm->relvacuumxid) &&
! 				TransactionIdIsNormal(ntclassForm->relvacuumxid) &&
! 				(!TransactionIdIsValid(newVacXid) ||
! 				 TransactionIdPrecedes(ntclassForm->relvacuumxid, newVacXid)))
! 				newVacXid = ntclassForm->relvacuumxid;
! 		}
! 
! 		/* we're done with pg_ntclass */
! 		systable_endscan(scan);
! 		heap_close(relation, AccessShareLock);
  
+ 		/*
+ 		 * If we got InvalidTransactionId, then all tables must be frozen.  As
+ 		 * a special protection, we only allow a database to be wholly marked
+ 		 * as "frozen" if this is a standalone backend.  Otherwise, some other
+ 		 * backend may be modifying a table behind our back; we can't safely
+ 		 * assume that the database is truly frozen.  So if we detect that all
+ 		 * tables are frozen but we're running on a regular backend, fall back
+ 		 * to storing RecentXmin in datminxid (the minimum Xid which could be
+ 		 * unfreezing a table simultaneously.)
+ 		 */
+ 		if (!TransactionIdIsValid(newMinXid))
+ 			newMinXid = IsPostmasterEnvironment ? RecentXmin :
+ 				FrozenTransactionId;
+ 
+ 		/*
+ 		 * In datvacuumxid, if we got InvalidTransactionId we use
+ 		 * FrozenTransactionId in pg_database.  This case is not valid as
+ 		 * truncation point for pg_clog, but it's handled specially in
+ 		 * vac_truncate_clog() because it's useful as a permanent sign
+ 		 * that this database doesn't have a true lower limit on truncation.
+ 		 */
+ 		if (!TransactionIdIsValid(newVacXid))
+ 			newVacXid = FrozenTransactionId;
+ 
+ 		/* Now fetch the pg_database tuple we need to update. */
+ 		relation = heap_open(DatabaseRelationId, RowExclusiveLock);
+ 
+ 		/* Fetch a copy of the tuple to scribble on */
+ 		tuple = SearchSysCacheCopy(DATABASEOID,
+ 								   ObjectIdGetDatum(dbid),
+ 								   0, 0, 0);
+ 
+ 		if (!HeapTupleIsValid(tuple))
+ 			elog(ERROR, "could not find tuple for database %u", dbid);
+ 
+ 		dbform = (Form_pg_database) GETSTRUCT(tuple);
+ 
+ 		if (TransactionIdPrecedes(dbform->datminxid, newMinXid) ||
+ 			TransactionIdEquals(newMinXid, FrozenTransactionId))
+ 		{
+ 			dbform->datminxid = newMinXid;
+ 			dirty = true;
+ 		}
+ 		if (TransactionIdPrecedes(dbform->datvacuumxid, newVacXid) ||
+ 			TransactionIdEquals(newVacXid, FrozenTransactionId))
+ 		{
+ 			dbform->datvacuumxid = newVacXid;
+ 			dirty = true;
+ 		}
+ 
+ 		if (dirty)
+ 			heap_inplace_update(relation, tuple);
+ 
+ 		heap_freetuple(tuple);
+ 
+ 		heap_close(relation, RowExclusiveLock);
+ 	}
+ 	PG_CATCH();
+ 	{
+ 		/* Make sure to reenable heap unfreezing in case of error */
+ 		disable_heap_unfreeze = false;
+ 		PG_RE_THROW();
+ 	}
+ 	PG_END_TRY();
+ 
+ 	/* reenable heap unfreezing */
+ 	disable_heap_unfreeze = false;
+ }
  
  /*
   *	vac_truncate_clog() -- attempt to truncate the commit log
***************
*** 841,868 ****
   *		will generate more-annoying warnings, and ultimately refuse to issue
   *		any more new XIDs.
   *
-  *		The passed XIDs are simply the ones I just wrote into my pg_database
-  *		entry.	They're used to initialize the "min" calculations.
-  *
   *		This routine is shared by full and lazy VACUUM.  Note that it is only
   *		applied after a database-wide VACUUM operation.
   */
  static void
! vac_truncate_clog(TransactionId vacuumXID, TransactionId frozenXID)
  {
  	TransactionId myXID = GetCurrentTransactionId();
  	Relation	relation;
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	int32		age;
  	NameData	oldest_datname;
  	bool		vacuumAlreadyWrapped = false;
! 	bool		frozenAlreadyWrapped = false;
  
! 	/* init oldest_datname to sync with my frozenXID */
  	namestrcpy(&oldest_datname, get_database_name(MyDatabaseId));
  
  	/*
  	 * Note: the "already wrapped" cases should now be impossible due to the
  	 * defenses in GetNewTransactionId, but we keep them anyway.
  	 */
--- 953,988 ----
   *		will generate more-annoying warnings, and ultimately refuse to issue
   *		any more new XIDs.
   *
   *		This routine is shared by full and lazy VACUUM.  Note that it is only
   *		applied after a database-wide VACUUM operation.
   */
  static void
! vac_truncate_clog(void)
  {
  	TransactionId myXID = GetCurrentTransactionId();
+ 	TransactionId minXID;
+ 	TransactionId vacuumXID;
  	Relation	relation;
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	int32		age;
  	NameData	oldest_datname;
  	bool		vacuumAlreadyWrapped = false;
! 	bool		minAlreadyWrapped = false;
  
! 	/*
! 	 * Initialize the minimum values to a recent value.
! 	 */
! 	minXID = vacuumXID = RecentXmin;
  	namestrcpy(&oldest_datname, get_database_name(MyDatabaseId));
  
  	/*
+ 	 * Note we don't initialize the oldest database name here.  This is because
+ 	 * the name will only be used if myXID - minXID is some positive quantity,
+ 	 * and if that happens, we will also initialize the name in the loop below.
+ 	 */
+ 
+ 	/*
  	 * Note: the "already wrapped" cases should now be impossible due to the
  	 * defenses in GetNewTransactionId, but we keep them anyway.
  	 */
***************
*** 874,884 ****
  	{
  		Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);
  
! 		/* Ignore non-connectable databases (eg, template0) */
! 		/* It's assumed that these have been frozen correctly */
! 		if (!dbform->datallowconn)
! 			continue;
! 
  		if (TransactionIdIsNormal(dbform->datvacuumxid))
  		{
  			if (TransactionIdPrecedes(myXID, dbform->datvacuumxid))
--- 994,1009 ----
  	{
  		Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);
  
! 		/*
! 		 * Note we ignore FrozenTransactionId here for both values.  If all
! 		 * databases turn out to be frozen, the values will end up as the
! 		 * current XID, which is the correct truncation point for pg_clog and
! 		 * also the correct value for the varsup.c limit.
! 		 *
! 		 * Also note that the all-databases-are-frozen case is pretty rare.
! 		 * It can only happen if the user VACUUM FREEZEs all databases using
! 		 * standalone backends.
! 		 */
  		if (TransactionIdIsNormal(dbform->datvacuumxid))
  		{
  			if (TransactionIdPrecedes(myXID, dbform->datvacuumxid))
***************
*** 886,898 ****
  			else if (TransactionIdPrecedes(dbform->datvacuumxid, vacuumXID))
  				vacuumXID = dbform->datvacuumxid;
  		}
! 		if (TransactionIdIsNormal(dbform->datfrozenxid))
  		{
! 			if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))
! 				frozenAlreadyWrapped = true;
! 			else if (TransactionIdPrecedes(dbform->datfrozenxid, frozenXID))
  			{
! 				frozenXID = dbform->datfrozenxid;
  				namecpy(&oldest_datname, &dbform->datname);
  			}
  		}
--- 1011,1023 ----
  			else if (TransactionIdPrecedes(dbform->datvacuumxid, vacuumXID))
  				vacuumXID = dbform->datvacuumxid;
  		}
! 		if (TransactionIdIsNormal(dbform->datminxid))
  		{
! 			if (TransactionIdPrecedes(myXID, dbform->datminxid))
! 				minAlreadyWrapped = true;
! 			else if (TransactionIdPrecedes(dbform->datminxid, minXID))
  			{
! 				minXID = dbform->datminxid;
  				namecpy(&oldest_datname, &dbform->datname);
  			}
  		}
***************
*** 921,927 ****
  	 * Do not update varsup.c if we seem to have suffered wraparound already;
  	 * the computed XID might be bogus.
  	 */
! 	if (frozenAlreadyWrapped)
  	{
  		ereport(WARNING,
  				(errmsg("some databases have not been vacuumed in over 1 billion transactions"),
--- 1046,1052 ----
  	 * Do not update varsup.c if we seem to have suffered wraparound already;
  	 * the computed XID might be bogus.
  	 */
! 	if (minAlreadyWrapped)
  	{
  		ereport(WARNING,
  				(errmsg("some databases have not been vacuumed in over 1 billion transactions"),
***************
*** 930,939 ****
  	}
  
  	/* Update the wrap limit for GetNewTransactionId */
! 	SetTransactionIdLimit(frozenXID, &oldest_datname);
  
  	/* Give warning about impending wraparound problems */
! 	age = (int32) (myXID - frozenXID);
  	if (age > (int32) ((MaxTransactionId >> 3) * 3))
  		ereport(WARNING,
  		   (errmsg("database \"%s\" must be vacuumed within %u transactions",
--- 1055,1064 ----
  	}
  
  	/* Update the wrap limit for GetNewTransactionId */
! 	SetTransactionIdLimit(minXID, &oldest_datname);
  
  	/* Give warning about impending wraparound problems */
! 	age = (int32) (myXID - minXID);
  	if (age > (int32) ((MaxTransactionId >> 3) * 3))
  		ereport(WARNING,
  		   (errmsg("database \"%s\" must be vacuumed within %u transactions",
***************
*** 955,965 ****
  /*
   *	vacuum_rel() -- vacuum one heap relation
   *
-  *		Returns TRUE if we actually processed the relation (or can ignore it
-  *		for some reason), FALSE if we failed to process it due to permissions
-  *		or other reasons.  (A FALSE result really means that some data
-  *		may have been left unvacuumed, so we can't update XID stats.)
-  *
   *		Doing one heap at a time incurs extra overhead, since we need to
   *		check that the heap exists again just before we vacuum it.	The
   *		reason that we do this is so that vacuuming can be spread across
--- 1080,1085 ----
***************
*** 968,981 ****
   *
   *		At entry and exit, we are not inside a transaction.
   */
! static bool
  vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
  	LockRelId	onerelid;
  	Oid			toast_relid;
- 	bool		result;
  
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
--- 1088,1100 ----
   *
   *		At entry and exit, we are not inside a transaction.
   */
! static void
  vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
  	LockRelId	onerelid;
  	Oid			toast_relid;
  
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
***************
*** 1004,1021 ****
  	{
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return true;			/* okay 'cause no data there */
  	}
  
  	/*
  	 * Determine the type of lock we want --- hard exclusive lock for a FULL
! 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
! 	 * way, we can be sure that no other backend is vacuuming the same table.
  	 */
! 	lmode = vacstmt->full ? AccessExclusiveLock : ShareUpdateExclusiveLock;
  
  	/*
! 	 * Open the class, get an appropriate lock on it, and check permissions.
  	 *
  	 * We allow the user to vacuum a table if he is superuser, the table
  	 * owner, or the database owner (but in the latter case, only if it's not
--- 1123,1164 ----
  	{
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
  	 * Determine the type of lock we want --- hard exclusive lock for a FULL
! 	 * vacuum, ExclusiveLock for VACUUM FREEZE, but just
! 	 * ShareUpdateExclusiveLock for concurrent vacuum. Either way, we can be
! 	 * sure that no other backend is vacuuming the same table.
! 	 */
! 	lmode = vacstmt->full ? AccessExclusiveLock :
! 		vacstmt->freeze ? ExclusiveLock : ShareUpdateExclusiveLock;
! 
! 	/*
! 	 * Open the class and get an appropriate lock on it.
  	 */
! 	PG_TRY();
! 	{
! 		/*
! 		 * Note that we can skip unfreezing the table at this moment, because
! 		 * we will update the correct relminxid later, if needed.  Furthermore,
! 		 * doing this allows us to skip vacuuming the table at all if it has
! 		 * been frozen and not modified since.
! 		 */
! 		disable_heap_unfreeze = true;
! 		onerel = relation_open(relid, lmode);
! 	}
! 	PG_CATCH();
! 	{
! 		disable_heap_unfreeze = false;
! 		PG_RE_THROW();
! 	}
! 	PG_END_TRY();
! 	disable_heap_unfreeze = false;
  
  	/*
! 	 * Check permissions on the table.
  	 *
  	 * We allow the user to vacuum a table if he is superuser, the table
  	 * owner, or the database owner (but in the latter case, only if it's not
***************
*** 1024,1031 ****
  	 * Note we choose to treat permissions failure as a WARNING and keep
  	 * trying to vacuum the rest of the DB --- is this appropriate?
  	 */
- 	onerel = relation_open(relid, lmode);
- 
  	if (!(pg_class_ownercheck(RelationGetRelid(onerel), GetUserId()) ||
  		  (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !onerel->rd_rel->relisshared)))
  	{
--- 1167,1172 ----
***************
*** 1035,1041 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return false;
  	}
  
  	/*
--- 1176,1182 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
***************
*** 1053,1059 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return false;
  	}
  
  	/*
--- 1194,1200 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
***************
*** 1069,1075 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return false;
  	}
  
  	/*
--- 1210,1216 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
***************
*** 1084,1090 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return true;			/* assume no long-lived data in temp tables */
  	}
  
  	/*
--- 1225,1231 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;			/* assume no long-lived data in temp tables */
  	}
  
  	/*
***************
*** 1113,1120 ****
  	else
  		lazy_vacuum_rel(onerel, vacstmt);
  
- 	result = true;				/* did the vacuum */
- 
  	/* all done with this class, but hold lock until commit */
  	relation_close(onerel, NoLock);
  
--- 1254,1259 ----
***************
*** 1132,1148 ****
  	 * totally unimportant for toast relations.
  	 */
  	if (toast_relid != InvalidOid)
! 	{
! 		if (!vacuum_rel(toast_relid, vacstmt, true))
! 			result = false;		/* failed to vacuum the TOAST table? */
! 	}
  
  	/*
  	 * Now release the session-level lock on the master table.
  	 */
  	UnlockRelationForSession(&onerelid, lmode);
- 
- 	return result;
  }
  
  
--- 1271,1282 ----
  	 * totally unimportant for toast relations.
  	 */
  	if (toast_relid != InvalidOid)
! 		vacuum_rel(toast_relid, vacstmt, true);
  
  	/*
  	 * Now release the session-level lock on the master table.
  	 */
  	UnlockRelationForSession(&onerelid, lmode);
  }
  
  
***************
*** 1174,1179 ****
--- 1308,1315 ----
  	int			nindexes,
  				i;
  	VRelStats  *vacrelstats;
+ 	TransactionId FreezeLimit,
+ 				  OldestXmin;
  
  	vacuum_set_xid_limits(vacstmt, onerel->rd_rel->relisshared,
  						  &OldestXmin, &FreezeLimit);
***************
*** 1186,1194 ****
  	vacrelstats->rel_tuples = 0;
  	vacrelstats->hasindex = false;
  
  	/* scan the heap */
  	vacuum_pages.num_pages = fraged_pages.num_pages = 0;
! 	scan_heap(vacrelstats, onerel, &vacuum_pages, &fraged_pages);
  
  	/* Now open all indexes of the relation */
  	vac_open_indexes(onerel, AccessExclusiveLock, &nindexes, &Irel);
--- 1322,1342 ----
  	vacrelstats->rel_tuples = 0;
  	vacrelstats->hasindex = false;
  
+ 	/*
+ 	 * Set initial minimum Xid, which will be updated if a smaller Xid is found
+ 	 * in the relation by scan_heap.
+ 	 *
+ 	 * We use RecentXmin here (the minimum Xid that belongs to a transaction
+ 	 * that is still open according to our snapshot), because it is the
+ 	 * earliest transaction that could insert new tuples in the table after our
+ 	 * VACUUM is done.
+ 	 */
+ 	vacrelstats->minxid = RecentXmin;
+ 
  	/* scan the heap */
  	vacuum_pages.num_pages = fraged_pages.num_pages = 0;
! 	scan_heap(vacrelstats, onerel, &vacuum_pages, &fraged_pages, FreezeLimit,
! 			  OldestXmin);
  
  	/* Now open all indexes of the relation */
  	vac_open_indexes(onerel, AccessExclusiveLock, &nindexes, &Irel);
***************
*** 1216,1222 ****
  	{
  		/* Try to shrink heap */
  		repair_frag(vacrelstats, onerel, &vacuum_pages, &fraged_pages,
! 					nindexes, Irel);
  		vac_close_indexes(nindexes, Irel, NoLock);
  	}
  	else
--- 1364,1370 ----
  	{
  		/* Try to shrink heap */
  		repair_frag(vacrelstats, onerel, &vacuum_pages, &fraged_pages,
! 					nindexes, Irel, OldestXmin);
  		vac_close_indexes(nindexes, Irel, NoLock);
  	}
  	else
***************
*** 1234,1240 ****
  
  	/* update statistics in pg_class */
  	vac_update_relstats(onerel, vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples, vacrelstats->hasindex);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
--- 1382,1389 ----
  
  	/* update statistics in pg_class */
  	vac_update_relstats(onerel, vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples, vacrelstats->hasindex,
! 						vacrelstats->minxid, OldestXmin);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
***************
*** 1248,1263 ****
   *		This routine sets commit status bits, constructs vacuum_pages (list
   *		of pages we need to compact free space on and/or clean indexes of
   *		deleted tuples), constructs fraged_pages (list of pages with free
!  *		space that tuples could be moved into), and calculates statistics
!  *		on the number of live tuples in the heap.
   */
  static void
  scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages)
  {
  	BlockNumber nblocks,
  				blkno;
- 	HeapTupleData tuple;
  	char	   *relname;
  	VacPage		vacpage;
  	BlockNumber empty_pages,
--- 1397,1413 ----
   *		This routine sets commit status bits, constructs vacuum_pages (list
   *		of pages we need to compact free space on and/or clean indexes of
   *		deleted tuples), constructs fraged_pages (list of pages with free
!  *		space that tuples could be moved into), calculates statistics on the
!  *		number of live tuples in the heap, and figures out the minimum normal
!  *		Xid present anywhere on the table.
   */
  static void
  scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages,
! 		  TransactionId FreezeLimit, TransactionId OldestXmin)
  {
  	BlockNumber nblocks,
  				blkno;
  	char	   *relname;
  	VacPage		vacpage;
  	BlockNumber empty_pages,
***************
*** 1371,1376 ****
--- 1521,1527 ----
  		{
  			ItemId		itemid = PageGetItemId(page, offnum);
  			bool		tupgone = false;
+ 			HeapTupleData tuple;
  
  			/*
  			 * Collect un-used items too - it's possible to have indexes
***************
*** 1512,1517 ****
--- 1663,1691 ----
  					min_tlen = tuple.t_len;
  				if (tuple.t_len > max_tlen)
  					max_tlen = tuple.t_len;
+ 
+ 				/*
+ 				 * Checks for pg_class.relminxid: determine the earliest
+ 				 * Xid in any tuple of any table.
+ 				 */
+ 				if (TransactionIdIsNormal(HeapTupleHeaderGetXmin(tuple.t_data)) &&
+ 					TransactionIdPrecedes(HeapTupleHeaderGetXmin(tuple.t_data),
+ 										  vacrelstats->minxid))
+ 					vacrelstats->minxid = HeapTupleHeaderGetXmin(tuple.t_data);
+ 
+ 				/*
+ 				 * If XMAX is not marked INVALID, we assume it's valid without
+ 				 * making any check on it --- it must be recently obsoleted or
+ 				 * still running, else HeapTupleSatisfiesVacuum would have
+ 				 * deemed it removable.
+ 				 */
+ 				if (!(tuple.t_data->t_infomask | HEAP_XMAX_INVALID))
+ 				{
+ 					if (TransactionIdIsNormal(HeapTupleHeaderGetXmax(tuple.t_data)) &&
+ 						TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple.t_data),
+ 											  vacrelstats->minxid))
+ 						vacrelstats->minxid = HeapTupleHeaderGetXmax(tuple.t_data);
+ 				}
  			}
  		}						/* scan along page */
  
***************
*** 1651,1657 ****
  static void
  repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel)
  {
  	TransactionId myXID = GetCurrentTransactionId();
  	Buffer		dst_buffer = InvalidBuffer;
--- 1825,1831 ----
  static void
  repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel, TransactionId OldestXmin)
  {
  	TransactionId myXID = GetCurrentTransactionId();
  	Buffer		dst_buffer = InvalidBuffer;
***************
*** 2994,3000 ****
  	/* now update statistics in pg_class */
  	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
! 						false);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
--- 3168,3174 ----
  	/* now update statistics in pg_class */
  	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
! 						false, InvalidTransactionId, InvalidTransactionId);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
***************
*** 3063,3069 ****
  	/* now update statistics in pg_class */
  	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
! 						false);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
--- 3237,3243 ----
  	/* now update statistics in pg_class */
  	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
! 						false, InvalidTransactionId, InvalidTransactionId);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
diff -rc -X diff-ignore 11fixclass/src/backend/commands/vacuumlazy.c 12ntrelminxid/src/backend/commands/vacuumlazy.c
*** 11fixclass/src/backend/commands/vacuumlazy.c	2006-06-10 19:17:47.000000000 -0400
--- 12ntrelminxid/src/backend/commands/vacuumlazy.c	2006-06-11 02:41:44.000000000 -0400
***************
*** 42,47 ****
--- 42,50 ----
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
+ #include "catalog/catalog.h"
+ #include "catalog/heap.h"
+ #include "catalog/pg_ntclass.h"
  #include "commands/vacuum.h"
  #include "miscadmin.h"
  #include "pgstat.h"
***************
*** 72,77 ****
--- 75,81 ----
  	double		tuples_deleted;
  	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
  	Size		threshold;		/* minimum interesting free space */
+ 	TransactionId minxid;		/* minimum Xid present anywhere in table */
  	/* List of TIDs of tuples we intend to delete */
  	/* NB: this list is ordered by TID address */
  	int			num_dead_tuples;	/* current # of entries */
***************
*** 88,100 ****
  
  static int	elevel = -1;
  
- static TransactionId OldestXmin;
- static TransactionId FreezeLimit;
- 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes);
  static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
  static void lazy_vacuum_index(Relation indrel,
  							  IndexBulkDeleteResult **stats,
--- 92,102 ----
  
  static int	elevel = -1;
  
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes, TransactionId FreezeLimit,
! 			   TransactionId OldestXmin);
  static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
  static void lazy_vacuum_index(Relation indrel,
  							  IndexBulkDeleteResult **stats,
***************
*** 104,112 ****
  							   LVRelStats *vacrelstats);
  static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
  				 int tupindex, LVRelStats *vacrelstats);
! static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
  static BlockNumber count_nondeletable_pages(Relation onerel,
! 						 LVRelStats *vacrelstats);
  static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
  static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
  					   ItemPointer itemptr);
--- 106,115 ----
  							   LVRelStats *vacrelstats);
  static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
  				 int tupindex, LVRelStats *vacrelstats);
! static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats,
! 							   TransactionId OldestXmin);
  static BlockNumber count_nondeletable_pages(Relation onerel,
! 						 LVRelStats *vacrelstats, TransactionId OldestXmin);
  static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
  static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
  					   ItemPointer itemptr);
***************
*** 122,128 ****
   *	lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
   *
   *		This routine vacuums a single heap, cleans out its indexes, and
!  *		updates its num_pages and num_tuples statistics.
   *
   *		At entry, we have already established a transaction and opened
   *		and locked the relation.
--- 125,132 ----
   *	lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
   *
   *		This routine vacuums a single heap, cleans out its indexes, and
!  *		updates its relpages and reltuples statistics, as well as the
!  *		relminxid and relvacuumxid information.
   *
   *		At entry, we have already established a transaction and opened
   *		and locked the relation.
***************
*** 135,146 ****
--- 139,177 ----
  	int			nindexes;
  	bool		hasindex;
  	BlockNumber possibly_freeable;
+ 	TransactionId OldestXmin,
+ 				  FreezeLimit;
  
  	if (vacstmt->verbose)
  		elevel = INFO;
  	else
  		elevel = DEBUG2;
  
+ 	/*
+ 	 * We can skip vacuuming a frozen table, since we know nobody has touched
+ 	 * it since the last VACUUM.
+ 	 */
+ 	{
+ 		HeapTuple	ctup;
+ 		Form_pg_ntclass ntForm;
+ 		Relation	ntRel;
+ 		bool		canskip = false;
+ 
+ 		ntRel = heap_open(NtClassRelationId, AccessShareLock);
+ 		ctup = get_ntclass_entry(ntRel, onerel);
+ 
+ 		ntForm = (Form_pg_ntclass) GETSTRUCT(ctup);
+ 
+ 		if (TransactionIdEquals(ntForm->relminxid, FrozenTransactionId))
+ 			canskip = true;
+ 
+ 		heap_freetuple(ctup);
+ 		heap_close(ntRel, AccessShareLock);
+ 
+ 		if (canskip)
+ 			return;
+ 	}
+ 
  	vacuum_set_xid_limits(vacstmt, onerel->rd_rel->relisshared,
  						  &OldestXmin, &FreezeLimit);
  
***************
*** 150,161 ****
  	/* XXX should we scale it up or down?  Adjust vacuum.c too, if so */
  	vacrelstats->threshold = GetAvgFSMRequestSize(&onerel->rd_node);
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, ShareUpdateExclusiveLock, &nindexes, &Irel);
  	hasindex = (nindexes > 0);
  
  	/* Do the vacuuming */
! 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes);
  
  	/* Done with indexes */
  	vac_close_indexes(nindexes, Irel, NoLock);
--- 181,214 ----
  	/* XXX should we scale it up or down?  Adjust vacuum.c too, if so */
  	vacrelstats->threshold = GetAvgFSMRequestSize(&onerel->rd_node);
  
+ 	/*
+ 	 * Set initial minimum Xid, which will be updated if a smaller Xid is found
+ 	 * in the relation by lazy_scan_heap.
+ 	 *
+ 	 * In VACUUM FREEZE, we use FrozenTransactionId here.  This is safe
+ 	 * because we acquired ExclusiveLock above, so no one can be inserting 
+ 	 * newer tuples in pages earlier to those we have scanned.  If there's any
+ 	 * tuple whose Xid we can't change, the lower bound will be raised.
+ 	 *
+ 	 * In the non-FREEZE case, we use RecentXmin here (the minimum Xid that
+ 	 * belongs to a transaction that is still open according to our snapshot),
+ 	 * because it is the earliest transaction that could concurrently insert
+ 	 * new tuples in the table.
+ 	 *
+ 	 * The FREEZE case doesn't have an equivalent in VACUUM FULL because FULL
+ 	 * in combination with FREEZE is verboten.
+ 	 */
+ 	if (vacstmt->freeze)
+ 		vacrelstats->minxid = FrozenTransactionId;
+ 	else
+ 		vacrelstats->minxid = RecentXmin;
+ 
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, ShareUpdateExclusiveLock, &nindexes, &Irel);
  	hasindex = (nindexes > 0);
  
  	/* Do the vacuuming */
! 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, FreezeLimit, OldestXmin);
  
  	/* Done with indexes */
  	vac_close_indexes(nindexes, Irel, NoLock);
***************
*** 169,175 ****
  	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
  	if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
  		possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)
! 		lazy_truncate_heap(onerel, vacrelstats);
  
  	/* Update shared free space map with final free space info */
  	lazy_update_fsm(onerel, vacrelstats);
--- 222,228 ----
  	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
  	if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
  		possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)
! 		lazy_truncate_heap(onerel, vacrelstats, OldestXmin);
  
  	/* Update shared free space map with final free space info */
  	lazy_update_fsm(onerel, vacrelstats);
***************
*** 177,184 ****
  	/* Update statistics in pg_class */
  	vac_update_relstats(onerel,
  						vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples,
! 						hasindex);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
--- 230,237 ----
  	/* Update statistics in pg_class */
  	vac_update_relstats(onerel,
  						vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples, hasindex,
! 						vacrelstats->minxid, OldestXmin);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
***************
*** 193,202 ****
   *		and pages with free space, and calculates statistics on the number
   *		of live tuples in the heap.  When done, or when we run low on space
   *		for dead-tuple TIDs, invoke vacuuming of indexes and heap.
   */
  static void
  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes)
  {
  	BlockNumber nblocks,
  				blkno;
--- 246,259 ----
   *		and pages with free space, and calculates statistics on the number
   *		of live tuples in the heap.  When done, or when we run low on space
   *		for dead-tuple TIDs, invoke vacuuming of indexes and heap.
+  *
+  *		It also stores in vacrelstats.minxid the minimum Xid found anywhere on
+  *		the table, for later recording it in pg_ntclass.relminxid.
   */
  static void
  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes, TransactionId FreezeLimit,
! 			   TransactionId OldestXmin)
  {
  	BlockNumber nblocks,
  				blkno;
***************
*** 408,413 ****
--- 465,484 ----
  			{
  				num_tuples += 1;
  				hastup = true;
+ 
+ 				/* 
+ 				 * If the tuple is alive, we consider it for the "minxid"
+ 				 * calculations.
+ 				 */
+ 				if (TransactionIdIsNormal(HeapTupleHeaderGetXmin(tuple.t_data)) &&
+ 					TransactionIdPrecedes(HeapTupleHeaderGetXmin(tuple.t_data),
+ 										  vacrelstats->minxid))
+ 					vacrelstats->minxid = HeapTupleHeaderGetXmin(tuple.t_data);
+ 
+ 				if (TransactionIdIsNormal(HeapTupleHeaderGetXmax(tuple.t_data)) &&
+ 					TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple.t_data),
+ 										  vacrelstats->minxid))
+ 					vacrelstats->minxid = HeapTupleHeaderGetXmax(tuple.t_data);
  			}
  		}						/* scan along page */
  
***************
*** 668,676 ****
  
  	/* now update statistics in pg_class */
  	vac_update_relstats(indrel,
! 						stats->num_pages,
! 						stats->num_index_tuples,
! 						false);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
--- 739,746 ----
  
  	/* now update statistics in pg_class */
  	vac_update_relstats(indrel,
! 						stats->num_pages, stats->num_index_tuples,
! 						false, InvalidTransactionId, InvalidTransactionId);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
***************
*** 691,697 ****
   * lazy_truncate_heap - try to truncate off any empty pages at the end
   */
  static void
! lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
  {
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
--- 761,768 ----
   * lazy_truncate_heap - try to truncate off any empty pages at the end
   */
  static void
! lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats,
! 				   TransactionId OldestXmin)
  {
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
***************
*** 732,738 ****
  	 * because other backends could have added tuples to these pages whilst we
  	 * were vacuuming.
  	 */
! 	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
  
  	if (new_rel_pages >= old_rel_pages)
  	{
--- 803,809 ----
  	 * because other backends could have added tuples to these pages whilst we
  	 * were vacuuming.
  	 */
! 	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats, OldestXmin);
  
  	if (new_rel_pages >= old_rel_pages)
  	{
***************
*** 787,793 ****
   * Returns number of nondeletable pages (last nonempty page + 1).
   */
  static BlockNumber
! count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
  {
  	BlockNumber blkno;
  	HeapTupleData tuple;
--- 858,865 ----
   * Returns number of nondeletable pages (last nonempty page + 1).
   */
  static BlockNumber
! count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats,
! 						 TransactionId OldestXmin)
  {
  	BlockNumber blkno;
  	HeapTupleData tuple;
diff -rc -X diff-ignore 11fixclass/src/backend/libpq/hba.c 12ntrelminxid/src/backend/libpq/hba.c
*** 11fixclass/src/backend/libpq/hba.c	2006-03-08 16:23:25.000000000 -0300
--- 12ntrelminxid/src/backend/libpq/hba.c	2006-06-10 19:38:04.000000000 -0400
***************
*** 1005,1011 ****
   *	dbname: gets database name (must be of size NAMEDATALEN bytes)
   *	dboid: gets database OID
   *	dbtablespace: gets database's default tablespace's OID
!  *	dbfrozenxid: gets database's frozen XID
   *	dbvacuumxid: gets database's vacuum XID
   *
   * This is not much related to the other functions in hba.c, but we put it
--- 1005,1011 ----
   *	dbname: gets database name (must be of size NAMEDATALEN bytes)
   *	dboid: gets database OID
   *	dbtablespace: gets database's default tablespace's OID
!  *	dbminxid: gets database's minimum XID
   *	dbvacuumxid: gets database's vacuum XID
   *
   * This is not much related to the other functions in hba.c, but we put it
***************
*** 1013,1019 ****
   */
  bool
  read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbfrozenxid,
  					  TransactionId *dbvacuumxid)
  {
  	char		buf[MAX_TOKEN];
--- 1013,1019 ----
   */
  bool
  read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbminxid,
  					  TransactionId *dbvacuumxid)
  {
  	char		buf[MAX_TOKEN];
***************
*** 1036,1042 ****
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
! 	*dbfrozenxid = atoxid(buf);
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
--- 1036,1042 ----
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
! 	*dbminxid = atoxid(buf);
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
diff -rc -X diff-ignore 11fixclass/src/backend/postmaster/autovacuum.c 12ntrelminxid/src/backend/postmaster/autovacuum.c
*** 11fixclass/src/backend/postmaster/autovacuum.c	2006-06-10 19:17:47.000000000 -0400
--- 12ntrelminxid/src/backend/postmaster/autovacuum.c	2006-06-10 19:38:04.000000000 -0400
***************
*** 80,86 ****
  {
  	Oid			oid;
  	char	   *name;
! 	TransactionId frozenxid;
  	TransactionId vacuumxid;
  	PgStat_StatDBEntry *entry;
  	int32		age;
--- 80,86 ----
  {
  	Oid			oid;
  	char	   *name;
! 	TransactionId minxid;
  	TransactionId vacuumxid;
  	PgStat_StatDBEntry *entry;
  	int32		age;
***************
*** 350,356 ****
  	{
  		autovac_dbase *tmp = lfirst(cell);
  		bool		this_whole_db;
! 		int32		freeze_age,
  					vacuum_age;
  
  		/*
--- 350,356 ----
  	{
  		autovac_dbase *tmp = lfirst(cell);
  		bool		this_whole_db;
! 		int32		true_age,
  					vacuum_age;
  
  		/*
***************
*** 363,371 ****
  		 * Unlike vacuum.c, we also look at vacuumxid.	This is so that
  		 * pg_clog can be kept trimmed to a reasonable size.
  		 */
! 		freeze_age = (int32) (nextXid - tmp->frozenxid);
  		vacuum_age = (int32) (nextXid - tmp->vacuumxid);
! 		tmp->age = Max(freeze_age, vacuum_age);
  
  		this_whole_db = (tmp->age >
  						 (int32) ((MaxTransactionId >> 3) * 3 - 100000));
--- 363,371 ----
  		 * Unlike vacuum.c, we also look at vacuumxid.	This is so that
  		 * pg_clog can be kept trimmed to a reasonable size.
  		 */
! 		true_age = (int32) (nextXid - tmp->minxid);
  		vacuum_age = (int32) (nextXid - tmp->vacuumxid);
! 		tmp->age = Max(true_age, vacuum_age);
  
  		this_whole_db = (tmp->age >
  						 (int32) ((MaxTransactionId >> 3) * 3 - 100000));
***************
*** 456,462 ****
  	FILE	   *db_file;
  	Oid			db_id;
  	Oid			db_tablespace;
! 	TransactionId db_frozenxid;
  	TransactionId db_vacuumxid;
  
  	filename = database_getflatfilename();
--- 456,462 ----
  	FILE	   *db_file;
  	Oid			db_id;
  	Oid			db_tablespace;
! 	TransactionId db_minxid;
  	TransactionId db_vacuumxid;
  
  	filename = database_getflatfilename();
***************
*** 467,473 ****
  				 errmsg("could not open file \"%s\": %m", filename)));
  
  	while (read_pg_database_line(db_file, thisname, &db_id,
! 								 &db_tablespace, &db_frozenxid,
  								 &db_vacuumxid))
  	{
  		autovac_dbase *db;
--- 467,473 ----
  				 errmsg("could not open file \"%s\": %m", filename)));
  
  	while (read_pg_database_line(db_file, thisname, &db_id,
! 								 &db_tablespace, &db_minxid,
  								 &db_vacuumxid))
  	{
  		autovac_dbase *db;
***************
*** 476,482 ****
  
  		db->oid = db_id;
  		db->name = pstrdup(thisname);
! 		db->frozenxid = db_frozenxid;
  		db->vacuumxid = db_vacuumxid;
  		/* these get set later: */
  		db->entry = NULL;
--- 476,482 ----
  
  		db->oid = db_id;
  		db->name = pstrdup(thisname);
! 		db->minxid = db_minxid;
  		db->vacuumxid = db_vacuumxid;
  		/* these get set later: */
  		db->entry = NULL;
diff -rc -X diff-ignore 11fixclass/src/backend/storage/lmgr/lmgr.c 12ntrelminxid/src/backend/storage/lmgr/lmgr.c
*** 11fixclass/src/backend/storage/lmgr/lmgr.c	2006-05-04 21:36:11.000000000 -0400
--- 12ntrelminxid/src/backend/storage/lmgr/lmgr.c	2006-06-10 22:42:34.000000000 -0400
***************
*** 15,20 ****
--- 15,21 ----
  
  #include "postgres.h"
  
+ #include "access/heapam.h"
  #include "access/subtrans.h"
  #include "access/transam.h"
  #include "access/xact.h"
***************
*** 53,58 ****
--- 54,66 ----
  	LOCKTAG		tag;
  	LockAcquireResult res;
  
+ 	/*
+ 	 * If somebody tries to lock a relation for more than a simple SELECT,
+ 	 * unfreeze it.
+ 	 */
+ 	if (lockmode > AccessShareLock)
+ 		heap_unfreeze(relation);
+ 
  	SET_LOCKTAG_RELATION(tag,
  						 relation->rd_lockInfo.lockRelId.dbId,
  						 relation->rd_lockInfo.lockRelId.relId);
***************
*** 89,94 ****
--- 97,109 ----
  	LOCKTAG		tag;
  	LockAcquireResult res;
  
+ 	/*
+ 	 * If somebody tries to lock a relation for more than a simple SELECT,
+ 	 * unfreeze it.
+ 	 */
+ 	if (lockmode > AccessShareLock)
+ 		heap_unfreeze(relation);
+ 
  	SET_LOCKTAG_RELATION(tag,
  						 relation->rd_lockInfo.lockRelId.dbId,
  						 relation->rd_lockInfo.lockRelId.relId);
diff -rc -X diff-ignore 11fixclass/src/backend/utils/init/flatfiles.c 12ntrelminxid/src/backend/utils/init/flatfiles.c
*** 11fixclass/src/backend/utils/init/flatfiles.c	2006-05-04 21:36:12.000000000 -0400
--- 12ntrelminxid/src/backend/utils/init/flatfiles.c	2006-06-10 19:38:04.000000000 -0400
***************
*** 163,169 ****
  /*
   * write_database_file: update the flat database file
   *
!  * A side effect is to determine the oldest database's datfrozenxid
   * so we can set or update the XID wrap limit.
   */
  static void
--- 163,169 ----
  /*
   * write_database_file: update the flat database file
   *
!  * A side effect is to determine the oldest database's datminxid
   * so we can set or update the XID wrap limit.
   */
  static void
***************
*** 177,183 ****
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	NameData	oldest_datname;
! 	TransactionId oldest_datfrozenxid = InvalidTransactionId;
  
  	/*
  	 * Create a temporary filename to be renamed later.  This prevents the
--- 177,183 ----
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	NameData	oldest_datname;
! 	TransactionId oldest_datminxid = InvalidTransactionId;
  
  	/*
  	 * Create a temporary filename to be renamed later.  This prevents the
***************
*** 208,234 ****
  		char	   *datname;
  		Oid			datoid;
  		Oid			dattablespace;
! 		TransactionId datfrozenxid,
  					datvacuumxid;
  
  		datname = NameStr(dbform->datname);
  		datoid = HeapTupleGetOid(tuple);
  		dattablespace = dbform->dattablespace;
! 		datfrozenxid = dbform->datfrozenxid;
  		datvacuumxid = dbform->datvacuumxid;
  
  		/*
! 		 * Identify the oldest datfrozenxid, ignoring databases that are not
  		 * connectable (we assume they are safely frozen).	This must match
  		 * the logic in vac_truncate_clog() in vacuum.c.
  		 */
  		if (dbform->datallowconn &&
! 			TransactionIdIsNormal(datfrozenxid))
  		{
! 			if (oldest_datfrozenxid == InvalidTransactionId ||
! 				TransactionIdPrecedes(datfrozenxid, oldest_datfrozenxid))
  			{
! 				oldest_datfrozenxid = datfrozenxid;
  				namestrcpy(&oldest_datname, datname);
  			}
  		}
--- 208,234 ----
  		char	   *datname;
  		Oid			datoid;
  		Oid			dattablespace;
! 		TransactionId datminxid,
  					datvacuumxid;
  
  		datname = NameStr(dbform->datname);
  		datoid = HeapTupleGetOid(tuple);
  		dattablespace = dbform->dattablespace;
! 		datminxid = dbform->datminxid;
  		datvacuumxid = dbform->datvacuumxid;
  
  		/*
! 		 * Identify the oldest datminxid, ignoring databases that are not
  		 * connectable (we assume they are safely frozen).	This must match
  		 * the logic in vac_truncate_clog() in vacuum.c.
  		 */
  		if (dbform->datallowconn &&
! 			TransactionIdIsNormal(datminxid))
  		{
! 			if (oldest_datminxid == InvalidTransactionId ||
! 				TransactionIdPrecedes(datminxid, oldest_datminxid))
  			{
! 				oldest_datminxid = datminxid;
  				namestrcpy(&oldest_datname, datname);
  			}
  		}
***************
*** 244,257 ****
  		}
  
  		/*
! 		 * The file format is: "dbname" oid tablespace frozenxid vacuumxid
  		 *
  		 * The xids are not needed for backend startup, but are of use to
  		 * autovacuum, and might also be helpful for forensic purposes.
  		 */
  		fputs_quote(datname, fp);
  		fprintf(fp, " %u %u %u %u\n",
! 				datoid, dattablespace, datfrozenxid, datvacuumxid);
  	}
  	heap_endscan(scan);
  
--- 244,257 ----
  		}
  
  		/*
! 		 * The file format is: "dbname" oid tablespace minxid vacuumxid
  		 *
  		 * The xids are not needed for backend startup, but are of use to
  		 * autovacuum, and might also be helpful for forensic purposes.
  		 */
  		fputs_quote(datname, fp);
  		fprintf(fp, " %u %u %u %u\n",
! 				datoid, dattablespace, datminxid, datvacuumxid);
  	}
  	heap_endscan(scan);
  
***************
*** 272,281 ****
  						tempname, filename)));
  
  	/*
! 	 * Set the transaction ID wrap limit using the oldest datfrozenxid
  	 */
! 	if (oldest_datfrozenxid != InvalidTransactionId)
! 		SetTransactionIdLimit(oldest_datfrozenxid, &oldest_datname);
  }
  
  
--- 272,281 ----
  						tempname, filename)));
  
  	/*
! 	 * Set the transaction ID wrap limit using the oldest datminxid
  	 */
! 	if (oldest_datminxid != InvalidTransactionId)
! 		SetTransactionIdLimit(oldest_datminxid, &oldest_datname);
  }
  
  
diff -rc -X diff-ignore 11fixclass/src/backend/utils/init/postinit.c 12ntrelminxid/src/backend/utils/init/postinit.c
*** 11fixclass/src/backend/utils/init/postinit.c	2006-05-04 21:36:12.000000000 -0400
--- 12ntrelminxid/src/backend/utils/init/postinit.c	2006-06-10 19:38:04.000000000 -0400
***************
*** 24,29 ****
--- 24,30 ----
  #include "catalog/pg_authid.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_tablespace.h"
+ #include "commands/dbcommands.h"
  #include "libpq/hba.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
***************
*** 193,198 ****
--- 194,210 ----
  					PGC_BACKEND, PGC_S_DEFAULT);
  
  	/*
+ 	 * If the database is marked as frozen, unfreeze it to make sure we won't
+ 	 * leave non-vacuumed tuples hidden behind a frozen pg_database entry.
+ 	 *
+ 	 * This is more paranoid than it needs to be -- if we had a way of
+ 	 * declaring a session as being guaranteed-read-only, we could skip doing
+ 	 * this for such sessions.  In the meantime, be safe.
+ 	 */
+ 	if (TransactionIdEquals(dbform->datminxid, FrozenTransactionId))
+ 		UnfreezeDatabase(MyDatabaseId, GetCurrentTransactionId());
+ 
+ 	/*
  	 * Lastly, set up any database-specific configuration variables.
  	 */
  	if (IsUnderPostmaster)
diff -rc -X diff-ignore 11fixclass/src/backend/utils/mmgr/aset.c 12ntrelminxid/src/backend/utils/mmgr/aset.c
*** 11fixclass/src/backend/utils/mmgr/aset.c	2006-03-08 16:23:32.000000000 -0300
--- 12ntrelminxid/src/backend/utils/mmgr/aset.c	2006-06-11 02:50:35.000000000 -0400
***************
*** 11,17 ****
   * Portions Copyright (c) 1994, Regents of the University of California
   *
   * IDENTIFICATION
!  *	  $PostgreSQL: pgsql/src/backend/utils/mmgr/aset.c,v 1.66 2006-03-05 15:58:49 momjian Exp $
   *
   * NOTE:
   *	This is a new (Feb. 05, 1999) implementation of the allocation set
--- 11,17 ----
   * Portions Copyright (c) 1994, Regents of the University of California
   *
   * IDENTIFICATION
!  *	  $PostgreSQL: pgsql/src/backend/utils/mmgr/aset.c,v 1.66 2006/03/05 15:58:49 momjian Exp $
   *
   * NOTE:
   *	This is a new (Feb. 05, 1999) implementation of the allocation set
diff -rc -X diff-ignore 11fixclass/src/bin/initdb/initdb.c 12ntrelminxid/src/bin/initdb/initdb.c
*** 11fixclass/src/bin/initdb/initdb.c	2006-06-11 17:31:15.000000000 -0400
--- 12ntrelminxid/src/bin/initdb/initdb.c	2006-06-11 17:56:25.000000000 -0400
***************
*** 184,189 ****
--- 184,190 ----
  static void setup_schema(void);
  static void vacuum_db(void);
  static void make_template0(void);
+ static void freeze_template0(void);
  static void make_postgres(void);
  static void trapsig(int signum);
  static void check_ok(void);
***************
*** 2014,2019 ****
--- 2015,2054 ----
  }
  
  /*
+  * freeze template0
+  *
+  * Note that this routine connects to template0, not template1 like all the
+  * rest.
+  */
+ static void
+ freeze_template0(void)
+ {
+ 	PG_CMD_DECL;
+ 	char	  **line;
+ 	static char *template_freeze[] = {
+ 		"VACUUM FREEZE;\n",
+ 		NULL
+ 	};
+ 
+ 	fputs(_("freezing template0 ... "), stdout);
+ 	fflush(stdout);
+ 
+ 	snprintf(cmd, sizeof(cmd),
+ 			 "\"%s\" %s template0 >%s",
+ 			 backend_exec, backend_options,
+ 			 DEVNULL);
+ 
+ 	PG_CMD_OPEN;
+ 
+ 	for (line = template_freeze; *line; line++)
+ 		PG_CMD_PUTS(*line);
+ 
+ 	PG_CMD_CLOSE;
+ 	
+ 	check_ok();
+ }
+ 
+ /*
   * copy template1 to postgres
   */
  static void
***************
*** 2953,2958 ****
--- 2988,2995 ----
  
  	make_template0();
  
+ 	freeze_template0();
+ 
  	make_postgres();
  
  	if (authwarning != NULL)
SÃ³lo en 12ntrelminxid/src/bin/initdb: .#initdb.c.1.116
diff -rc -X diff-ignore 11fixclass/src/include/access/heapam.h 12ntrelminxid/src/include/access/heapam.h
*** 11fixclass/src/include/access/heapam.h	2006-05-13 19:12:34.000000000 -0400
--- 12ntrelminxid/src/include/access/heapam.h	2006-06-10 19:38:04.000000000 -0400
***************
*** 122,127 ****
--- 122,129 ----
  
  /* heapam.c */
  
+ extern bool disable_heap_unfreeze;
+ 
  typedef enum
  {
  	LockTupleShared,
***************
*** 168,173 ****
--- 170,176 ----
  				Buffer *buffer, ItemPointer ctid,
  				TransactionId *update_xmax, CommandId cid,
  				LockTupleMode mode, bool nowait);
+ extern void heap_unfreeze(Relation rel);
  extern void heap_inplace_update(Relation relation, HeapTuple tuple);
  
  extern Oid	simple_heap_insert(Relation relation, HeapTuple tup);
diff -rc -X diff-ignore 11fixclass/src/include/access/transam.h 12ntrelminxid/src/include/access/transam.h
*** 11fixclass/src/include/access/transam.h	2006-03-08 16:23:46.000000000 -0300
--- 12ntrelminxid/src/include/access/transam.h	2006-06-10 19:38:04.000000000 -0400
***************
*** 123,129 ****
  /* in transam/varsup.c */
  extern TransactionId GetNewTransactionId(bool isSubXact);
  extern TransactionId ReadNewTransactionId(void);
! extern void SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
  					  Name oldest_datname);
  extern Oid	GetNewObjectId(void);
  
--- 123,129 ----
  /* in transam/varsup.c */
  extern TransactionId GetNewTransactionId(bool isSubXact);
  extern TransactionId ReadNewTransactionId(void);
! extern void SetTransactionIdLimit(TransactionId oldest_datminxid,
  					  Name oldest_datname);
  extern Oid	GetNewObjectId(void);
  
diff -rc -X diff-ignore 11fixclass/src/include/catalog/pg_attribute.h 12ntrelminxid/src/include/catalog/pg_attribute.h
*** 11fixclass/src/include/catalog/pg_attribute.h	2006-06-09 18:47:58.000000000 -0400
--- 12ntrelminxid/src/include/catalog/pg_attribute.h	2006-06-10 22:45:18.000000000 -0400
***************
*** 442,452 ****
   * ----------------
   */
  #define Schema_pg_ntclass \
! { 1004, {"relpages"},	   23, -1,	4,	1, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1004, {"reltuples"},	   700, -1, 4,	2, 0, -1, -1, false, 'p', 'i', true, false, false, true, 0 }
  
  DATA(insert ( 1004 relpages			23 -1 4   1 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1004 reltuples	   700 -1 4   2 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1004 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  /* no OIDs in pg_ntclass */
  DATA(insert ( 1004 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
--- 442,456 ----
   * ----------------
   */
  #define Schema_pg_ntclass \
! { 1004, {"relpages"},	   23,  -1, 4, 1, 0, -1, -1, true,  'p', 'i', true, false, false, true, 0 }, \
! { 1004, {"reltuples"},	   700, -1, 4, 2, 0, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 1004, {"relminxid"},	   28,  -1, 4, 3, 0, -1, -1, true,  'p', 'i', true, false, false, true, 0 }, \
! { 1004, {"relvacuumxid"},  28,  -1, 4, 4, 0, -1, -1, true,  'p', 'i', true, false, false, true, 0 }
  
  DATA(insert ( 1004 relpages			23 -1 4   1 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1004 reltuples	   700 -1 4   2 0 -1 -1 f p i t f f t 0));
+ DATA(insert ( 1004 relminxid		28 -1 4   3 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 relvacuumxid		28 -1 4   4 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1004 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  /* no OIDs in pg_ntclass */
  DATA(insert ( 1004 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
diff -rc -X diff-ignore 11fixclass/src/include/catalog/pg_class.h 12ntrelminxid/src/include/catalog/pg_class.h
*** 11fixclass/src/include/catalog/pg_class.h	2006-06-10 18:10:25.000000000 -0400
--- 12ntrelminxid/src/include/catalog/pg_class.h	2006-06-11 01:42:29.000000000 -0400
***************
*** 149,155 ****
  DESCR("");
  DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 "(0,4)" f f r 24 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1004 (  pg_ntclass	PGNSP 86 PGUID 0 1004 0 0 0 "(0,5)" f f n  2 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
--- 149,155 ----
  DESCR("");
  DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 "(0,4)" f f r 24 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1004 (  pg_ntclass	PGNSP 86 PGUID 0 1004 0 0 0 "(0,5)" f f n  4 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
diff -rc -X diff-ignore 11fixclass/src/include/catalog/pg_database.h 12ntrelminxid/src/include/catalog/pg_database.h
*** 11fixclass/src/include/catalog/pg_database.h	2006-03-08 16:23:46.000000000 -0300
--- 12ntrelminxid/src/include/catalog/pg_database.h	2006-06-10 22:39:48.000000000 -0400
***************
*** 43,49 ****
  	int4		datconnlimit;	/* max connections allowed (-1=no limit) */
  	Oid			datlastsysoid;	/* highest OID to consider a system OID */
  	TransactionId datvacuumxid; /* all XIDs before this are vacuumed */
! 	TransactionId datfrozenxid; /* all XIDs before this are frozen */
  	Oid			dattablespace;	/* default table space for this DB */
  	text		datconfig[1];	/* database-specific GUC (VAR LENGTH) */
  	aclitem		datacl[1];		/* access permissions (VAR LENGTH) */
--- 43,49 ----
  	int4		datconnlimit;	/* max connections allowed (-1=no limit) */
  	Oid			datlastsysoid;	/* highest OID to consider a system OID */
  	TransactionId datvacuumxid; /* all XIDs before this are vacuumed */
! 	TransactionId datminxid;	/* minimum XID present anywhere in the DB */
  	Oid			dattablespace;	/* default table space for this DB */
  	text		datconfig[1];	/* database-specific GUC (VAR LENGTH) */
  	aclitem		datacl[1];		/* access permissions (VAR LENGTH) */
***************
*** 69,75 ****
  #define Anum_pg_database_datconnlimit	6
  #define Anum_pg_database_datlastsysoid	7
  #define Anum_pg_database_datvacuumxid	8
! #define Anum_pg_database_datfrozenxid	9
  #define Anum_pg_database_dattablespace	10
  #define Anum_pg_database_datconfig		11
  #define Anum_pg_database_datacl			12
--- 69,75 ----
  #define Anum_pg_database_datconnlimit	6
  #define Anum_pg_database_datlastsysoid	7
  #define Anum_pg_database_datvacuumxid	8
! #define Anum_pg_database_datminxid		9
  #define Anum_pg_database_dattablespace	10
  #define Anum_pg_database_datconfig		11
  #define Anum_pg_database_datacl			12
diff -rc -X diff-ignore 11fixclass/src/include/catalog/pg_ntclass.h 12ntrelminxid/src/include/catalog/pg_ntclass.h
*** 11fixclass/src/include/catalog/pg_ntclass.h	2006-06-10 22:21:01.000000000 -0400
--- 12ntrelminxid/src/include/catalog/pg_ntclass.h	2006-06-11 01:33:58.000000000 -0400
***************
*** 38,48 ****
  {
  	int4		relpages;		/* # of blocks (not always up-to-date) */
  	float4		reltuples;		/* # of tuples (not always up-to-date) */
  } FormData_pg_ntclass;
  
  /* Size of pg_ntclass tuples */
  #define NTCLASS_TUPLE_SIZE \
! 	 (offsetof(FormData_pg_ntclass,reltuples) + sizeof(float4))
  /* ----------------
   *		Form_pg_ntclass corresponds to a pointer to a tuple with
   *		the format of pg_ntclass relation.
--- 38,50 ----
  {
  	int4		relpages;		/* # of blocks (not always up-to-date) */
  	float4		reltuples;		/* # of tuples (not always up-to-date) */
+ 	TransactionId relminxid;	/* minimum Xid present in table */
+ 	TransactionId relvacuumxid;	/* cutoff point used at last VACUUM */
  } FormData_pg_ntclass;
  
  /* Size of pg_ntclass tuples */
  #define NTCLASS_TUPLE_SIZE \
! 	 (offsetof(FormData_pg_ntclass,relvacuumxid) + sizeof(TransactionId))
  /* ----------------
   *		Form_pg_ntclass corresponds to a pointer to a tuple with
   *		the format of pg_ntclass relation.
***************
*** 54,62 ****
   *		compiler constants for pg_ntclass
   * ----------------
   */
! #define Natts_pg_ntclass				2
  #define Anum_pg_ntclass_relpages		1
  #define Anum_pg_ntclass_reltuples		2
  
  /* ----------------
   *		initial contents of pg_ntclass
--- 56,66 ----
   *		compiler constants for pg_ntclass
   * ----------------
   */
! #define Natts_pg_ntclass				4
  #define Anum_pg_ntclass_relpages		1
  #define Anum_pg_ntclass_reltuples		2
+ #define Anum_pg_ntclass_relminxid		3
+ #define Anum_pg_ntclass_relvacuumxid	4
  
  /* ----------------
   *		initial contents of pg_ntclass
***************
*** 66,75 ****
   * ----------------
   */
  
! DATA(insert ( 0 0 ));
! DATA(insert ( 0 0 ));
! DATA(insert ( 0 0 ));
! DATA(insert ( 0 0 ));
! DATA(insert ( 0 0 ));
  
  #endif   /* PG_NTCLASS_H */
--- 70,79 ----
   * ----------------
   */
  
! DATA(insert ( 0 0 0 0 ));
! DATA(insert ( 0 0 0 0 ));
! DATA(insert ( 0 0 0 0 ));
! DATA(insert ( 0 0 0 0 ));
! DATA(insert ( 0 0 0 0 ));
  
  #endif   /* PG_NTCLASS_H */
diff -rc -X diff-ignore 11fixclass/src/include/commands/dbcommands.h 12ntrelminxid/src/include/commands/dbcommands.h
*** 11fixclass/src/include/commands/dbcommands.h	2006-05-04 21:36:12.000000000 -0400
--- 12ntrelminxid/src/include/commands/dbcommands.h	2006-06-10 19:38:04.000000000 -0400
***************
*** 58,63 ****
--- 58,64 ----
  extern void AlterDatabase(AlterDatabaseStmt *stmt);
  extern void AlterDatabaseSet(AlterDatabaseSetStmt *stmt);
  extern void AlterDatabaseOwner(const char *dbname, Oid newOwnerId);
+ extern void UnfreezeDatabase(Oid dbid, TransactionId unfreezeXid);
  
  extern Oid	get_database_oid(const char *dbname);
  extern char *get_database_name(Oid dbid);
diff -rc -X diff-ignore 11fixclass/src/include/commands/vacuum.h 12ntrelminxid/src/include/commands/vacuum.h
*** 11fixclass/src/include/commands/vacuum.h	2006-06-04 22:36:52.000000000 -0400
--- 12ntrelminxid/src/include/commands/vacuum.h	2006-06-10 20:22:17.000000000 -0400
***************
*** 114,123 ****
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Relation rel,
! 					BlockNumber num_pages,
! 					double num_tuples,
! 					bool hasindex);
  extern void vacuum_set_xid_limits(VacuumStmt *vacstmt, bool sharedRel,
  					  TransactionId *oldestXmin,
  					  TransactionId *freezeLimit);
--- 114,122 ----
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Relation rel, BlockNumber num_pages,
! 					double num_tuples, bool hasindex,
! 					TransactionId minxid, TransactionId vacuumxid);
  extern void vacuum_set_xid_limits(VacuumStmt *vacstmt, bool sharedRel,
  					  TransactionId *oldestXmin,
  					  TransactionId *freezeLimit);
diff -rc -X diff-ignore 11fixclass/src/include/libpq/hba.h 12ntrelminxid/src/include/libpq/hba.h
*** 11fixclass/src/include/libpq/hba.h	2006-03-08 16:23:47.000000000 -0300
--- 12ntrelminxid/src/include/libpq/hba.h	2006-06-10 19:38:04.000000000 -0400
***************
*** 40,46 ****
  extern int	hba_getauthmethod(hbaPort *port);
  extern int	authident(hbaPort *port);
  extern bool read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbfrozenxid,
  					  TransactionId *dbvacuumxid);
  
  #endif   /* HBA_H */
--- 40,46 ----
  extern int	hba_getauthmethod(hbaPort *port);
  extern int	authident(hbaPort *port);
  extern bool read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbminxid,
  					  TransactionId *dbvacuumxid);
  
  #endif   /* HBA_H */

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#2)

Re: Non-transactional pg_class, try 2

Alvaro Herrera <alvherre@commandprompt.com> writes:

I forgot to attach the new file pg_ntclass.h (src/include/catalog).
Here it is.

Couple thoughts about this:

* I still suggest calling it pg_class_nt not pg_ntclass; that naming
convention seems like it will scale better if there are more
nontransactional "appendage" relations. I'm surprised you didn't
already need to invent pg_database_nt, for instance ... don't
datvacuumxid and datfrozenxid need to be nontransactional?

* The DATA() entries for the bootstrapped relations ought to be
commented as to which rels they belong to (corresponding to the
hardwired TIDs in pg-class.h):

DATA(insert ( 0 0 )); /* pg_type */
etc

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#1)

Re: Non-transactional pg_class, try 2

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

VACUUM FULL also refuses to operate on these tables, and ANALYZE
silently skips them. Only plain VACUUM cleans them.

I wonder whether VACUUM FULL applied to an NT table shouldn't just act
like plain VACUUM instead. Probably not very important though.

Note that you can DELETE from pg_ntclass. Not sure if we should
disallow it somehow, because it's not easy to get out from that if you
do.

No worse than DELETE FROM pg_class ;-). Please verify that the
aclcheck prohibitions on changing system catalogs are properly applied
to these catalogs too.

There is one caveat that I'm worried about. I had to add a "typedef" to
pg_class.h to put ItemPointerData in FormData_pg_class, because the C
struct doesn't recognize the "tid" type, but the bootstrap type system
does not recognize ItemPointerData as a valid type. I find this mighty
ugly because it will have side effects whenever we #include pg_class.h
(which is virtually anywhere, since that header is #included in htup.h
which in turn is included almost everywhere). Suggestions welcome.
Maybe this is not a problem.

Would it work to do

#define tid ItemPointerData
...
tid relntrans;
...
#undef tid
?

I'm not sure whether this will cause the right things to appear in the
.bki file, but if it does then the #undef would limit the scope of the
"tid" name.

In any case, the only thing uglier than a hack is an uncommented hack
;-) ... the typedef or macro needs to have a comments explaining what
and why.

The *real* problem with what you've done is that pg_class.h now depends
on having ItemPointerData defined before it's included, which creates an
inclusion ordering dependency that should not exist. If you stick with
either of these approaches then pg_class.h needs to #include whatever
defines ItemPointerData.

I notice that postgres.h defines a typedef for aclitem to work around a
similar issue. Is it reasonable to move ItemPointerData into postgres.h
so we could put the tid typedef beside the aclitem one?

Other two caveats are:
1. During bootstrap, RelationBuildLocalRelation creates nailed relations
with hardcoded TID=(0,1). This is because we don't have access to
pg_class yet, so we can't find the real pointer; and furthermore, we are
going to fix the entries later in the bootstrapping process.

This seems dangerous; can't you set it to InvalidItemPointer instead?
If it's not used before fixed, this doesn't matter, and if someone
*does* try to use it, that will catch the problem.

2. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty
ugly as well; and autovacuum is skipping pg_ntclass (really all
non-transactional catalogs) altogether. We could improve the situation
by introducing some sort of struct like {relid, relkind}, so that
vacuum_rel could know what relkind to expect, and it could skip
non-transactional catalogs cleanly in vacuum full and analyze.

Need to do something about this. pg_ntclass will contain XIDs (of
inserting/deleting transactions) so it MUST get vacuumed to be sure
we don't expose ourselves to XID wrap problems.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#3)

Re: The corresponding relminxid patch; try 1

Alvaro Herrera <alvherre@commandprompt.com> writes:

This is the relminxid patch corresponding to the pg_ntclass patch I just
posted.

That disable_heap_unfreeze thing seriously sucks. How bad are the API
changes needed to pass that as a parameter instead of having a
very-dangerous global variable?

The comment at line 328ff in dbcommands.c seems misguided, which makes
me doubt the code too. datfrozenxid and datvacuumxid should be
considered as indicating what XIDs appear inside the database, not what
is in its pg_database row.

The changes in vacuum.c are far too extensive to review meaningfully.
What did you do, and did it really need to touch so much code?

The thing that bothers me most about this is that it turns LockRelation
into an operation that needs to heap_fetch() from pg_ntclass in some
cases, and possibly update it.

Have you done any profiling to see what that actually costs?

I think we could possibly dodge the work in the normal case if we are
willing to make VACUUM FREEZE take ExclusiveLock and send out a relation
cache inval on the relation. Then, we can cache the pg_ntclass tuple in
relcache entries (discarding it on cache inval), and if the cached value
says it's not frozen then it's not frozen. You couldn't trust the
cached value much further than that, but that would at least take the
fetch out of the normal path in LockRelation. The trick here is the
problem that if VACUUM FREEZE fails after modifying pg_ntclass, its
relcache inval won't be sent out.

A bigger issue here is that I'm not sure what the locking protocol is
for pg_ntclass itself. It looks like you're not consistently taking
a RowExclusiveLock when you update it.

BTW, I think you have the order of operations wrong in LockRelation;
should it not unfreeze only *after* obtaining lock? Consider race
condition against relation drop for instance.

regards, tom lane

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Tom Lane (#6)

Re: The corresponding relminxid patch; try 1

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

This is the relminxid patch corresponding to the pg_ntclass patch I just
posted.

That disable_heap_unfreeze thing seriously sucks. How bad are the API
changes needed to pass that as a parameter instead of having a
very-dangerous global variable?

Let's see -- I would need to fix all callers of LockRelation, and the
problem I found in an earlier version of the patch (before the invention
of the non-transaction stuff) was that some callers needed to pass that
information several levels down. It's possible that this was an
artifact of the fact that it was using the relcache. I'll experiment
with changing stuff so that the global variable is not needed.

The comment at line 328ff in dbcommands.c seems misguided, which makes
me doubt the code too. datfrozenxid and datvacuumxid should be
considered as indicating what XIDs appear inside the database, not what
is in its pg_database row.

No, actually it's correct. The point of that comment is that if the
source database is frozen, then all XIDs appearing inside both databases
(source and newly created) are frozen. So it's possible that the row in
pg_database is frozen as well. But because we are creating a new row in
pg_database, it's not really frozen any longer; so we change the
pg_database fields in the new row to match.

Actually, pg_database is going to be unfrozen right after that code,
because it's opened with RowExclusiveLock shortly after, precisely to
insert that new row we are inserting. So maybe this is not an issue.

The changes in vacuum.c are far too extensive to review meaningfully.
What did you do, and did it really need to touch so much code?

Yeah, they are extensive. I did several things there: get rid of a
couple of global variables that no longer need to be global; remove the
return value from vacuum_rel, since it's no longer needed (it's used to
determine whether we can truncate pg_clog, but now we can do it
regardless of whether this particular vacuuming took place or not); I
changed some variables from the old "frozenXid" name to "minXid"; I put
in a hack to make VACUUM FREEZE take a stronger lock; changed the API of
vacuum_rel so that instead of taking a specific acceptable relkind, it
receives whether TOAST is acceptable or not; and added the code needed
to keep track of the earliest Xid found in code. But by far the most
extensive change is the melding of vac_update_dbstats into
vac_update_dbminxid, and the update of vac_update_relstats to cope with
pg_ntclass.

Maybe I should take a stab at making incremental patches instead of
doing everything in one patch. This way it would be easier to review
for correctness (and I'd be more confident that it is actually correct
as well).

The thing that bothers me most about this is that it turns LockRelation
into an operation that needs to heap_fetch() from pg_ntclass in some
cases, and possibly update it.

Have you done any profiling to see what that actually costs?

No, but I guess it must be expensive. While relminxid was still in the
relcache, it was cheap because we checked the value before having to
actually do anything else. That's why I was suggesting having a
separate cache for non-transactional stuff.

I think we could possibly dodge the work in the normal case if we are
willing to make VACUUM FREEZE take ExclusiveLock and send out a relation
cache inval on the relation.

Well, one problem is that if enough transactions pass after the last
update to a table, a normal VACUUM (i.e. not FREEZE) could mark a table
as frozen as well; marking frozen is not an exclusive property of VACUUM
FREEZE.

BTW, I think you have the order of operations wrong in LockRelation;
should it not unfreeze only *after* obtaining lock? Consider race
condition against relation drop for instance.

Hmm, good point. I think it was OK (and actually, it was required)
while relminxid was still on pg_class; or rather, there was a race
condition the other way around, so it was required to unfreeze the table
_before_ obtaining the lock. But it's certainly wrong now.

I'll work on pg_class_nt and I'll be back to this soon. Thanks for the
review.

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#7)

Re: The corresponding relminxid patch; try 1

Alvaro Herrera <alvherre@commandprompt.com> writes:

No, actually it's correct. The point of that comment is that if the
source database is frozen, then all XIDs appearing inside both databases
(source and newly created) are frozen. So it's possible that the row in
pg_database is frozen as well. But because we are creating a new row in
pg_database, it's not really frozen any longer; so we change the
pg_database fields in the new row to match.

No, this only says that pg_database has to be unfrozen. If the source
DB is frozen then the clone is frozen too.

The changes in vacuum.c are far too extensive to review meaningfully.
What did you do, and did it really need to touch so much code?

Yeah, they are extensive. ...

Maybe I should take a stab at making incremental patches instead of
doing everything in one patch. This way it would be easier to review
for correctness (and I'd be more confident that it is actually correct
as well).

Please. I've got no confidence that I see what's going on there.

regards, tom lane

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Tom Lane (#5)

Re: Non-transactional pg_class, try 2

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Would it work to do

#define tid ItemPointerData
...
tid relntrans;
...
#undef tid
?

Yeah, it probably would. I'll try.

The *real* problem with what you've done is that pg_class.h now depends
on having ItemPointerData defined before it's included, which creates an
inclusion ordering dependency that should not exist. If you stick with
either of these approaches then pg_class.h needs to #include whatever
defines ItemPointerData.

storage/itemptr.h is #included in pg_class.h (first chunk of the patch).

Other two caveats are:
1. During bootstrap, RelationBuildLocalRelation creates nailed relations
with hardcoded TID=(0,1). This is because we don't have access to
pg_class yet, so we can't find the real pointer; and furthermore, we are
going to fix the entries later in the bootstrapping process.

This seems dangerous; can't you set it to InvalidItemPointer instead?
If it's not used before fixed, this doesn't matter, and if someone
*does* try to use it, that will catch the problem.

Doesn't work because the bootstrap system actually _writes_ there :-( A
workaround could be to disable writing in bootstrapping mode, and store
InvalidItemPointer. (Actually storing InvalidItemPointer was the first
thing I did, but it crashed on bootstrap.)

I wasn't worried about bootstrap writing invalid values, because the
correct values are written shortly after (at the latest, when vacuum is
run by initdb). And if I set it to Invalid and have bootstrap mode skip
writing, exactly the same thing will happen ...

2. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty
ugly as well; and autovacuum is skipping pg_ntclass (really all
non-transactional catalogs) altogether. We could improve the situation
by introducing some sort of struct like {relid, relkind}, so that
vacuum_rel could know what relkind to expect, and it could skip
non-transactional catalogs cleanly in vacuum full and analyze.

Need to do something about this. pg_ntclass will contain XIDs (of
inserting/deleting transactions) so it MUST get vacuumed to be sure
we don't expose ourselves to XID wrap problems.

Oh, certainly it does get vacuumed. vacuum.c is modified so that
non-transactional catalogs are vacuumed (in database-wide VACUUM for
instance). The only thing I was stating above was that the way vacuum.c
handles the list of relations, is a bit of a mess, because vacuum_rel
wants to check the relkind but get_oids_list forms only a single list
and it's assumed that they are all RELKIND_RELATION rels. I had to
modify it a bit so that NON_TRANSACTIONAL rels are also included in that
list, and therefore the check had to be relaxed.

I also made ANALYZE silently skip non-transactional catalogs, in an
similarly ugly way. I don't remember what was the rationale for this --
certainly there isn't any harm. But on the other hand, analyzing it
serves no purpose since the statistics are not used for anything.

#10

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#9)

Re: Non-transactional pg_class, try 2

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Other two caveats are:
1. During bootstrap, RelationBuildLocalRelation creates nailed relations
with hardcoded TID=(0,1).

This seems dangerous; can't you set it to InvalidItemPointer instead?
If it's not used before fixed, this doesn't matter, and if someone
*does* try to use it, that will catch the problem.

Doesn't work because the bootstrap system actually _writes_ there :-( A
workaround could be to disable writing in bootstrapping mode, and store
InvalidItemPointer. (Actually storing InvalidItemPointer was the first
thing I did, but it crashed on bootstrap.)

Or, set it to (0,1) and reserve that TID as a dummy entry. What I'm
afraid of here is scribbling on some other relation's entry. I'd like
to see some defense against that, don't much care what.

We do plenty of disable-this-in-bootstrap-mode checks, so one more
doesn't seem like a problem; so the first solution may be better.

regards, tom lane

#11

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Tom Lane (#10)

1 attachment(s)

Re: Non-transactional pg_class, try 2

Tom Lane wrote:

Or, set it to (0,1) and reserve that TID as a dummy entry. What I'm
afraid of here is scribbling on some other relation's entry. I'd like
to see some defense against that, don't much care what.

We do plenty of disable-this-in-bootstrap-mode checks, so one more
doesn't seem like a problem; so the first solution may be better.

New version of the patch, including fixes to all the feedback you
provided. Thanks!

I used a dummy entry in (0,1), which seems cleaner to me (the
index-creation stuff in bootstrap is apparently still needed to generate
sinval messages, so it's not as easy as returning early from the
function). Maybe we could include a step in initdb to get rid of it,
but it doesn't seem too much of an issue.

Attachments:

fixclass-4.patchtext/plain; charset=us-asciiDownload

Index: doc/src/sgml/catalogs.sgml
===================================================================
RCS file: /home/alvherre/cvs/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.124
diff -c -r2.124 catalogs.sgml
*** doc/src/sgml/catalogs.sgml	3 Jun 2006 02:53:04 -0000	2.124
--- doc/src/sgml/catalogs.sgml	12 Jun 2006 03:16:08 -0000
***************
*** 104,109 ****
--- 104,114 ----
       </row>
  
       <row>
+       <entry><link linkend="catalog-pg-class-nt"><structname>pg_class_nt</structname></link></entry>
+       <entry>non-transactional columns for <structname>pg_class</structname></entry>
+      </row>
+ 
+      <row>
        <entry><link linkend="catalog-pg-constraint"><structname>pg_constraint</structname></link></entry>
        <entry>check constraints, unique constraints, primary key constraints, foreign key constraints</entry>
       </row>
***************
*** 1465,1497 ****
       </row>
  
       <row>
-       <entry><structfield>relpages</structfield></entry>
-       <entry><type>int4</type></entry>
-       <entry></entry>
-       <entry>
-        Size of the on-disk representation of this table in pages (of size
-        <symbol>BLCKSZ</symbol>).
-        This is only an estimate used by the planner.
-        It is updated by <command>VACUUM</command>,
-        <command>ANALYZE</command>, and a few DDL commands
-        such as <command>CREATE INDEX</command>.
-       </entry>
-      </row>
- 
-      <row>
-       <entry><structfield>reltuples</structfield></entry>
-       <entry><type>float4</type></entry>
-       <entry></entry>
-       <entry>
-        Number of rows in the table.
-        This is only an estimate used by the planner.
-        It is updated by <command>VACUUM</command>,
-        <command>ANALYZE</command>, and a few DDL commands
-        such as <command>CREATE INDEX</command>.
-       </entry>
-      </row>
- 
-      <row>
        <entry><structfield>reltoastrelid</structfield></entry>
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
--- 1470,1475 ----
***************
*** 1512,1517 ****
--- 1490,1504 ----
       </row>
  
       <row>
+       <entry><structfield>relntrans</structfield></entry>
+       <entry><type>tid</type></entry>
+       <entry><literal><link linkend="catalog-pg-class-nt"><structname>pg_class_nt</structname></link>.ctid</literal></entry>
+       <entry>
+        CTID of this relation's pg_class_nt row.
+       </entry>
+      </row>
+ 
+      <row>
        <entry><structfield>relhasindex</structfield></entry>
        <entry><type>bool</type></entry>
        <entry></entry>
***************
*** 1538,1547 ****
        <entry><type>char</type></entry>
        <entry></entry>
        <entry>
!        <literal>r</> = ordinary table, <literal>i</> = index,
!        <literal>S</> = sequence, <literal>v</> = view, <literal>c</> =
!        composite type, <literal>t</> = TOAST
!        table
        </entry>
       </row>
  
--- 1525,1533 ----
        <entry><type>char</type></entry>
        <entry></entry>
        <entry>
!        <literal>r</> = ordinary table, <literal>i</> = index, <literal>S</> =
!        sequence, <literal>v</> = view, <literal>c</> = composite type,
!        <literal>t</> = TOAST table, <literal>n</> = non-transactional catalog
        </entry>
       </row>
  
***************
*** 1648,1653 ****
--- 1634,1695 ----
    </table>
   </sect1>
  
+  <sect1 id="catalog-pg-class-nt">
+   <title><structname>pg_class_nt</structname></title>
+ 
+   <indexterm zone="catalog-pg-class-nt">
+    <primary>pg_class_nt</primary>
+   </indexterm>
+ 
+   <para>
+    The catalog <structname>pg_class_nt</structname> stores the non-transactional
+    information about <structname>pg_class</structname> entries, that is, the
+    information that is independent of the commit status of any particular
+    transaction.
+   </para>
+ 
+   <table>
+    <title><structname>pg_class_nt</> Columns</title>
+ 
+    <tgroup cols=4>
+     <thead>
+      <row>
+       <entry>Name</entry>
+       <entry>Type</entry>
+       <entry>References</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+ 
+     <tbody>
+      <row>
+       <entry><structfield>npages</structfield></entry>
+       <entry><type>int4</type></entry>
+       <entry></entry>
+       <entry>
+        Size of the on-disk representation of this table in pages (of size
+        <symbol>BLCKSZ</symbol>).  This is only an estimate used by the planner.
+        It is updated by <command>VACUUM</command>, <command>ANALYZE</command>,
+        and a few DDL commands such as <command>CREATE INDEX</command>.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>ntuples</structfield></entry>
+       <entry><type>float4</type></entry>
+       <entry></entry>
+       <entry>
+        Number of rows in the table.  This is only an estimate used by the
+        planner.  It is updated by <command>VACUUM</command>,
+        <command>ANALYZE</command>, and a few DDL commands such as
+        <command>CREATE INDEX</command>.
+       </entry>
+      </row>
+     </tbody>
+   </tgroup>
+  </sect1>
+ 
+ 
   <sect1 id="catalog-pg-constraint">
    <title><structname>pg_constraint</structname></title>
  
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.213
diff -c -r1.213 heapam.c
*** src/backend/access/heap/heapam.c	28 May 2006 02:27:08 -0000	1.213
--- src/backend/access/heap/heapam.c	10 Jun 2006 23:17:47 -0000
***************
*** 1828,1833 ****
--- 1828,1839 ----
  
  	Assert(ItemPointerIsValid(otid));
  
+ 	if (relation->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("can't update non-transactional catalog %s",
+ 					   RelationGetRelationName(relation))));
+ 
  	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(otid));
  	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
  
Index: src/backend/catalog/Makefile
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/Makefile,v
retrieving revision 1.59
diff -c -r1.59 Makefile
*** src/backend/catalog/Makefile	12 Feb 2006 03:22:17 -0000	1.59
--- src/backend/catalog/Makefile	12 Jun 2006 16:39:21 -0000
***************
*** 24,33 ****
  
  # Note: there are some undocumented dependencies on the ordering in which
  # the catalog header files are assembled into postgres.bki.  In particular,
! # indexing.h had better be last.
  
  POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
! 	pg_proc.h pg_type.h pg_attribute.h pg_class.h pg_autovacuum.h \
  	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h \
  	pg_operator.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
  	pg_language.h pg_largeobject.h pg_aggregate.h pg_statistic.h \
--- 24,34 ----
  
  # Note: there are some undocumented dependencies on the ordering in which
  # the catalog header files are assembled into postgres.bki.  In particular,
! # indexing.h had better be last, and the bootstrapped catalogs must appear in
! # front.
  
  POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
! 	pg_proc.h pg_type.h pg_attribute.h pg_class.h pg_class_nt.h pg_autovacuum.h \
  	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h \
  	pg_operator.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
  	pg_language.h pg_largeobject.h pg_aggregate.h pg_statistic.h \
Index: src/backend/catalog/heap.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/heap.c,v
retrieving revision 1.299
diff -c -r1.299 heap.c
*** src/backend/catalog/heap.c	10 May 2006 23:18:39 -0000	1.299
--- src/backend/catalog/heap.c	12 Jun 2006 03:56:23 -0000
***************
*** 37,42 ****
--- 37,43 ----
  #include "catalog/index.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_attrdef.h"
+ #include "catalog/pg_class_nt.h"
  #include "catalog/pg_constraint.h"
  #include "catalog/pg_inherits.h"
  #include "catalog/pg_namespace.h"
***************
*** 66,71 ****
--- 67,73 ----
  					Relation new_rel_desc,
  					Oid new_rel_oid, Oid new_type_oid,
  					Oid relowner,
+ 					ItemPointerData nttid,
  					char relkind);
  static Oid AddNewRelationType(const char *typeName,
  				   Oid typeNamespace,
***************
*** 557,562 ****
--- 559,565 ----
  					Oid new_rel_oid,
  					Oid new_type_oid,
  					Oid relowner,
+ 					ItemPointerData nttid,
  					char relkind)
  {
  	Form_pg_class new_rel_reltup;
***************
*** 568,597 ****
  	 */
  	new_rel_reltup = new_rel_desc->rd_rel;
  
- 	switch (relkind)
- 	{
- 		case RELKIND_RELATION:
- 		case RELKIND_INDEX:
- 		case RELKIND_TOASTVALUE:
- 			/* The relation is real, but as yet empty */
- 			new_rel_reltup->relpages = 0;
- 			new_rel_reltup->reltuples = 0;
- 			break;
- 		case RELKIND_SEQUENCE:
- 			/* Sequences always have a known size */
- 			new_rel_reltup->relpages = 1;
- 			new_rel_reltup->reltuples = 1;
- 			break;
- 		default:
- 			/* Views, etc, have no disk storage */
- 			new_rel_reltup->relpages = 0;
- 			new_rel_reltup->reltuples = 0;
- 			break;
- 	}
- 
  	new_rel_reltup->relowner = relowner;
  	new_rel_reltup->reltype = new_type_oid;
  	new_rel_reltup->relkind = relkind;
  
  	new_rel_desc->rd_att->tdtypeid = new_type_oid;
  
--- 571,580 ----
  	 */
  	new_rel_reltup = new_rel_desc->rd_rel;
  
  	new_rel_reltup->relowner = relowner;
  	new_rel_reltup->reltype = new_type_oid;
  	new_rel_reltup->relkind = relkind;
+ 	new_rel_reltup->relntrans = nttid;
  
  	new_rel_desc->rd_att->tdtypeid = new_type_oid;
  
***************
*** 618,623 ****
--- 601,684 ----
  	heap_freetuple(tup);
  }
  
+ /*
+  * AddNewNtClassTuple
+  *
+  * Insert a pg_class_nt tuple for the given relation, and return its
+  * TID.
+  */
+ ItemPointerData
+ AddNewNtClassTuple(Relation new_rel_desc, char relkind)
+ {
+ 	ItemPointerData	tid;
+ 
+ 	/*
+ 	 * We don't create pg_class_nt entries for sequences, views or composite
+ 	 * types, because they don't have variable-size storage and thus they don't
+ 	 * need it.
+ 	 */
+ 	if (relkind == RELKIND_VIEW ||
+ 		relkind == RELKIND_COMPOSITE_TYPE ||
+ 		relkind == RELKIND_SEQUENCE)
+ 	{
+ 		ItemPointerSetInvalid(&tid);
+ 	}
+ 	else
+ 	{
+ 		Relation	ntRel;
+ 		Form_pg_class_nt new_rel_nttup;
+ 		HeapTuple	tup;
+ 
+ 		ntRel = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 		new_rel_nttup = (Form_pg_class_nt) palloc(sizeof(FormData_pg_class_nt));
+ 
+ 		/* choose the appropiate new values we want to insert */
+ 		switch (relkind)
+ 		{
+ 			case RELKIND_RELATION:
+ 			case RELKIND_INDEX:
+ 			case RELKIND_TOASTVALUE:
+ 			case RELKIND_NON_TRANSACTIONAL:
+ 				/* The relation is real, but as yet empty */
+ 				new_rel_nttup->npages = 0;
+ 				new_rel_nttup->ntuples = 0;
+ 				break;
+ 			default:
+ 				elog(ERROR, "can't create pg_class_nt entries for relkind '%c'",
+ 					 relkind);
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * now form a tuple to add to pg_class_nt
+ 		 */
+ 		tup = heap_addheader(Natts_pg_class_nt,
+ 							 false,
+ 							 NTCLASS_TUPLE_SIZE,
+ 							 (void *) new_rel_nttup);
+ 
+ 		/*
+ 		 * finally insert the new tuple, update the indexes, and clean up.
+ 		 */
+ 		simple_heap_insert(ntRel, tup);
+ 
+ 		/* Grab the newly inserted tuple's TID for saving it into pg_class */
+ 		tid = tup->t_self;
+ 
+ #ifdef NOT_USED
+ 		/* pg_class_nt does not have indexes */
+ 		CatalogUpdateIndexes(ntRel, tup);
+ #endif
+ 
+ 		heap_close(ntRel, RowExclusiveLock);
+ 
+ 		heap_freetuple(tup);
+ 		pfree(new_rel_nttup);
+ 	}
+ 
+ 	return tid;
+ }
  
  /* --------------------------------
   *		AddNewRelationType -
***************
*** 679,684 ****
--- 740,747 ----
  	Relation	pg_class_desc;
  	Relation	new_rel_desc;
  	Oid			new_type_oid;
+ 	ItemPointerData nttid;
+ 
  
  	pg_class_desc = heap_open(RelationRelationId, RowExclusiveLock);
  
***************
*** 733,738 ****
--- 796,806 ----
  									  relkind);
  
  	/*
+ 	 * Create a pg_class_nt entry for the relation.
+ 	 */
+ 	nttid = AddNewNtClassTuple(new_rel_desc, relkind);
+ 
+ 	/*
  	 * now create an entry in pg_class for the relation.
  	 *
  	 * NOTE: we could get a unique-index failure here, in case someone else is
***************
*** 744,749 ****
--- 812,818 ----
  						relid,
  						new_type_oid,
  						ownerid,
+ 						nttid,
  						relkind);
  
  	/*
***************
*** 872,877 ****
--- 941,962 ----
  	heap_close(pg_class_desc, RowExclusiveLock);
  }
  
+ void
+ DeleteNtClassTuple(ItemPointer tid)
+ {
+ 	Relation	pg_class_nt;
+ 
+ 	/* some relations don't have pg_class_nt tuples */
+ 	if (!ItemPointerIsValid(tid))
+ 		return;
+ 
+ 	pg_class_nt = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 	simple_heap_delete(pg_class_nt, tid);
+ 
+ 	heap_close(pg_class_nt, RowExclusiveLock);
+ }
+ 
  /*
   *		DeleteAttributeTuples
   *
***************
*** 1159,1164 ****
--- 1244,1252 ----
  		smgrscheduleunlink(rel->rd_smgr, rel->rd_istemp);
  	}
  
+ 	/* Delete the pg_class_nt tuple */
+ 	DeleteNtClassTuple(&(rel->rd_rel->relntrans));
+ 
  	/*
  	 * Close relcache entry, but *keep* AccessExclusiveLock on the relation
  	 * until transaction commit.  This ensures no one else will try to do
***************
*** 2133,2135 ****
--- 2221,2293 ----
  
  	return result;
  }
+ 
+ /*
+  * Return the number of tuples and pages for a given relation, as recorded in
+  * its pg_class_nt entry.  ntuples and npages are output parameters, and can be
+  * NULL if the caller does not need them.
+  */
+ void
+ relation_get_ntups_npages(Relation relation, float4 *ntuples, int4 *npages)
+ {
+ 	HeapTuple	tup;
+ 	Relation	pg_class_nt;
+ 	Form_pg_class_nt ntForm;
+ 
+ 	pg_class_nt = heap_open(NtClassRelationId, AccessShareLock);
+ 
+ 	tup = get_class_nt_entry(pg_class_nt, relation);
+ 	ntForm = (Form_pg_class_nt) GETSTRUCT(tup);
+ 
+ 	if (ntuples != NULL)
+ 		*ntuples = ntForm->ntuples;
+ 	if (npages != NULL)
+ 		*npages = ntForm->npages;
+ 
+ 	heap_freetuple(tup);
+ 	heap_close(pg_class_nt, AccessShareLock);
+ }
+ 
+ /*
+  * get_class_nt_entry
+  *		Return a copy of the pg_class_nt tuple for the given target relation.
+  *
+  * The first argument is the pg_class_nt relation, already open and suitably
+  * locked.  If the caller is going to call update_class_nt_entry later, it
+  * must be locked with at least RowExclusiveLock.
+  */
+ HeapTuple
+ get_class_nt_entry(Relation pg_class_nt, Relation target)
+ {
+ 	HeapTupleData tuple;
+ 	HeapTuple	retval;
+ 	Buffer		buffer;
+ 
+ 	Assert(ItemPointerIsValid(&(target->rd_rel->relntrans)));
+ 
+ 	tuple.t_self = target->rd_rel->relntrans;
+ 
+ 	if (!heap_fetch(pg_class_nt, SnapshotNow, &tuple, &buffer, false, NULL))
+ 		elog(ERROR, "lookup failed for pg_class_nt entry for relation %s",
+ 			 RelationGetRelationName(target));
+ 
+ 	retval = heap_copytuple(&tuple);
+ 	ReleaseBuffer(buffer);
+ 
+ 	return retval;
+ }
+ 
+ /*
+  * update_class_nt_entry
+  * 		Update a given pg_class_nt entry.
+  *
+  * The passed tuple should have been obtained by get_class_nt_entry.
+  */
+ void
+ update_class_nt_entry(Relation pg_class_nt, HeapTuple tuple)
+ {
+ 	elog(DEBUG2, "updating pg_class_nt entry (%u/%u)",
+ 		 ItemPointerGetBlockNumber(&(tuple->t_self)),
+ 		 ItemPointerGetOffsetNumber(&(tuple->t_self)));
+ 	heap_inplace_update(pg_class_nt, tuple);
+ }
Index: src/backend/catalog/index.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.266
diff -c -r1.266 index.c
*** src/backend/catalog/index.c	10 May 2006 23:18:39 -0000	1.266
--- src/backend/catalog/index.c	12 Jun 2006 16:30:11 -0000
***************
*** 31,36 ****
--- 31,37 ----
  #include "catalog/heap.h"
  #include "catalog/index.h"
  #include "catalog/indexing.h"
+ #include "catalog/pg_class_nt.h"
  #include "catalog/pg_constraint.h"
  #include "catalog/pg_opclass.h"
  #include "catalog/pg_type.h"
***************
*** 569,575 ****
  
  	/*
  	 * Fill in fields of the index's pg_class entry that are not set correctly
! 	 * by heap_create.
  	 *
  	 * XXX should have a cleaner way to create cataloged indexes
  	 */
--- 570,576 ----
  
  	/*
  	 * Fill in fields of the index's pg_class entry that are not set correctly
! 	 * by heap_create, and create the pg_class_nt entry.
  	 *
  	 * XXX should have a cleaner way to create cataloged indexes
  	 */
***************
*** 577,582 ****
--- 578,585 ----
  	indexRelation->rd_rel->relam = accessMethodObjectId;
  	indexRelation->rd_rel->relkind = RELKIND_INDEX;
  	indexRelation->rd_rel->relhasoids = false;
+ 	indexRelation->rd_rel->relntrans =
+ 		AddNewNtClassTuple(indexRelation, RELKIND_INDEX);
  
  	/*
  	 * store index's pg_class entry
***************
*** 756,761 ****
--- 759,768 ----
  	}
  	else if (skip_build)
  	{
+ 		float4 ntuples;
+ 
+ 		relation_get_ntups_npages(heapRelation, &ntuples, NULL);
+ 
  		/*
  		 * Caller is responsible for filling the index later on.  However,
  		 * we'd better make sure that the heap relation is correctly marked
***************
*** 765,771 ****
  						   true,
  						   isprimary,
  						   InvalidOid,
! 						   heapRelation->rd_rel->reltuples);
  		/* Make the above update visible */
  		CommandCounterIncrement();
  	}
--- 772,778 ----
  						   true,
  						   isprimary,
  						   InvalidOid,
! 						   ntuples);
  		/* Make the above update visible */
  		CommandCounterIncrement();
  	}
***************
*** 826,831 ****
--- 833,841 ----
  	smgrscheduleunlink(userIndexRelation->rd_smgr,
  					   userIndexRelation->rd_istemp);
  
+ 	/* delete pg_class_nt tuple */
+ 	DeleteNtClassTuple(&(userIndexRelation->rd_rel->relntrans));
+ 
  	/*
  	 * Close and flush the index's relcache entry, to ensure relcache doesn't
  	 * try to rebuild it while we're deleting catalog entries. We keep the
***************
*** 1011,1027 ****
  /*
   * index_update_stats --- update pg_class entry after CREATE INDEX
   *
!  * This routine updates the pg_class row of either an index or its parent
!  * relation after CREATE INDEX.  Its rather bizarre API is designed to
!  * ensure we can do all the necessary work in just one update.
   *
   * hasindex: set relhasindex to this value
   * isprimary: if true, set relhaspkey true; else no change
   * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
   *		else no change
!  * reltuples: set reltuples to this value
   *
!  * relpages is also updated (using RelationGetNumberOfBlocks()).
   *
   * NOTE: an important side-effect of this operation is that an SI invalidation
   * message is sent out to all backends --- including me --- causing relcache
--- 1021,1037 ----
  /*
   * index_update_stats --- update pg_class entry after CREATE INDEX
   *
!  * This routine updates the pg_class and pg_class_nt rows of either an index or
!  * its parent relation after CREATE INDEX.  Its rather bizarre API is designed
!  * to ensure we can do all the necessary work in just one update.
   *
   * hasindex: set relhasindex to this value
   * isprimary: if true, set relhaspkey true; else no change
   * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
   *		else no change
!  * pg_class_nt.ntuples: set reltuples to this value
   *
!  * pg_class_nt.npages is also updated (using RelationGetNumberOfBlocks()).
   *
   * NOTE: an important side-effect of this operation is that an SI invalidation
   * message is sent out to all backends --- including me --- causing relcache
***************
*** 1038,1045 ****
--- 1048,1057 ----
  	BlockNumber	relpages = RelationGetNumberOfBlocks(rel);
  	Oid			relid = RelationGetRelid(rel);
  	Relation	pg_class;
+ 	Relation	pg_class_nt;
  	HeapTuple	tuple;
  	Form_pg_class rd_rel;
+ 	Form_pg_class_nt ntForm;
  	bool		in_place_upd;
  	bool		dirty;
  
***************
*** 1112,1127 ****
  			dirty = true;
  		}
  	}
- 	if (rd_rel->reltuples != (float4) reltuples)
- 	{
- 		rd_rel->reltuples = (float4) reltuples;
- 		dirty = true;
- 	}
- 	if (rd_rel->relpages != (int32) relpages)
- 	{
- 		rd_rel->relpages = (int32) relpages;
- 		dirty = true;
- 	}
  
  	/*
  	 * If anything changed, write out the tuple
--- 1124,1129 ----
***************
*** 1185,1190 ****
--- 1187,1216 ----
  	heap_freetuple(tuple);
  
  	heap_close(pg_class, RowExclusiveLock);
+ 
+ 	/* update the non-transactional (pg_class_nt) entry */
+ 	pg_class_nt = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 	tuple = get_class_nt_entry(pg_class_nt, rel);
+ 
+ 	ntForm = (Form_pg_class_nt) GETSTRUCT(tuple);
+ 
+ 	dirty = false;
+ 	if (ntForm->ntuples != (float4) reltuples)
+ 	{
+ 		ntForm->ntuples = (float4) reltuples;
+ 		dirty = true;
+ 	}
+ 	if (ntForm->npages != (int32) relpages)
+ 	{
+ 		ntForm->npages = (int32) relpages;
+ 		dirty = true;
+ 	}
+ 	if (dirty)
+ 		update_class_nt_entry(pg_class_nt, tuple);
+ 	heap_freetuple(tuple);
+ 
+ 	heap_close(pg_class_nt, RowExclusiveLock);
  }
  
  /*
***************
*** 1240,1249 ****
  	RelationOpenSmgr(relation);
  	smgrscheduleunlink(relation->rd_smgr, relation->rd_istemp);
  
! 	/* update the pg_class row */
  	rd_rel->relfilenode = newrelfilenode;
- 	rd_rel->relpages = 0;		/* it's empty until further notice */
- 	rd_rel->reltuples = 0;
  	simple_heap_update(pg_class, &tuple->t_self, tuple);
  	CatalogUpdateIndexes(pg_class, tuple);
  
--- 1266,1278 ----
  	RelationOpenSmgr(relation);
  	smgrscheduleunlink(relation->rd_smgr, relation->rd_istemp);
  
! 	/*
! 	 * Update the pg_class row.  The caller is responsible for ensuring the
! 	 * pg_class_nt entry is correct.  While we could update it here, it's just
! 	 * a waste of time to set the values to 0, because currently all callers
! 	 * set them to the correct values sometime after calling us.
! 	 */
  	rd_rel->relfilenode = newrelfilenode;
  	simple_heap_update(pg_class, &tuple->t_self, tuple);
  	CatalogUpdateIndexes(pg_class, tuple);
  
Index: src/backend/commands/analyze.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.93
diff -c -r1.93 analyze.c
*** src/backend/commands/analyze.c	23 Mar 2006 00:19:28 -0000	1.93
--- src/backend/commands/analyze.c	12 Jun 2006 16:44:32 -0000
***************
*** 159,164 ****
--- 159,174 ----
  	}
  
  	/*
+ 	 * We don't process RELKIND_NON_TRANSACTIONAL relations, but we don't
+ 	 * complain since VACUUM processes them but we don't.
+ 	 */
+ 	if (onerel->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL)
+ 	{
+ 		relation_close(onerel, AccessShareLock);
+ 		return;
+ 	}
+ 
+ 	/*
  	 * Check that it's a plain table; we used to do this in get_rel_oids() but
  	 * seems safer to check after we've locked the relation.
  	 */
***************
*** 422,428 ****
  	 */
  	if (!vacstmt->vacuum)
  	{
! 		vac_update_relstats(RelationGetRelid(onerel),
  							RelationGetNumberOfBlocks(onerel),
  							totalrows,
  							hasindex);
--- 432,438 ----
  	 */
  	if (!vacstmt->vacuum)
  	{
! 		vac_update_relstats(onerel,
  							RelationGetNumberOfBlocks(onerel),
  							totalrows,
  							hasindex);
***************
*** 432,438 ****
  			double		totalindexrows;
  
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
! 			vac_update_relstats(RelationGetRelid(Irel[ind]),
  								RelationGetNumberOfBlocks(Irel[ind]),
  								totalindexrows,
  								false);
--- 442,448 ----
  			double		totalindexrows;
  
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
! 			vac_update_relstats(Irel[ind],
  								RelationGetNumberOfBlocks(Irel[ind]),
  								totalindexrows,
  								false);
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.147
diff -c -r1.147 cluster.c
*** src/backend/commands/cluster.c	2 May 2006 22:25:10 -0000	1.147
--- src/backend/commands/cluster.c	12 Jun 2006 03:48:28 -0000
***************
*** 728,733 ****
--- 728,734 ----
  				relform2;
  	Oid			swaptemp;
  	CatalogIndexState indstate;
+ 	ItemPointerData swap_tid;
  
  	/* We need writable copies of both pg_class tuples. */
  	relRelation = heap_open(RelationRelationId, RowExclusiveLock);
***************
*** 763,781 ****
  
  	/* we should not swap reltoastidxid */
  
! 	/* swap size statistics too, since new rel has freshly-updated stats */
! 	{
! 		int4		swap_pages;
! 		float4		swap_tuples;
! 
! 		swap_pages = relform1->relpages;
! 		relform1->relpages = relform2->relpages;
! 		relform2->relpages = swap_pages;
! 
! 		swap_tuples = relform1->reltuples;
! 		relform1->reltuples = relform2->reltuples;
! 		relform2->reltuples = swap_tuples;
! 	}
  
  	/* Update the tuples in pg_class */
  	simple_heap_update(relRelation, &reltup1->t_self, reltup1);
--- 764,779 ----
  
  	/* we should not swap reltoastidxid */
  
! 	/*
! 	 * Swap pg_class_nt rows too.  This automatically updates the
! 	 * size statistics.
! 	 *
! 	 * XXX make sure this is still correct when adding more rows to
! 	 * pg_class_nt.
! 	 */
! 	swap_tid = relform1->relntrans;
! 	relform1->relntrans = relform2->relntrans;
! 	relform2->relntrans = swap_tid;
  
  	/* Update the tuples in pg_class */
  	simple_heap_update(relRelation, &reltup1->t_self, reltup1);
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.330
diff -c -r1.330 vacuum.c
*** src/backend/commands/vacuum.c	10 May 2006 23:18:39 -0000	1.330
--- src/backend/commands/vacuum.c	12 Jun 2006 04:03:01 -0000
***************
*** 28,35 ****
--- 28,37 ----
  #include "access/subtrans.h"
  #include "access/xlog.h"
  #include "catalog/catalog.h"
+ #include "catalog/heap.h"
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
+ #include "catalog/pg_class_nt.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_index.h"
  #include "commands/dbcommands.h"
***************
*** 200,212 ****
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
! 			 const char *stmttype);
  static void vac_update_dbstats(Oid dbid,
  				   TransactionId vacuumXID,
  				   TransactionId frozenXID);
  static void vac_truncate_clog(TransactionId vacuumXID,
  				  TransactionId frozenXID);
! static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
  		  VacPageList vacuum_pages, VacPageList fraged_pages);
--- 202,214 ----
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
! 			 const char *stmttype, bool full);
  static void vac_update_dbstats(Oid dbid,
  				   TransactionId vacuumXID,
  				   TransactionId frozenXID);
  static void vac_truncate_clog(TransactionId vacuumXID,
  				  TransactionId frozenXID);
! static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
  		  VacPageList vacuum_pages, VacPageList fraged_pages);
***************
*** 350,356 ****
  	 * Build list of relations to process, unless caller gave us one. (If we
  	 * build one, we put it in vac_context for safekeeping.)
  	 */
! 	relations = get_rel_oids(relids, vacstmt->relation, stmttype);
  
  	if (vacstmt->vacuum && all_rels)
  	{
--- 352,359 ----
  	 * Build list of relations to process, unless caller gave us one. (If we
  	 * build one, we put it in vac_context for safekeeping.)
  	 */
! 	relations = get_rel_oids(relids, vacstmt->relation, stmttype,
! 		   					 vacstmt->full);
  
  	if (vacstmt->vacuum && all_rels)
  	{
***************
*** 446,452 ****
  
  			if (vacstmt->vacuum)
  			{
! 				if (!vacuum_rel(relid, vacstmt, RELKIND_RELATION))
  					all_rels = false;	/* forget about updating dbstats */
  			}
  			if (vacstmt->analyze)
--- 449,455 ----
  
  			if (vacstmt->vacuum)
  			{
! 				if (!vacuum_rel(relid, vacstmt, false))
  					all_rels = false;	/* forget about updating dbstats */
  			}
  			if (vacstmt->analyze)
***************
*** 563,569 ****
   * per-relation transactions.
   */
  static List *
! get_rel_oids(List *relids, const RangeVar *vacrel, const char *stmttype)
  {
  	List	   *oid_list = NIL;
  	MemoryContext oldcontext;
--- 566,573 ----
   * per-relation transactions.
   */
  static List *
! get_rel_oids(List *relids, const RangeVar *vacrel, const char *stmttype,
! 			 bool full)
  {
  	List	   *oid_list = NIL;
  	MemoryContext oldcontext;
***************
*** 590,608 ****
  		Relation	pgclass;
  		HeapScanDesc scan;
  		HeapTuple	tuple;
- 		ScanKeyData key;
- 
- 		ScanKeyInit(&key,
- 					Anum_pg_class_relkind,
- 					BTEqualStrategyNumber, F_CHAREQ,
- 					CharGetDatum(RELKIND_RELATION));
  
  		pgclass = heap_open(RelationRelationId, AccessShareLock);
  
! 		scan = heap_beginscan(pgclass, SnapshotNow, 1, &key);
  
  		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
  		{
  			/* Make a relation list entry for this guy */
  			oldcontext = MemoryContextSwitchTo(vac_context);
  			oid_list = lappend_oid(oid_list, HeapTupleGetOid(tuple));
--- 594,620 ----
  		Relation	pgclass;
  		HeapScanDesc scan;
  		HeapTuple	tuple;
  
  		pgclass = heap_open(RelationRelationId, AccessShareLock);
  
! 		scan = heap_beginscan(pgclass, SnapshotNow, 0, NULL);
  
  		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
  		{
+ 			Form_pg_class classForm;
+ 			char		relkind;
+ 
+ 			classForm = (Form_pg_class) GETSTRUCT(tuple);
+ 			relkind = classForm->relkind;
+ 
+ 			/* we only process plain and non-transactional tables */
+ 			if (relkind != RELKIND_RELATION &&
+ 				relkind != RELKIND_NON_TRANSACTIONAL)
+ 				continue;
+ 			/* we only process non-transactional catalogs on lazy vacuum */
+ 			if (relkind == RELKIND_NON_TRANSACTIONAL && full)
+ 				continue;
+ 
  			/* Make a relation list entry for this guy */
  			oldcontext = MemoryContextSwitchTo(vac_context);
  			oid_list = lappend_oid(oid_list, HeapTupleGetOid(tuple));
***************
*** 669,677 ****
   *	vac_update_relstats() -- update statistics for one relation
   *
   *		Update the whole-relation statistics that are kept in its pg_class
!  *		row.  There are additional stats that will be updated if we are
!  *		doing ANALYZE, but we always update these stats.  This routine works
!  *		for both index and heap relation entries in pg_class.
   *
   *		We violate transaction semantics here by overwriting the rel's
   *		existing pg_class tuple with the new values.  This is reasonably
--- 681,689 ----
   *	vac_update_relstats() -- update statistics for one relation
   *
   *		Update the whole-relation statistics that are kept in its pg_class
!  *		and pg_class_nt rows.  There are additional stats that will be updated
!  *		if we are doing ANALYZE, but we always update these stats.  This
!  *		routine works for both index and heap relation entries in pg_class.
   *
   *		We violate transaction semantics here by overwriting the rel's
   *		existing pg_class tuple with the new values.  This is reasonably
***************
*** 679,723 ****
   *		commits.  The reason for this is that if we updated these tuples in
   *		the usual way, vacuuming pg_class itself wouldn't work very well ---
   *		by the time we got done with a vacuum cycle, most of the tuples in
!  *		pg_class would've been obsoleted.  Of course, this only works for
!  *		fixed-size never-null columns, but these are.
   *
   *		This routine is shared by full VACUUM, lazy VACUUM, and stand-alone
   *		ANALYZE.
   */
  void
! vac_update_relstats(Oid relid, BlockNumber num_pages, double num_tuples,
  					bool hasindex)
  {
  	Relation	rd;
  	HeapTuple	ctup;
  	Form_pg_class pgcform;
  	bool		dirty;
  
  	rd = heap_open(RelationRelationId, RowExclusiveLock);
  
  	/* Fetch a copy of the tuple to scribble on */
  	ctup = SearchSysCacheCopy(RELOID,
! 							  ObjectIdGetDatum(relid),
  							  0, 0, 0);
  	if (!HeapTupleIsValid(ctup))
  		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
! 			 relid);
  	pgcform = (Form_pg_class) GETSTRUCT(ctup);
  
  	/* Apply required updates, if any, to copied tuple */
  
  	dirty = false;
- 	if (pgcform->relpages != (int32) num_pages)
- 	{
- 		pgcform->relpages = (int32) num_pages;
- 		dirty = true;
- 	}
- 	if (pgcform->reltuples != (float4) num_tuples)
- 	{
- 		pgcform->reltuples = (float4) num_tuples;
- 		dirty = true;
- 	}
  	if (pgcform->relhasindex != hasindex)
  	{
  		pgcform->relhasindex = hasindex;
--- 691,758 ----
   *		commits.  The reason for this is that if we updated these tuples in
   *		the usual way, vacuuming pg_class itself wouldn't work very well ---
   *		by the time we got done with a vacuum cycle, most of the tuples in
!  *		them would've been obsoleted.  Of course, this only works for
!  *		fixed-size never-null columns, but these are.  Also, note this doesn't
!  *		apply to pg_class_nt, because by definition of non-transactional
!  *		catalog we must always update it in-place.
   *
   *		This routine is shared by full VACUUM, lazy VACUUM, and stand-alone
   *		ANALYZE.
   */
  void
! vac_update_relstats(Relation rel, BlockNumber num_pages, double num_tuples,
  					bool hasindex)
  {
  	Relation	rd;
  	HeapTuple	ctup;
+ 	Form_pg_class_nt pgntform;
  	Form_pg_class pgcform;
  	bool		dirty;
  
+ 	/* Process pg_class_nt first */
+ 	rd = heap_open(NtClassRelationId, RowExclusiveLock);
+ 
+ 	/* Fetch a copy of the pg_class_nt tuple to scribble on */
+ 	ctup = get_class_nt_entry(rd, rel);
+ 
+ 	pgntform = (Form_pg_class_nt) GETSTRUCT(ctup);
+ 
+ 	/* Apply required updates, if any, to copied tuple */
+ 
+ 	dirty = false;
+ 	if (pgntform->npages != (int32) num_pages)
+ 	{
+ 		pgntform->npages = (int32) num_pages;
+ 		dirty = true;
+ 	}
+ 	if (pgntform->ntuples != (float4) num_tuples)
+ 	{
+ 		pgntform->ntuples = (float4) num_tuples;
+ 		dirty = true;
+ 	}
+ 
+ 	/* If anything changed, write out the tuple */
+ 	if (dirty)
+ 		update_class_nt_entry(rd, ctup);
+ 	heap_freetuple(ctup);
+ 
+ 	heap_close(rd, RowExclusiveLock);
+ 
+ 	/* Now go for the pg_class tuple */
  	rd = heap_open(RelationRelationId, RowExclusiveLock);
  
  	/* Fetch a copy of the tuple to scribble on */
  	ctup = SearchSysCacheCopy(RELOID,
! 							  ObjectIdGetDatum(RelationGetRelid(rel)),
  							  0, 0, 0);
  	if (!HeapTupleIsValid(ctup))
  		elog(ERROR, "pg_class entry for relid %u vanished during vacuuming",
! 			 RelationGetRelid(rel));
  	pgcform = (Form_pg_class) GETSTRUCT(ctup);
  
  	/* Apply required updates, if any, to copied tuple */
  
  	dirty = false;
  	if (pgcform->relhasindex != hasindex)
  	{
  		pgcform->relhasindex = hasindex;
***************
*** 934,940 ****
   *		At entry and exit, we are not inside a transaction.
   */
  static bool
! vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
--- 969,975 ----
   *		At entry and exit, we are not inside a transaction.
   */
  static bool
! vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool allow_toast)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
***************
*** 1005,1013 ****
  
  	/*
  	 * Check that it's a plain table; we used to do this in get_rel_oids() but
! 	 * seems safer to check after we've locked the relation.
  	 */
! 	if (onerel->rd_rel->relkind != expected_relkind)
  	{
  		ereport(WARNING,
  				(errmsg("skipping \"%s\" --- cannot vacuum indexes, views, or special system tables",
--- 1040,1051 ----
  
  	/*
  	 * Check that it's a plain table; we used to do this in get_rel_oids() but
! 	 * seems safer to check after we've locked the relation.  We only accept
! 	 * TOAST tables if the caller told us it's OK to do so.
  	 */
! 	if (!(onerel->rd_rel->relkind == RELKIND_RELATION ||
! 		  onerel->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL ||
! 		  (onerel->rd_rel->relkind == RELKIND_TOASTVALUE && allow_toast)))
  	{
  		ereport(WARNING,
  				(errmsg("skipping \"%s\" --- cannot vacuum indexes, views, or special system tables",
***************
*** 1019,1024 ****
--- 1057,1078 ----
  	}
  
  	/*
+ 	 * We only process non-transactional catalogs in lazy vacuum, because they
+ 	 * are referenced by TID and applying VACUUM FULL to them would destroy the
+ 	 * references.
+ 	 */
+ 	if (onerel->rd_rel->relkind == RELKIND_NON_TRANSACTIONAL && vacstmt->full)
+ 	{
+ 		ereport(WARNING,
+ 				(errmsg("skipping \"%s\" --- cannot vacuum non-transactional catalogs in VACUUM FULL",
+ 						RelationGetRelationName(onerel))));
+ 		relation_close(onerel, lmode);
+ 		StrategyHintVacuum(false);
+ 		CommitTransactionCommand();
+ 		return false;
+ 	}
+ 
+ 	/*
  	 * Silently ignore tables that are temp tables of other backends ---
  	 * trying to vacuum these will lead to great unhappiness, since their
  	 * contents are probably not up-to-date on disk.  (We don't throw a
***************
*** 1079,1085 ****
  	 */
  	if (toast_relid != InvalidOid)
  	{
! 		if (!vacuum_rel(toast_relid, vacstmt, RELKIND_TOASTVALUE))
  			result = false;		/* failed to vacuum the TOAST table? */
  	}
  
--- 1133,1139 ----
  	 */
  	if (toast_relid != InvalidOid)
  	{
! 		if (!vacuum_rel(toast_relid, vacstmt, true))
  			result = false;		/* failed to vacuum the TOAST table? */
  	}
  
***************
*** 1179,1185 ****
  	vac_update_fsm(onerel, &fraged_pages, vacrelstats->rel_pages);
  
  	/* update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(onerel), vacrelstats->rel_pages,
  						vacrelstats->rel_tuples, vacrelstats->hasindex);
  
  	/* report results to the stats collector, too */
--- 1233,1239 ----
  	vac_update_fsm(onerel, &fraged_pages, vacrelstats->rel_pages);
  
  	/* update statistics in pg_class */
! 	vac_update_relstats(onerel, vacrelstats->rel_pages,
  						vacrelstats->rel_tuples, vacrelstats->hasindex);
  
  	/* report results to the stats collector, too */
***************
*** 2938,2944 ****
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
--- 2992,2998 ----
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
***************
*** 3007,3013 ****
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
--- 3061,3067 ----
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(indrel,
  						stats->num_pages, stats->num_index_tuples,
  						false);
  
Index: src/backend/commands/vacuumlazy.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.70
diff -c -r1.70 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c	2 May 2006 22:25:10 -0000	1.70
--- src/backend/commands/vacuumlazy.c	10 Jun 2006 23:17:47 -0000
***************
*** 175,181 ****
  	lazy_update_fsm(onerel, vacrelstats);
  
  	/* Update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(onerel),
  						vacrelstats->rel_pages,
  						vacrelstats->rel_tuples,
  						hasindex);
--- 175,181 ----
  	lazy_update_fsm(onerel, vacrelstats);
  
  	/* Update statistics in pg_class */
! 	vac_update_relstats(onerel,
  						vacrelstats->rel_pages,
  						vacrelstats->rel_tuples,
  						hasindex);
***************
*** 667,673 ****
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages,
  						stats->num_index_tuples,
  						false);
--- 667,673 ----
  		return;
  
  	/* now update statistics in pg_class */
! 	vac_update_relstats(indrel,
  						stats->num_pages,
  						stats->num_index_tuples,
  						false);
Index: src/backend/optimizer/util/plancat.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/optimizer/util/plancat.c,v
retrieving revision 1.119
diff -c -r1.119 plancat.c
*** src/backend/optimizer/util/plancat.c	5 Mar 2006 15:58:31 -0000	1.119
--- src/backend/optimizer/util/plancat.c	10 Jun 2006 23:17:47 -0000
***************
*** 19,24 ****
--- 19,25 ----
  
  #include "access/genam.h"
  #include "access/heapam.h"
+ #include "catalog/heap.h"
  #include "catalog/pg_amop.h"
  #include "catalog/pg_inherits.h"
  #include "catalog/pg_index.h"
***************
*** 235,240 ****
--- 236,243 ----
  	BlockNumber relpages;
  	double		reltuples;
  	double		density;
+ 	int4		npages;
+ 	float4		ntuples;
  
  	switch (rel->rd_rel->relkind)
  	{
***************
*** 263,269 ****
  			 * world, and it shouldn't degrade the quality of the plan too
  			 * much anyway to err in this direction.
  			 */
! 			if (curpages < 10 && rel->rd_rel->relpages == 0)
  				curpages = 10;
  
  			/* report estimated # pages */
--- 266,273 ----
  			 * world, and it shouldn't degrade the quality of the plan too
  			 * much anyway to err in this direction.
  			 */
! 			relation_get_ntups_npages(rel, &ntuples, &npages);
! 			if (curpages < 10 && ntuples == 0)
  				curpages = 10;
  
  			/* report estimated # pages */
***************
*** 275,282 ****
  				break;
  			}
  			/* coerce values in pg_class to more desirable types */
! 			relpages = (BlockNumber) rel->rd_rel->relpages;
! 			reltuples = (double) rel->rd_rel->reltuples;
  
  			/*
  			 * If it's an index, discount the metapage.  This is a kluge
--- 279,286 ----
  				break;
  			}
  			/* coerce values in pg_class to more desirable types */
! 			relpages = (BlockNumber) npages;
! 			reltuples = (double) ntuples;
  
  			/*
  			 * If it's an index, discount the metapage.  This is a kluge
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.19
diff -c -r1.19 autovacuum.c
*** src/backend/postmaster/autovacuum.c	19 May 2006 15:15:37 -0000	1.19
--- src/backend/postmaster/autovacuum.c	12 Jun 2006 03:51:26 -0000
***************
*** 24,29 ****
--- 24,30 ----
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
+ #include "catalog/heap.h"
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_autovacuum.h"
***************
*** 615,623 ****
  		ScanKeyData entry[1];
  		Oid			relid;
  
! 		/* Consider only regular and toast tables. */
  		if (classForm->relkind != RELKIND_RELATION &&
! 			classForm->relkind != RELKIND_TOASTVALUE)
  			continue;
  
  		/*
--- 616,625 ----
  		ScanKeyData entry[1];
  		Oid			relid;
  
! 		/* Consider only regular, non-transactional and toast tables. */
  		if (classForm->relkind != RELKIND_RELATION &&
! 			classForm->relkind != RELKIND_TOASTVALUE &&
! 			classForm->relkind != RELKIND_NON_TRANSACTIONAL)
  			continue;
  
  		/*
***************
*** 734,740 ****
  					 List **toast_table_ids)
  {
  	Relation	rel;
! 	float4		reltuples;		/* pg_class.reltuples */
  
  	/* constants from pg_autovacuum or GUC variables */
  	int			vac_base_thresh,
--- 736,742 ----
  					 List **toast_table_ids)
  {
  	Relation	rel;
! 	float4		reltuples;		/* pg_class_nt.ntuples */
  
  	/* constants from pg_autovacuum or GUC variables */
  	int			vac_base_thresh,
***************
*** 773,779 ****
  	if (!PointerIsValid(rel))
  		return;
  
! 	reltuples = rel->rd_rel->reltuples;
  	vactuples = tabentry->n_dead_tuples;
  	anltuples = tabentry->n_live_tuples + tabentry->n_dead_tuples -
  		tabentry->last_anl_tuples;
--- 775,781 ----
  	if (!PointerIsValid(rel))
  		return;
  
! 	relation_get_ntups_npages(rel, &reltuples, NULL);
  	vactuples = tabentry->n_dead_tuples;
  	anltuples = tabentry->n_live_tuples + tabentry->n_dead_tuples -
  		tabentry->last_anl_tuples;
***************
*** 843,849 ****
  
  	Assert(CurrentMemoryContext == AutovacMemCxt);
  
! 	if (classForm->relkind == RELKIND_RELATION)
  	{
  		if (dovacuum || doanalyze)
  			elog(DEBUG2, "autovac: will%s%s %s",
--- 845,852 ----
  
  	Assert(CurrentMemoryContext == AutovacMemCxt);
  
! 	if (classForm->relkind == RELKIND_RELATION ||
! 		classForm->relkind == RELKIND_NON_TRANSACTIONAL)
  	{
  		if (dovacuum || doanalyze)
  			elog(DEBUG2, "autovac: will%s%s %s",
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.241
diff -c -r1.241 relcache.c
*** src/backend/utils/cache/relcache.c	6 May 2006 15:51:07 -0000	1.241
--- src/backend/utils/cache/relcache.c	12 Jun 2006 17:01:20 -0000
***************
*** 40,45 ****
--- 40,46 ----
  #include "catalog/pg_attrdef.h"
  #include "catalog/pg_attribute.h"
  #include "catalog/pg_authid.h"
+ #include "catalog/pg_class_nt.h"
  #include "catalog/pg_constraint.h"
  #include "catalog/pg_index.h"
  #include "catalog/pg_namespace.h"
***************
*** 71,77 ****
   */
  #define RELCACHE_INIT_FILENAME	"pg_internal.init"
  
! #define RELCACHE_INIT_FILEMAGIC		0x573262	/* version ID value */
  
  /*
   *		hardcoded tuple descriptors.  see include/catalog/pg_attribute.h
--- 72,78 ----
   */
  #define RELCACHE_INIT_FILENAME	"pg_internal.init"
  
! #define RELCACHE_INIT_FILEMAGIC		0x573263	/* version ID value */
  
  /*
   *		hardcoded tuple descriptors.  see include/catalog/pg_attribute.h
***************
*** 81,86 ****
--- 82,88 ----
  static FormData_pg_attribute Desc_pg_proc[Natts_pg_proc] = {Schema_pg_proc};
  static FormData_pg_attribute Desc_pg_type[Natts_pg_type] = {Schema_pg_type};
  static FormData_pg_attribute Desc_pg_index[Natts_pg_index] = {Schema_pg_index};
+ static FormData_pg_attribute Desc_pg_class_nt[Natts_pg_class_nt] = {Schema_pg_class_nt};
  
  /*
   *		Hash tables that index the relation cache
***************
*** 1163,1169 ****
   *		catalogs.
   *
   * formrdesc is currently used for: pg_class, pg_attribute, pg_proc,
!  * and pg_type (see RelationCacheInitializePhase2).
   *
   * Note that these catalogs can't have constraints (except attnotnull),
   * default values, rules, or triggers, since we don't cope with any of that.
--- 1165,1171 ----
   *		catalogs.
   *
   * formrdesc is currently used for: pg_class, pg_attribute, pg_proc,
!  * pg_class_nt and pg_type (see RelationCacheInitializePhase2).
   *
   * Note that these catalogs can't have constraints (except attnotnull),
   * default values, rules, or triggers, since we don't cope with any of that.
***************
*** 1220,1232 ****
  	 */
  	relation->rd_rel->relisshared = false;
  
- 	relation->rd_rel->relpages = 1;
- 	relation->rd_rel->reltuples = 1;
  	relation->rd_rel->relkind = RELKIND_RELATION;
  	relation->rd_rel->relhasoids = hasoids;
  	relation->rd_rel->relnatts = (int16) natts;
  
  	/*
  	 * initialize attribute tuple form
  	 *
  	 * Unlike the case with the relation tuple, this data had better be right
--- 1222,1238 ----
  	 */
  	relation->rd_rel->relisshared = false;
  
  	relation->rd_rel->relkind = RELKIND_RELATION;
  	relation->rd_rel->relhasoids = hasoids;
  	relation->rd_rel->relnatts = (int16) natts;
  
  	/*
+ 	 * We don't need relntrans at this point, so set it to invalid to make
+ 	 * sure we will correct it later before trying to access it.
+ 	 */
+ 	ItemPointerSetInvalid(&(relation->rd_rel->relntrans));
+ 
+ 	/*
  	 * initialize attribute tuple form
  	 *
  	 * Unlike the case with the relation tuple, this data had better be right
***************
*** 2028,2033 ****
--- 2034,2040 ----
  		case AttributeRelationId:
  		case ProcedureRelationId:
  		case TypeRelationId:
+ 		case NtClassRelationId:
  			nailit = true;
  			break;
  		default:
***************
*** 2120,2125 ****
--- 2127,2146 ----
  	rel->rd_rel->relfilenode = relid;
  	rel->rd_rel->reltablespace = reltablespace;
  
+ 	/*
+ 	 * Set relntrans to invalid for now.  It'll be corrected later by the
+ 	 * caller.  However, in bootstrap mode we must create valid entries for
+ 	 * bootstrap relations, because the bootstrap process creates those only
+ 	 * with heap_create and cannot fix them later, because it doesn't have
+ 	 * access to their true pg_class_nt TID.  So we make them point to the
+ 	 * dummy entry (0,1) to make sure the bootstrapper doesn't overwrite other
+ 	 * relation's entry.
+ 	 */
+ 	if (IsBootstrapProcessingMode() && nailit)
+ 		ItemPointerSet(&(rel->rd_rel->relntrans), 0, 1);
+ 	else
+ 		ItemPointerSetInvalid(&(rel->rd_rel->relntrans));
+ 
  	RelationInitLockInfo(rel);	/* see lmgr.c */
  
  	RelationInitPhysicalAddr(rel);
***************
*** 2230,2237 ****
  				  true, Natts_pg_proc, Desc_pg_proc);
  		formrdesc("pg_type", PG_TYPE_RELTYPE_OID,
  				  true, Natts_pg_type, Desc_pg_type);
  
! #define NUM_CRITICAL_RELS	4	/* fix if you change list above */
  	}
  
  	MemoryContextSwitchTo(oldcxt);
--- 2251,2260 ----
  				  true, Natts_pg_proc, Desc_pg_proc);
  		formrdesc("pg_type", PG_TYPE_RELTYPE_OID,
  				  true, Natts_pg_type, Desc_pg_type);
+ 		formrdesc("pg_class_nt", PG_CLASS_NT_RELTYPE_OID,
+ 				  true, Natts_pg_class_nt, Desc_pg_class_nt);
  
! #define NUM_CRITICAL_RELS	5	/* fix if you change list above */
  	}
  
  	MemoryContextSwitchTo(oldcxt);
Index: src/include/catalog/catversion.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/catversion.h,v
retrieving revision 1.334
diff -c -r1.334 catversion.h
*** src/include/catalog/catversion.h	24 May 2006 11:01:39 -0000	1.334
--- src/include/catalog/catversion.h	12 Jun 2006 03:33:27 -0000
***************
*** 53,58 ****
   */
  
  /*							yyyymmddN */
! #define CATALOG_VERSION_NO	200605241
  
  #endif
--- 53,58 ----
   */
  
  /*							yyyymmddN */
! #define CATALOG_VERSION_NO	200606101
  
  #endif
Index: src/include/catalog/heap.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/heap.h,v
retrieving revision 1.80
diff -c -r1.80 heap.h
*** src/include/catalog/heap.h	30 Apr 2006 01:08:07 -0000	1.80
--- src/include/catalog/heap.h	12 Jun 2006 03:39:27 -0000
***************
*** 43,48 ****
--- 43,50 ----
  			bool shared_relation,
  			bool allow_system_table_mods);
  
+ extern ItemPointerData AddNewNtClassTuple(Relation new_rel_desc, char relkind);
+ 
  extern Oid heap_create_with_catalog(const char *relname,
  						 Oid relnamespace,
  						 Oid reltablespace,
***************
*** 86,91 ****
--- 88,94 ----
  				  DropBehavior behavior, bool complain);
  extern void RemoveAttrDefaultById(Oid attrdefId);
  extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+ extern void DeleteNtClassTuple(ItemPointer tid);
  
  extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
  						  bool relhasoids);
***************
*** 97,100 ****
--- 100,114 ----
  
  extern void CheckAttributeType(const char *attname, Oid atttypid);
  
+ /*
+  * use "struct HeapTupleData *" here instead of HeapTuple to avoid including
+  * htup.h
+  */
+ extern void relation_get_ntups_npages(Relation relation, float4 *ntuples,
+ 						  int4 *npages);
+ extern struct HeapTupleData *get_class_nt_entry(Relation pg_class_nt,
+ 				   Relation target);
+ extern void update_class_nt_entry(Relation pg_class_nt,
+ 					  struct HeapTupleData *tuple);
+ 
  #endif   /* HEAP_H */
Index: src/include/catalog/pg_attribute.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_attribute.h,v
retrieving revision 1.120
diff -c -r1.120 pg_attribute.h
*** src/include/catalog/pg_attribute.h	5 Mar 2006 15:58:54 -0000	1.120
--- src/include/catalog/pg_attribute.h	12 Jun 2006 03:39:45 -0000
***************
*** 387,410 ****
  { 1259, {"relam"},		   26, -1,	4,	5, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"relfilenode"},   26, -1,	4,	6, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"reltablespace"}, 26, -1,	4,	7, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relpages"},	   23, -1,	4,	8, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltuples"},	   700, -1, 4,	9, 0, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastrelid"}, 26, -1,	4, 10, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastidxid"}, 26, -1,	4, 11, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relhasindex"},   16, -1,	1, 12, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relisshared"},   16, -1,	1, 13, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relkind"},	   18, -1,	1, 14, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relnatts"},	   21, -1,	2, 15, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relchecks"},	   21, -1,	2, 16, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"reltriggers"},   21, -1,	2, 17, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relukeys"},	   21, -1,	2, 18, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relfkeys"},	   21, -1,	2, 19, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relrefs"},	   21, -1,	2, 20, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relhasoids"},    16, -1,	1, 21, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhaspkey"},    16, -1,	1, 22, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhasrules"},   16, -1,	1, 23, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhassubclass"},16, -1,	1, 24, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relacl"},		 1034, -1, -1, 25, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  DATA(insert ( 1259 relname			19 -1 NAMEDATALEN	1 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1259 relnamespace		26 -1 4   2 0 -1 -1 t p i t f f t 0));
--- 387,409 ----
  { 1259, {"relam"},		   26, -1,	4,	5, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"relfilenode"},   26, -1,	4,	6, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
  { 1259, {"reltablespace"}, 26, -1,	4,	7, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastrelid"}, 26, -1,	4, 8, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"reltoastidxid"}, 26, -1,	4, 9, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relntrans"},	   27, -1,	6, 10, 0, -1, -1, false, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relhasindex"},   16, -1,	1, 11, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relisshared"},   16, -1,	1, 12, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relkind"},	   18, -1,	1, 13, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relnatts"},	   21, -1,	2, 14, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relchecks"},	   21, -1,	2, 15, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"reltriggers"},   21, -1,	2, 16, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relukeys"},	   21, -1,	2, 17, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relfkeys"},	   21, -1,	2, 18, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relrefs"},	   21, -1,	2, 19, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \
! { 1259, {"relhasoids"},    16, -1,	1, 20, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhaspkey"},    16, -1,	1, 21, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhasrules"},   16, -1,	1, 22, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relhassubclass"},16, -1,	1, 23, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relacl"},		 1034, -1, -1, 24, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  DATA(insert ( 1259 relname			19 -1 NAMEDATALEN	1 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1259 relnamespace		26 -1 4   2 0 -1 -1 t p i t f f t 0));
***************
*** 413,436 ****
  DATA(insert ( 1259 relam			26 -1 4   5 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 relfilenode		26 -1 4   6 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 reltablespace	26 -1 4   7 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relpages			23 -1 4   8 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltuples	   700 -1 4   9 0 -1 -1 f p i t f f t 0));
! DATA(insert ( 1259 reltoastrelid	26 -1 4  10 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltoastidxid	26 -1 4  11 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relhasindex		16 -1 1  12 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relisshared		16 -1 1  13 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relkind			18 -1 1  14 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relnatts			21 -1 2  15 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relchecks		21 -1 2  16 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 reltriggers		21 -1 2  17 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relukeys			21 -1 2  18 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relfkeys			21 -1 2  19 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relrefs			21 -1 2  20 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relhasoids		16 -1 1  21 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhaspkey		16 -1 1  22 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhasrules		16 -1 1  23 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhassubclass	16 -1 1  24 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relacl		  1034 -1 -1 25 1 -1 -1 f x i f f f t 0));
  DATA(insert ( 1259 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  DATA(insert ( 1259 oid				26 0  4  -2 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
--- 412,434 ----
  DATA(insert ( 1259 relam			26 -1 4   5 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 relfilenode		26 -1 4   6 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 reltablespace	26 -1 4   7 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltoastrelid	26 -1 4  8 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 reltoastidxid	26 -1 4  9 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relntrans		27 -1 6  10 0 -1 -1 f p s t f f t 0));
! DATA(insert ( 1259 relhasindex		16 -1 1  11 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relisshared		16 -1 1  12 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relkind			18 -1 1  13 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relnatts			21 -1 2  14 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relchecks		21 -1 2  15 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 reltriggers		21 -1 2  16 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relukeys			21 -1 2  17 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relfkeys			21 -1 2  18 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relrefs			21 -1 2  19 0 -1 -1 t p s t f f t 0));
! DATA(insert ( 1259 relhasoids		16 -1 1  20 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhaspkey		16 -1 1  21 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhasrules		16 -1 1  22 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relhassubclass	16 -1 1  23 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relacl		  1034 -1 -1 24 1 -1 -1 f x i f f f t 0));
  DATA(insert ( 1259 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  DATA(insert ( 1259 oid				26 0  4  -2 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
***************
*** 440,445 ****
--- 438,461 ----
  DATA(insert ( 1259 tableoid			26 0  4  -7 0 -1 -1 t p i t f f t 0));
  
  /* ----------------
+  *		pg_class_nt
+  * ----------------
+  */
+ #define Schema_pg_class_nt \
+ { 1004, {"npages"},	   23, -1,	4,	1, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
+ { 1004, {"ntuples"},	   700, -1, 4,	2, 0, -1, -1, false, 'p', 'i', true, false, false, true, 0 }
+ 
+ DATA(insert ( 1004 npages		 23 -1 4  1 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 ntuples	   	700 -1 4  2 0 -1 -1 f p i t f f t 0));
+ DATA(insert ( 1004 ctid			 27  0 6 -1 0 -1 -1 f p s t f f t 0));
+ /* no OIDs in pg_class_nt */
+ DATA(insert ( 1004 xmin			 28  0 4 -3 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 cmin			 29  0 4 -4 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 xmax			 28  0 4 -5 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 cmax			 29  0 4 -6 0 -1 -1 t p i t f f t 0));
+ DATA(insert ( 1004 tableoid		 26  0 4 -7 0 -1 -1 t p i t f f t 0));
+ 
+ /* ----------------
   *		pg_index
   *
   * pg_index is not bootstrapped in the same way as the other relations that
Index: src/include/catalog/pg_class.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_class.h,v
retrieving revision 1.92
diff -c -r1.92 pg_class.h
*** src/include/catalog/pg_class.h	28 May 2006 02:27:08 -0000	1.92
--- src/include/catalog/pg_class.h	12 Jun 2006 16:11:40 -0000
***************
*** 19,24 ****
--- 19,26 ----
  #ifndef PG_CLASS_H
  #define PG_CLASS_H
  
+ #include "storage/itemptr.h"
+ 
  /* ----------------
   *		postgres.h contains the system type definitions and the
   *		CATALOG(), BKI_BOOTSTRAP and DATA() sugar words so this file
***************
*** 42,47 ****
--- 44,59 ----
   */
  #define RelationRelationId	1259
  
+ /*
+  * This CATALOG() entry is used by genbki to create the initial bootstrap file,
+  * but the bootstrap type system only recognizes "tid" as a valid type.
+  * However, CPP also turns this definition into a C struct, but the C name of
+  * that type is actually ItemPointerData.  So we put this definition here in
+  * order to have both systems understand that element.  To limit the possible
+  * damage done, we #undef it right at the bottom.
+  */
+ #define tid	ItemPointerData
+ 
  CATALOG(pg_class,1259) BKI_BOOTSTRAP
  {
  	NameData	relname;		/* class name */
***************
*** 51,60 ****
  	Oid			relam;			/* index access method; 0 if not an index */
  	Oid			relfilenode;	/* identifier of physical storage file */
  	Oid			reltablespace;	/* identifier of table space for relation */
- 	int4		relpages;		/* # of blocks (not always up-to-date) */
- 	float4		reltuples;		/* # of tuples (not always up-to-date) */
  	Oid			reltoastrelid;	/* OID of toast table; 0 if none */
  	Oid			reltoastidxid;	/* if toast table, OID of chunk_id index */
  	bool		relhasindex;	/* T if has (or has had) any indexes */
  	bool		relisshared;	/* T if shared across databases */
  	char		relkind;		/* see RELKIND_xxx constants below */
--- 63,71 ----
  	Oid			relam;			/* index access method; 0 if not an index */
  	Oid			relfilenode;	/* identifier of physical storage file */
  	Oid			reltablespace;	/* identifier of table space for relation */
  	Oid			reltoastrelid;	/* OID of toast table; 0 if none */
  	Oid			reltoastidxid;	/* if toast table, OID of chunk_id index */
+ 	tid			relntrans;		/* CTID of pg_class_nt entry */
  	bool		relhasindex;	/* T if has (or has had) any indexes */
  	bool		relisshared;	/* T if shared across databases */
  	char		relkind;		/* see RELKIND_xxx constants below */
***************
*** 80,85 ****
--- 91,97 ----
  	 */
  	aclitem		relacl[1];		/* we declare this just for the catalog */
  } FormData_pg_class;
+ #undef tid
  
  /* Size of fixed part of pg_class tuples, not counting relacl or padding */
  #define CLASS_TUPLE_SIZE \
***************
*** 103,110 ****
   *		relacl field.  This is a kluge.
   * ----------------
   */
! #define Natts_pg_class_fixed			24
! #define Natts_pg_class					25
  #define Anum_pg_class_relname			1
  #define Anum_pg_class_relnamespace		2
  #define Anum_pg_class_reltype			3
--- 115,122 ----
   *		relacl field.  This is a kluge.
   * ----------------
   */
! #define Natts_pg_class_fixed			23
! #define Natts_pg_class					24
  #define Anum_pg_class_relname			1
  #define Anum_pg_class_relnamespace		2
  #define Anum_pg_class_reltype			3
***************
*** 112,135 ****
  #define Anum_pg_class_relam				5
  #define Anum_pg_class_relfilenode		6
  #define Anum_pg_class_reltablespace		7
! #define Anum_pg_class_relpages			8
! #define Anum_pg_class_reltuples			9
! #define Anum_pg_class_reltoastrelid		10
! #define Anum_pg_class_reltoastidxid		11
! #define Anum_pg_class_relhasindex		12
! #define Anum_pg_class_relisshared		13
! #define Anum_pg_class_relkind			14
! #define Anum_pg_class_relnatts			15
! #define Anum_pg_class_relchecks			16
! #define Anum_pg_class_reltriggers		17
! #define Anum_pg_class_relukeys			18
! #define Anum_pg_class_relfkeys			19
! #define Anum_pg_class_relrefs			20
! #define Anum_pg_class_relhasoids		21
! #define Anum_pg_class_relhaspkey		22
! #define Anum_pg_class_relhasrules		23
! #define Anum_pg_class_relhassubclass	24
! #define Anum_pg_class_relacl			25
  
  /* ----------------
   *		initial contents of pg_class
--- 124,146 ----
  #define Anum_pg_class_relam				5
  #define Anum_pg_class_relfilenode		6
  #define Anum_pg_class_reltablespace		7
! #define Anum_pg_class_reltoastrelid		8
! #define Anum_pg_class_reltoastidxid		9
! #define Anum_pg_class_relhasindex		10
! #define Anum_pg_class_relisshared		11
! #define Anum_pg_class_relkind			12
! #define Anum_pg_class_relnatts			13
! #define Anum_pg_class_relntrans			14
! #define Anum_pg_class_relchecks			15
! #define Anum_pg_class_reltriggers		16
! #define Anum_pg_class_relukeys			17
! #define Anum_pg_class_relfkeys			18
! #define Anum_pg_class_relrefs			19
! #define Anum_pg_class_relhasoids		20
! #define Anum_pg_class_relhaspkey		21
! #define Anum_pg_class_relhasrules		22
! #define Anum_pg_class_relhassubclass	23
! #define Anum_pg_class_relacl			24
  
  /* ----------------
   *		initial contents of pg_class
***************
*** 139,151 ****
   * ----------------
   */
  
! DATA(insert OID = 1247 (  pg_type		PGNSP 71 PGUID 0 1247 0 0 0 0 0 f f r 23 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 PGUID 0 1249 0 0 0 0 0 f f r 17 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 PGUID 0 1255 0 0 0 0 0 f f r 18 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 0 0 f f r 25 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
--- 150,164 ----
   * ----------------
   */
  
! DATA(insert OID = 1247 (  pg_type		PGNSP 71 PGUID 0 1247 0 0 0 "(0,2)" f f r 23 0 0 0 0 0 t f f f _null_ ));
! DESCR("");
! DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 PGUID 0 1249 0 0 0 "(0,3)" f f r 17 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 PGUID 0 1255 0 0 0 "(0,4)" f f r 18 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 "(0,5)" f f r 24 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1004 (  pg_class_nt	PGNSP 86 PGUID 0 1004 0 0 0 "(0,6)" f f n  2 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
***************
*** 155,159 ****
--- 168,173 ----
  #define		  RELKIND_TOASTVALUE	  't'		/* moved off huge values */
  #define		  RELKIND_VIEW			  'v'		/* view */
  #define		  RELKIND_COMPOSITE_TYPE  'c'		/* composite type */
+ #define		  RELKIND_NON_TRANSACTIONAL 'n'		/* non-transactional heap */
  
  #endif   /* PG_CLASS_H */
Index: src/include/catalog/pg_type.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_type.h,v
retrieving revision 1.171
diff -c -r1.171 pg_type.h
*** src/include/catalog/pg_type.h	5 Apr 2006 22:11:57 -0000	1.171
--- src/include/catalog/pg_type.h	12 Jun 2006 03:38:34 -0000
***************
*** 314,319 ****
--- 314,321 ----
  #define PG_PROC_RELTYPE_OID 81
  DATA(insert OID = 83 (	pg_class		PGNSP PGUID -1 f c t \054 1259 0 record_in record_out record_recv record_send - d x f 0 -1 0 _null_ _null_ ));
  #define PG_CLASS_RELTYPE_OID 83
+ DATA(insert OID = 86 (	pg_class_nt		PGNSP PGUID -1 f c t \054 1004 0 record_in record_out record_recv record_send - d x f 0 -1 0 _null_ _null_ ));
+ #define PG_CLASS_NT_RELTYPE_OID 86
  
  /* OIDS 100 - 199 */
  
Index: src/include/commands/vacuum.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/commands/vacuum.h,v
retrieving revision 1.63
diff -c -r1.63 vacuum.h
*** src/include/commands/vacuum.h	5 Mar 2006 15:58:55 -0000	1.63
--- src/include/commands/vacuum.h	5 Jun 2006 02:36:52 -0000
***************
*** 114,120 ****
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Oid relid,
  					BlockNumber num_pages,
  					double num_tuples,
  					bool hasindex);
--- 114,120 ----
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Relation rel,
  					BlockNumber num_pages,
  					double num_tuples,
  					bool hasindex);
Index: src/test/regress/expected/stats.out
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/expected/stats.out,v
retrieving revision 1.7
diff -c -r1.7 stats.out
*** src/test/regress/expected/stats.out	11 Jan 2006 20:12:42 -0000	1.7
--- src/test/regress/expected/stats.out	12 Jun 2006 03:53:48 -0000
***************
*** 2,8 ****
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by create_misc) and indexed (by create_index).
  --
  -- conditio sine qua non
  SHOW stats_start_collector;  -- must be on
--- 2,8 ----
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by copy and create_misc) and indexed (by create_index).
  --
  -- conditio sine qua non
  SHOW stats_start_collector;  -- must be on
***************
*** 44,62 ****
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
   t        | t        | t        | t
  (1 row)
  
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? 
  ----------+----------
--- 44,66 ----
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + nt.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st,
!        pg_class AS cl JOIN pg_class_nt AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
   t        | t        | t        | t
  (1 row)
  
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + nt.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st,
!        pg_class AS cl JOIN pg_class_nt AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? 
  ----------+----------
Index: src/test/regress/expected/type_sanity.out
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/expected/type_sanity.out,v
retrieving revision 1.27
diff -c -r1.27 type_sanity.out
*** src/test/regress/expected/type_sanity.out	10 Jul 2005 21:14:00 -0000	1.27
--- src/test/regress/expected/type_sanity.out	5 Jun 2006 02:36:52 -0000
***************
*** 216,222 ****
  -- Look for illegal values in pg_class fields
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v');
   oid | relname 
  -----+---------
  (0 rows)
--- 216,222 ----
  -- Look for illegal values in pg_class fields
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v', 'n');
   oid | relname 
  -----+---------
  (0 rows)
Index: src/test/regress/expected/without_oid.out
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/expected/without_oid.out,v
retrieving revision 1.7
diff -c -r1.7 without_oid.out
*** src/test/regress/expected/without_oid.out	14 Mar 2006 22:48:25 -0000	1.7
--- src/test/regress/expected/without_oid.out	12 Jun 2006 03:54:15 -0000
***************
*** 48,54 ****
  VACUUM ANALYZE wi;
  VACUUM ANALYZE wo;
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class
   WHERE relname IN ('wi', 'wo');
   ?column? | ?column? 
  ----------+----------
--- 48,54 ----
  VACUUM ANALYZE wi;
  VACUUM ANALYZE wo;
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class JOIN pg_class_nt ON (pg_class_nt.ctid = relntrans)
   WHERE relname IN ('wi', 'wo');
   ?column? | ?column? 
  ----------+----------
Index: src/test/regress/sql/stats.sql
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/sql/stats.sql,v
retrieving revision 1.5
diff -c -r1.5 stats.sql
*** src/test/regress/sql/stats.sql	11 Jan 2006 20:12:43 -0000	1.5
--- src/test/regress/sql/stats.sql	12 Jun 2006 03:54:50 -0000
***************
*** 2,8 ****
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by create_misc) and indexed (by create_index).
  --
  
  -- conditio sine qua non
--- 2,8 ----
  -- Test Statistics Collector
  --
  -- Must be run after tenk2 has been created (by create_table),
! -- populated (by copy and create_misc) and indexed (by create_index).
  --
  
  -- conditio sine qua non
***************
*** 30,43 ****
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
  
  -- End of Stats Test
--- 30,47 ----
  
  -- check effects
  SELECT st.seq_scan >= pr.seq_scan + 1,
!        st.seq_tup_read >= pr.seq_tup_read + nt.reltuples,
         st.idx_scan >= pr.idx_scan + 1,
         st.idx_tup_fetch >= pr.idx_tup_fetch + 1
!   FROM pg_stat_user_tables AS st,
!        pg_class AS cl JOIN pg_class_nt AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
! SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + nt.relpages,
         st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
!   FROM pg_statio_user_tables AS st,
!        pg_class AS cl JOIN pg_class_nt AS nt ON (nt.ctid = cl.relntrans),
!        prevstats AS pr
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
  
  -- End of Stats Test
Index: src/test/regress/sql/type_sanity.sql
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/sql/type_sanity.sql,v
retrieving revision 1.27
diff -c -r1.27 type_sanity.sql
*** src/test/regress/sql/type_sanity.sql	10 Jul 2005 21:14:00 -0000	1.27
--- src/test/regress/sql/type_sanity.sql	5 Jun 2006 02:36:52 -0000
***************
*** 169,175 ****
  
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v');
  
  -- Indexes should have an access method, others not.
  
--- 169,175 ----
  
  SELECT p1.oid, p1.relname
  FROM pg_class as p1
! WHERE p1.relkind NOT IN ('r', 'i', 's', 'S', 'c', 't', 'v', 'n');
  
  -- Indexes should have an access method, others not.
  
Index: src/test/regress/sql/without_oid.sql
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/test/regress/sql/without_oid.sql,v
retrieving revision 1.6
diff -c -r1.6 without_oid.sql
*** src/test/regress/sql/without_oid.sql	19 Feb 2006 00:04:28 -0000	1.6
--- src/test/regress/sql/without_oid.sql	12 Jun 2006 03:55:11 -0000
***************
*** 45,51 ****
  VACUUM ANALYZE wo;
  
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class
   WHERE relname IN ('wi', 'wo');
  
  DROP TABLE wi;
--- 45,51 ----
  VACUUM ANALYZE wo;
  
  SELECT min(relpages) < max(relpages), min(reltuples) - max(reltuples)
!   FROM pg_class JOIN pg_class_nt ON (pg_class_nt.ctid = relntrans)
   WHERE relname IN ('wi', 'wo');
  
  DROP TABLE wi;

#12

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Alvaro Herrera (#1)

Re: Non-transactional pg_class, try 2

On Sun, 2006-06-11 at 17:53 -0400, Alvaro Herrera wrote:

Here I repost the patch to implement non-transactional catalogs, the
first of which is pg_ntclass, intended to hold the non-transactional
info about pg_class (reltuples, relpages).

Would it be possible to get a summary of what this new feature gives us?
I'm trying to follow the implementation but the why of it seems to have
been buried in the detail.

Will a user be able to update reltuples and relpages manually?

Thanks.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#13

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Simon Riggs (#12)

Re: [PATCHES] Non-transactional pg_class, try 2

[ moving to -hackers to get some more eyeballs on the question ]

Simon Riggs <simon@2ndquadrant.com> writes:

On Sun, 2006-06-11 at 17:53 -0400, Alvaro Herrera wrote:

Here I repost the patch to implement non-transactional catalogs, the
first of which is pg_ntclass, intended to hold the non-transactional
info about pg_class (reltuples, relpages).

Will a user be able to update reltuples and relpages manually?

No, which is a tad annoying now that you mention it. I'm not sure that
there's any very good reason for users to want to do that, though. Once
or twice I've hacked those fields manually to set up test cases for the
planner, which is why I'd be annoyed to lose the ability --- but does it
really matter to users? (Especially in view of the fact that the
planner no longer trusts relpages anyway.)

It does seem like rather a lot of mechanism and overhead though,
especially in view of Alvaro's worries about the non-cacheability of
pg_class_nt rows. I wonder whether we shouldn't take two steps back
and rethink.

The main thing we are trying to accomplish here is to decouple
transactional and nontransactional updates to a pg_class row.
Is there another way to do that? Do we need complete decoupling?
It strikes me that the only case where we absolutely must not lose a
nontransactional update is where we are un-freezing a frozen rel.
If we could guarantee that un-freezing happens before any transactional
update within a particular transaction, then maybe we could have that.
Manual updates to pg_class seem like they'd risk breaking such a
guarantee, but maybe there's a way around that. Personally I'd be
willing to live with commands that try to modify a frozen rel erroring
out if they see the current pg_class row is uncommitted.

regards, tom lane

#14

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Tom Lane (#13)

Re: [PATCHES] Non-transactional pg_class, try 2

On Mon, 2006-06-12 at 19:15 -0400, Tom Lane wrote:

[ moving to -hackers to get some more eyeballs on the question ]

Simon Riggs <simon@2ndquadrant.com> writes:

On Sun, 2006-06-11 at 17:53 -0400, Alvaro Herrera wrote:

Here I repost the patch to implement non-transactional catalogs, the
first of which is pg_ntclass, intended to hold the non-transactional
info about pg_class (reltuples, relpages).

Will a user be able to update reltuples and relpages manually?

No, which is a tad annoying now that you mention it. I'm not sure that
there's any very good reason for users to want to do that, though. Once
or twice I've hacked those fields manually to set up test cases for the
planner, which is why I'd be annoyed to lose the ability --- but does it
really matter to users? (Especially in view of the fact that the
planner no longer trusts relpages anyway.)

No need to have an SQL route. A special function call would suffice.

I'd like to be able to set up a test database that has the statistics
copied from the live system. A schema only pg_dump with mods is all I
need, but it sounds like we're moving away from that. We can then
perform various what-ifs on the design.

Elsewhere, it has been discussed that we might hold the number of blocks
in a relation in shared memory. Does that idea now fall down, or is it
complementary to this? i.e. would we replace ANALYZE's relpages with an
accurate relpages for the planner?

It does seem like rather a lot of mechanism and overhead though,
especially in view of Alvaro's worries about the non-cacheability of
pg_class_nt rows. I wonder whether we shouldn't take two steps back
and rethink.

Review, yes. Could still be the best way.

The main thing we are trying to accomplish here is to decouple
transactional and nontransactional updates to a pg_class row.

With the goal being avoiding table bloat??

Is there another way to do that? Do we need complete decoupling?

It strikes me that the only case where we absolutely must not lose a
nontransactional update is where we are un-freezing a frozen rel.

Not sure why you'd want to do that, assuming I've understood you.

For me, freezing is last step before writing to WORM media, so there is
never an unfreeze step.

If we could guarantee that un-freezing happens before any transactional
update within a particular transaction, then maybe we could have that.
Manual updates to pg_class seem like they'd risk breaking such a
guarantee, but maybe there's a way around that. Personally I'd be
willing to live with commands that try to modify a frozen rel erroring
out if they see the current pg_class row is uncommitted.

Sounds OK. It's a major state change after all.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#15

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Simon Riggs (#14)

Re: [PATCHES] Non-transactional pg_class, try 2

Simon Riggs <simon@2ndquadrant.com> writes:

Elsewhere, it has been discussed that we might hold the number of blocks
in a relation in shared memory. Does that idea now fall down, or is it
complementary to this?

It's been the case for some time that the planner uses
RelationGetNumberOfBlocks() to determine true rel size. The only reason
relpages is still stored at all is that it's used to approximate true
number of tuples via
true_ntuples = (reltuples/relpages) * true_npages
ie, assuming that the tuple density is still what it was at the last
VACUUM or ANALYZE. So you can't fool the system with a totally made-up
relation size anyway. (This too is moderately annoying for planner
testing, but it seems the only way to get the planner to react when a
table's been filled without an immediate vacuum/analyze.)

The only point of tracking rel size in shared memory would be to avoid
the costs of lseek() kernel calls in RelationGetNumberOfBlocks.

The main thing we are trying to accomplish here is to decouple
transactional and nontransactional updates to a pg_class row.

With the goal being avoiding table bloat??

No, with the goal being correctness. If you have a freeze/unfreeze
mechanism then unfreezing a relation is an action that must NOT be
rolled back if your transaction (or any other one for that matter) later
aborts. The tuples you put into it meanwhile need to be vacuumed anyway.
So you can't mark it unfrozen in an uncommitted pg_class entry that
might never become committed.

For me, freezing is last step before writing to WORM media, so there is
never an unfreeze step.

That is not what Alvaro is after. Nor anyone else here. I have not
heard anyone mention WORM media for Postgres in *years*.

It strikes me though that automatic UNFREEZE isn't necessarily the
requirement. What if VACUUM FREEZE causes the table to become
effectively read-only, and you need an explicit UNFREEZE command to
put it back into a read-write state? Then UNFREEZE could be a
transactional operation, and most of these issues go away. The case
where this doesn't work conveniently is copying a frozen database
(viz template0), but maybe biting the bullet and finding a way to do
prep work in a freshly made database is the answer for that. We've
certainly seen plenty of other possible uses for post-CREATE processing
in a new database.

Another reason for not doing unfreeze automatically is that as the patch
stands, any database user can force unfreezing of any table, whether he
has any access rights on it or not (because the LockTable will happen
before we check access rights, I believe). This is probably Not Good.
Ideally I think FREEZE/UNFREEZE would be owner-permission-required.

regards, tom lane

#16

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Tom Lane (#15)

Re: [PATCHES] Non-transactional pg_class, try 2

On Tue, 2006-06-13 at 10:02 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Elsewhere, it has been discussed that we might hold the number of blocks
in a relation in shared memory. Does that idea now fall down, or is it
complementary to this?

It's been the case for some time that the planner uses
RelationGetNumberOfBlocks() to determine true rel size. The only reason
relpages is still stored at all is that it's used to approximate true
number of tuples via
true_ntuples = (reltuples/relpages) * true_npages
ie, assuming that the tuple density is still what it was at the last
VACUUM or ANALYZE. So you can't fool the system with a totally made-up
relation size anyway. (This too is moderately annoying for planner
testing, but it seems the only way to get the planner to react when a
table's been filled without an immediate vacuum/analyze.)

The only point of tracking rel size in shared memory would be to avoid
the costs of lseek() kernel calls in RelationGetNumberOfBlocks.

Yes, understood. With the second point to allow them to be separately
set for PGSQL developer testing of optimizer, and application dev
testing of SQL and/or what/if scenarios.

The main thing we are trying to accomplish here is to decouple
transactional and nontransactional updates to a pg_class row.

With the goal being avoiding table bloat??

No, with the goal being correctness. If you have a freeze/unfreeze
mechanism then unfreezing a relation is an action that must NOT be
rolled back if your transaction (or any other one for that matter) later
aborts. The tuples you put into it meanwhile need to be vacuumed anyway.
So you can't mark it unfrozen in an uncommitted pg_class entry that
might never become committed.

For me, freezing is last step before writing to WORM media, so there is
never an unfreeze step.

That is not what Alvaro is after. Nor anyone else here.

So what is unfreeze for again?

I have not
heard anyone mention WORM media for Postgres in *years*.

Oh? Big requirements for archive these days, much more so than before.
This will allow years of data in a seamless on-line/near-line
partitioned table set. Lots of people want that: .gov, .mil, .com

More modern equivalent: a MAID archive system for WORO data

It strikes me though that automatic UNFREEZE isn't necessarily the
requirement. What if VACUUM FREEZE causes the table to become
effectively read-only, and you need an explicit UNFREEZE command to
put it back into a read-write state? Then UNFREEZE could be a
transactional operation, and most of these issues go away.

That works for me. Very much preferred.

The case
where this doesn't work conveniently is copying a frozen database
(viz template0), but maybe biting the bullet and finding a way to do
prep work in a freshly made database is the answer for that. We've
certainly seen plenty of other possible uses for post-CREATE processing
in a new database.

Another reason for not doing unfreeze automatically is that as the patch
stands, any database user can force unfreezing of any table, whether he
has any access rights on it or not (because the LockTable will happen
before we check access rights, I believe). This is probably Not Good.
Ideally I think FREEZE/UNFREEZE would be owner-permission-required.

Seems like a plan.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#17

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Simon Riggs (#16)

Re: [PATCHES] Non-transactional pg_class, try 2

Ok, let's step back to discuss this again. Sorry for the length -- this
is a description of the problem I'm trying to solve, the issues I found,
and how I tried to solve them.

The relminxid Patch
===================

What I'm after is not freezing for read-only media, nor archive, nor
read-only tables. What I'm after is removing the requirement that all
databases must be vacuumed wholly every 2 billion transactions.

Now, why do we need to vacuum whole databases at a time?

The Transaction Id Counter
==========================

We know that the Xid counter is weird; it cycles, for starters, and also
there are special values at the "start" of the cycle that are lesser
than all other values (BootstrapXid, FrozenXid). The idea here is to
allow the counter to wrap around and old tuples not be affected, i.e.,
appear like they were committed in some distant past.

So we use the special Xid values to mark special stuff, like tuples
created by the bootstrap processing (which are always known to be good)
or tuples in template databases that are not connectable ("frozen"
databases). We also use FrozenXid to mark tuples that are very old,
i.e. were committed a long time ago and never deleted. Any such tuple
is unaffected by the status of the Xid counter.

It should be clear that we must ensure that after a suitable amount of
"time" (measured in advancement of the Xid counter) has passed, we
should change the old Xids in tuples to the special FrozenXid value.
The requirement for whole-database vacuuming is there because we
need to ensure that this is done in all the tables in the database.

We keep track of a "minimum Xid", call it minxid. The Xid generator
refuses to assign a new Xid counter if this minxid is too far in the
past, because we'd risk causing Xid-wraparound data loss if we did; the
Xid comparison semantics would start behaving funny, and some tuples
that appeared to be alive not many transactions ago now suddenly appear
dead. Clearly, it's important that before we advance this minxid we
ensure that all tables in the database have been under the process of
changing all regular Xids into FrozenXid.

Currently the only way to ensure that all tables have gone through this
process is processing them in a single VACUUM pass. Skip even one
table, and you can forget about advancing the minxid. Even if the
skipped table was vacuumed in the transaction just before this one.
Even if the table is fully frozen, i.e., all tables on it are marked
with FrozenXid. Even if the table is empty.

Tracking minxid Per Table
=========================

So, my idea is to track this minxid per table. To do this, I added a
column to pg_class called relminxid. The minimum of it across a
database is used to determine each database's minimum, datminxid. The
minimum of all databases is used to advance the global minimum Xid
counter.

So, if a table has 3 tuples whose Xmins are 42, 512 and FrozenXid, the
relminxid is 42. If we keep track of all these religiously during
vacuum, we know exactly what is the minxid we should apply to this
particular table.

It is obvious that vacuuming one table can set the minimum for that
table. So when the vacuuming is done, we can recalculate the database
minimum; and using the minima of all databases, we can advance the
global minimum Xid counter and truncate pg_clog. We can do this on each
single-table vacuum -- so, no more need for database-wide vacuuming.

If a table is empty, or all tuples on it are frozen, then we must mark
the table with relminxid = RecentXmin. This is because there could be
an open transaction that writes a new tuple to the table after the
vacuum is finished. A newly created table must also be created with
relminxid = RecentXid. Because of this, we never mark a table with
relminxid = FrozenXid.

Template Databases
==================

Up to this point everything is relatively simple. Here is where the
strange problems appear. The main issue is template databases.

Why are template databases special? Because they are never vacuumed.
More generally, we assume that every database that is marked as
"datallowconn = false" is fully frozen, i.e. all tables on it are
frozen. Autovacuum skips them. VACUUM ignores them. The minxid
calculations ignore them. They are fully frozen so they don't matter
and they don't harm anybody.

That's fine and dandy until you realize what happens when you freeze a
database, let a couple billion transactions pass, and then create a
database using that as a template (or just "reallow connections" to a
database). Because all the tables were frozen 2 billion transaction
ago, they are marked with an old relminxid, so as soon as you vacuum any
table, the minxid computations went to hell, and we have a DoS
condition.

So, we have to do something to cope with frozen databases. I see two
ways:

1. Remove the special case, i.e., process frozen databases in VACUUM
like every other database.
This is the easiest, because no extra logic is needed. Just make
sure they are vacuumed in time. The only problem would be that we'd
need to uselessly vacuum tables that we know are frozen, from time to
time. But then, those tables are probably small, so what's the
problem with that?

2. Mark frozen databases specially somehow.
To mark databases frozen, we need a way to mark tables as frozen.
How do we do that? As I explain below, this allows some nice
optimizations, but it's a very tiny can full of a huge amount of
worms.

Marking a Table Frozen
======================

Marking a table frozen is simple as setting relminxid = FrozenXid for a
table. As explained above, this cannot be done in a regular postmaster
environment, because a concurrent transaction could be doing nasty stuff
to a table. So we can do it only in a standalone backend.

On the other hand, a "frozen" table must be marked with relminxid =
a-regular-Xid as soon as a transaction writes some tuples on it. Note
that this "unfreezing" must take place even if the offending transaction
is aborted, because the Xid is written in the table nevertheless and
thus it would be incorrect to lose the unfreezing.

This is how pg_class_nt came into existence -- it would be a place where
information about a table would be stored and not subject to the rolling
back of the transaction that wrote it. So if you find that a table is
frozen, you write an unfreezing into its pg_class_nt tuple, and that's
it.

Nice optimization: if we detect that a table is fully frozen, then
VACUUM is a no-op (not VACUUM FULL), because by definition there are no
tuples to remove.

Another optimization: if we are sure that unfreezing works, we can even
mark a table as frozen in a postmaster environment, as long as we take
an ExclusiveLock on the table. Thus we know that the vacuum is the sole
transaction concurrently accessing the table; and if another transaction
comes about and writes something after we're finished, it'll correctly
unfreeze the table and all is well.

Where are the problems in this approach?

1. Performance. We'll need to keep a cache of pg_class_nt tuples. This
cache must be independent of the current relcache, because the relcache
is properly transactional while the pg_class_nt cache must not be.

2. The current implementation puts the unfreezing in LockRelation. This
is a problem, because any user can cause a LockRelation on any table,
even if the user does not have access to that table.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#18

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Simon Riggs (#16)

Re: [PATCHES] Non-transactional pg_class, try 2

[Resending: apparently the previous one to the list was eaten by spam
filters or something. Changing SMTP relay again ... ]

Ok, let's step back to discuss this again. Sorry for the length -- this
is a description of the problem I'm trying to solve, the issues I found,
and how I tried to solve them.

The relminxid Patch
===================

Now, why do we need to vacuum whole databases at a time?

The Transaction Id Counter
==========================

Tracking minxid Per Table
=========================

Template Databases
==================

Up to this point everything is relatively simple. Here is where the
strange problems appear. The main issue is template databases.

So, we have to do something to cope with frozen databases. I see two
ways:

Marking a Table Frozen
======================

Nice optimization: if we detect that a table is fully frozen, then
VACUUM is a no-op (not VACUUM FULL), because by definition there are no
tuples to remove.

Where are the problems in this approach?

2. The current implementation puts the unfreezing in LockRelation. This
is a problem, because any user can cause a LockRelation on any table,
even if the user does not have access to that table.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#19

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Alvaro Herrera (#17)

Re: [PATCHES] Non-transactional pg_class, try 2

On Mon, 2006-06-26 at 13:58 -0400, Alvaro Herrera wrote:

Ok, let's step back to discuss this again. Sorry for the length -- this
is a description of the problem I'm trying to solve, the issues I found,
and how I tried to solve them.

Thanks. This is good.

The relminxid Patch
===================

What I'm after is not freezing for read-only media, nor archive, nor
read-only tables.

OK, but I am... but I'm happy to not to confuse the discussion.

Now, why do we need to vacuum whole databases at a time?

So, we have to do something to cope with frozen databases. I see two
ways:

1. Remove the special case, i.e., process frozen databases in VACUUM
like every other database.
This is the easiest, because no extra logic is needed. Just make
sure they are vacuumed in time. The only problem would be that we'd
need to uselessly vacuum tables that we know are frozen, from time to
time. But then, those tables are probably small, so what's the
problem with that?

2. Mark frozen databases specially somehow.
To mark databases frozen, we need a way to mark tables as frozen.
How do we do that? As I explain below, this allows some nice
optimizations, but it's a very tiny can full of a huge amount of
worms.

At this stage you talk about databases, yet below we switch to
discussing tables. Not sure why we switched from one to the other.

Marking a Table Frozen
======================

Marking a table frozen is simple as setting relminxid = FrozenXid for a
table. As explained above, this cannot be done in a regular postmaster
environment, because a concurrent transaction could be doing nasty stuff
to a table. So we can do it only in a standalone backend.

Surely we just lock the table? No concurrent transactions?

On the other hand, a "frozen" table must be marked with relminxid =
a-regular-Xid as soon as a transaction writes some tuples on it. Note
that this "unfreezing" must take place even if the offending transaction
is aborted, because the Xid is written in the table nevertheless and
thus it would be incorrect to lose the unfreezing.

This is how pg_class_nt came into existence -- it would be a place where
information about a table would be stored and not subject to the rolling
back of the transaction that wrote it. So if you find that a table is
frozen, you write an unfreezing into its pg_class_nt tuple, and that's
it.

Nice optimization: if we detect that a table is fully frozen, then
VACUUM is a no-op (not VACUUM FULL), because by definition there are no
tuples to remove.

Yes please, but we don't need it anymore do we? Guess we need it for
backwards compatibility? VACUUM still needs to vacuum every table.

Another optimization: if we are sure that unfreezing works, we can even
mark a table as frozen in a postmaster environment, as long as we take
an ExclusiveLock on the table. Thus we know that the vacuum is the sole
transaction concurrently accessing the table; and if another transaction
comes about and writes something after we're finished, it'll correctly
unfreeze the table and all is well.

Why not just have a command to FREEZE and UNFREEZE an object? It can
hold an ExclusiveLock, avoiding all issues. Presumably FREEZE and
UNFREEZE are rare commands?

Where are the problems in this approach?

1. Performance. We'll need to keep a cache of pg_class_nt tuples. This
cache must be independent of the current relcache, because the relcache
is properly transactional while the pg_class_nt cache must not be.

2. The current implementation puts the unfreezing in LockRelation. This
is a problem, because any user can cause a LockRelation on any table,
even if the user does not have access to that table.

That last bit just sounds horrible to me. But thinking about it: how
come any user can lock a relation they shouldn't even be allowed to know
exists? Possibly OT.

I can see other reasons for having pg_class_nt, so having table info
cached in shared memory does make sense to me (yet not being part of the
strict definitions of the relcache).

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#20

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Simon Riggs (#19)

Re: [PATCHES] Non-transactional pg_class, try 2

Simon Riggs wrote:

On Mon, 2006-06-26 at 13:58 -0400, Alvaro Herrera wrote:

The relminxid Patch
===================

What I'm after is not freezing for read-only media, nor archive, nor
read-only tables.

OK, but I am... but I'm happy to not to confuse the discussion.

Ok :-) I think I put a note about this but removed it while
restructuring the text so it would be clearer. The note is that while I
don't care about read-only stuff in this proposal, it may be that
read-only tables may come as a "side effect of implementing this. But I
agree we should not make the discussion more complex than it already is.

2. Mark frozen databases specially somehow.
To mark databases frozen, we need a way to mark tables as frozen.
How do we do that? As I explain below, this allows some nice
optimizations, but it's a very tiny can full of a huge amount of
worms.

At this stage you talk about databases, yet below we switch to
discussing tables. Not sure why we switched from one to the other.

Sorry, I forgot one step. To mark a database frozen, we must make sure
that all tables within that database are frozen as well. So the first
step to freezing a database is freezing all its tables.

Marking a Table Frozen
======================

Marking a table frozen is simple as setting relminxid = FrozenXid for a
table. As explained above, this cannot be done in a regular postmaster
environment, because a concurrent transaction could be doing nasty stuff
to a table. So we can do it only in a standalone backend.

Surely we just lock the table? No concurrent transactions?

No, because a transaction can have been started previously and yet not
hold any lock on the table, and write on the table after the vacuum
finishes. Or write on an earlier page of the table, after the vacuuming
already processed it. But here it comes one of the "nice points" below,
which was that if we acquire a suitable exclusive lock on the table, we
_can_ mark it frozen. Of course, this cannot be done by plain vacuum,
because we want the table to be still accesible by other transactions.
This is where VACUUM FREEZE comes in -- it does the same processing as
lazy vacuum, except that it locks the table exclusively and marks it
with FrozenXid.

Nice optimization: if we detect that a table is fully frozen, then
VACUUM is a no-op (not VACUUM FULL), because by definition there are no
tuples to remove.

Yes please, but we don't need it anymore do we? Guess we need it for
backwards compatibility? VACUUM still needs to vacuum every table.

Sorry, I don't understand what you mean here. We don't need what
anymore?

Another optimization: if we are sure that unfreezing works, we can even
mark a table as frozen in a postmaster environment, as long as we take
an ExclusiveLock on the table. Thus we know that the vacuum is the sole
transaction concurrently accessing the table; and if another transaction
comes about and writes something after we're finished, it'll correctly
unfreeze the table and all is well.

Why not just have a command to FREEZE and UNFREEZE an object? It can
hold an ExclusiveLock, avoiding all issues. Presumably FREEZE and
UNFREEZE are rare commands?

Ok, if I'm following you here, your point is that FREEZE'ing a table
sets the relminxid to FrozenXid, and UNFREEZE removes that; and also, in
between, no one can write to the table?

This seems to make sense. However, I'm not very sure about the
FREEZE'ing operation, because we need to make sure the table is really
frozen. So we either scan it, or we make sure something else already
scanned it; to me what makes the most sense is having a VACUUM option
that would do the freezing (and a separate command to do the
unfreezing).

Where are the problems in this approach?

2. The current implementation puts the unfreezing in LockRelation. This
is a problem, because any user can cause a LockRelation on any table,
even if the user does not have access to that table.

That last bit just sounds horrible to me. But thinking about it: how
come any user can lock a relation they shouldn't even be allowed to know
exists? Possibly OT.

Hmm, I guess there must be several commands that open the relation and
lock it, and then check permissions. I haven't checked the code but you
shouldn't check permissions before acquiring some kind of lock, and we
shouldn't be upgrading locks either.

I can see other reasons for having pg_class_nt, so having table info
cached in shared memory does make sense to me (yet not being part of the
strict definitions of the relcache).

Yeah.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#21

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#17)

Re: [PATCHES] Non-transactional pg_class, try 2

Alvaro Herrera <alvherre@commandprompt.com> writes:

What I'm after is not freezing for read-only media, nor archive, nor
read-only tables. What I'm after is removing the requirement that all
databases must be vacuumed wholly every 2 billion transactions.

Well, if that's the only goal then I hardly think we need to have a
discussion, because your alternative #1 is *obviously* the winner:

1. Remove the special case, i.e., process frozen databases in VACUUM
like every other database.
This is the easiest, because no extra logic is needed. Just make
sure they are vacuumed in time. The only problem would be that we'd
need to uselessly vacuum tables that we know are frozen, from time to
time. But then, those tables are probably small, so what's the
problem with that?

2. Mark frozen databases specially somehow.
To mark databases frozen, we need a way to mark tables as frozen.
How do we do that? As I explain below, this allows some nice
optimizations, but it's a very tiny can full of a huge amount of
worms.

Avoiding a vacuum pass over template0 once every 2 billion transactions
cannot be thought worthy of the amount of complexity and risk entailed
in the nontransactional-catalog thing. Especially since in the normal
case those would be read-only passes (the tuples all being frozen already).

So if you want to bring in the other goals that you're trying to pretend
aren't there, step right up and do it. You have not here made a case
that would convince anyone that we shouldn't just do #1 and be done with
it.

regards, tom lane

#22

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Tom Lane (#21)

Re: Non-transactional pg_class, try 2

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

What I'm after is not freezing for read-only media, nor archive, nor
read-only tables. What I'm after is removing the requirement that all
databases must be vacuumed wholly every 2 billion transactions.

Well, if that's the only goal then I hardly think we need to have a
discussion, because your alternative #1 is *obviously* the winner:

1. Remove the special case, i.e., process frozen databases in VACUUM
like every other database.
This is the easiest, because no extra logic is needed. Just make
sure they are vacuumed in time. The only problem would be that we'd
need to uselessly vacuum tables that we know are frozen, from time to
time. But then, those tables are probably small, so what's the
problem with that?

I'm happy to do at least this for 8.2. We can still try to do the
non-transactional catalog later, either in this release or the next; the
code is almost there, and it'll be easier to discuss/design because
we'll have taken the relminxid stuff out of the way.

So if everyone agrees, I'll do this now. Beware -- this may make you
vacuum databases that you previously weren't vacuuming. (I really doubt
anyone is setting datallowconn=false just to skip vacuuming big
databases, but there are people with strange ideas out there.)

So if you want to bring in the other goals that you're trying to pretend
aren't there, step right up and do it. You have not here made a case
that would convince anyone that we shouldn't just do #1 and be done with
it.

We can do it in a separate discussion.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#23

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Alvaro Herrera (#20)

Re: [PATCHES] Non-transactional pg_class, try 2

On Mon, 2006-06-26 at 16:54 -0400, Alvaro Herrera wrote:

Another optimization: if we are sure that unfreezing works, we can

even

mark a table as frozen in a postmaster environment, as long as we

take

an ExclusiveLock on the table. Thus we know that the vacuum is

the sole

transaction concurrently accessing the table; and if another

transaction

comes about and writes something after we're finished, it'll

correctly

unfreeze the table and all is well.

Why not just have a command to FREEZE and UNFREEZE an object? It can
hold an ExclusiveLock, avoiding all issues. Presumably FREEZE and
UNFREEZE are rare commands?

Ok, if I'm following you here, your point is that FREEZE'ing a table
sets the relminxid to FrozenXid, and UNFREEZE removes that; and also,
in
between, no one can write to the table?

This seems to make sense. However, I'm not very sure about the
FREEZE'ing operation, because we need to make sure the table is really
frozen. So we either scan it, or we make sure something else already
scanned it; to me what makes the most sense is having a VACUUM option
that would do the freezing (and a separate command to do the
unfreezing).

Sounds like we're in step here:

VACUUM FREEZE
-- works at either table or database level
-- takes ExclusiveLock, reads all blocks of a table, freezing rows
-- once complete, all write operations are prevented until...

ALTER TABLE xxx UNFREEZE;
ALTER DATABASE xxx UNFREEZE;
-- takes AccessExclusiveLock, allows writes again

CREATE DATABASE automatically does unfreeze after template db copy

Suggest that we prevent write operations on Frozen tables by revoking
all INSERT, UPDATE or DELETE rights held, then enforcing a check during
GRANT to prevent them being re-enabled. Superusers would need to check
every time. If we dont do this, then we will have two contradictory
states marked in the catalog - privilges saying Yes and freezing saying
No.

Not sure where pg_class_nt comes in here though, even though I think I
still want it.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#24

Zeugswetter Andreas DCP SD

ZeugswetterA@spardat.at

over 19 years ago

In reply to: Alvaro Herrera (#17)

Re: [PATCHES] Non-transactional pg_class, try 2

Very nice explanation, thanks Alvaro.

2. Mark frozen databases specially somehow.
To mark databases frozen, we need a way to mark tables as frozen.
How do we do that? As I explain below, this allows some nice
optimizations, but it's a very tiny can full of a huge amount of
worms.

Marking a Table Frozen
======================

Marking a table frozen is simple as setting relminxid =
FrozenXid for a table. As explained above, this cannot be
done in a regular postmaster environment, because a
concurrent transaction could be doing nasty stuff to a table.
So we can do it only in a standalone backend.

Unless you lock the table exclusively during vacuum, that could be done
with
vacuum freeze. I like that more, than changing stuff that is otherwise
completely
frozen/static. (I see you wrote that below)

On the other hand, a "frozen" table must be marked with
relminxid = a-regular-Xid as soon as a transaction writes
some tuples on it. Note that this "unfreezing" must take
place even if the offending transaction is aborted, because
the Xid is written in the table nevertheless and thus it
would be incorrect to lose the unfreezing.

The other idea was to need a special unfreeze command ...

This is how pg_class_nt came into existence -- it would be a
place where information about a table would be stored and not
subject to the rolling back of the transaction that wrote it.

Oh, that puts it in another league, since it must guarantee commit.
I am not sure we can do that. The previous discussion was about
concurrency and data that was not so important like tuple count.

In short:
- I'd start with #1 (no relminxid = FrozenXid) like Tom
suggested
- and then implement FREEZE/UNFREEZE with exclusive locks
like Simon wrote (so it does not need pg_class_nt) and use that
for the templates.

Simon wrote:

Suggest that we prevent write operations on Frozen tables by revoking

all INSERT, UPDATE or DELETE rights held, then enforcing a check during
GRANT to prevent them being re-enabled. Superusers would need to check
every time. If we dont do this, then we will have two contradictory
states marked in the catalog - privilges saying Yes and freezing saying
No.

No, I'd not mess with the permissions and return a different error when
trying to
modify a frozen table. (It would also be complicated to unfreeze after
create database)
We should make it clear, that freezing is no replacement for revoke.

Andreas

#25

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Zeugswetter Andreas DCP SD (#24)

Re: [PATCHES] Non-transactional pg_class, try 2

On Tue, 2006-06-27 at 10:04 +0200, Zeugswetter Andreas DCP SD wrote:

Simon wrote:

Suggest that we prevent write operations on Frozen tables by revoking

all INSERT, UPDATE or DELETE rights held, then enforcing a check during
GRANT to prevent them being re-enabled. Superusers would need to check
every time. If we dont do this, then we will have two contradictory
states marked in the catalog - privilges saying Yes and freezing saying
No.

No, I'd not mess with the permissions and return a different error when
trying to
modify a frozen table. (It would also be complicated to unfreeze after
create database)
We should make it clear, that freezing is no replacement for revoke.

That was with a mind to performance. Checking every INSERT, UPDATE and
DELETE statement to see if they are being done against a frozen table
seems like a waste.

There would still be a specific error message for frozen tables, just on
the GRANT rather than the actual DML statements.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#26

Zeugswetter Andreas DCP SD

ZeugswetterA@spardat.at

over 19 years ago

In reply to: Simon Riggs (#25)

Re: [PATCHES] Non-transactional pg_class, try 2

Suggest that we prevent write operations on Frozen tables by
revoking

all INSERT, UPDATE or DELETE rights held, then enforcing a check
during GRANT to prevent them being re-enabled. Superusers would need

to check every time. If we dont do this, then we will have two
contradictory states marked in the catalog - privilges saying Yes

and

freezing saying No.

No, I'd not mess with the permissions and return a different error
when trying to modify a frozen table. (It would also be complicated

unfreeze after create database) We should make it clear, that

freezing

is no replacement for revoke.

That was with a mind to performance. Checking every INSERT,
UPDATE and DELETE statement to see if they are being done
against a frozen table seems like a waste.

I'd think we would have relminxid in the relcache, so I don't buy the
performance argument :-) (You could still do the actual check in the
same place where the permission is checked)

There would still be a specific error message for frozen
tables, just on the GRANT rather than the actual DML statements.

I'd still prefer to see the error on modify. Those that don't can
revoke.

Andreas

#27

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Zeugswetter Andreas DCP SD (#26)

Re: [PATCHES] Non-transactional pg_class, try 2

"Zeugswetter Andreas DCP SD" <ZeugswetterA@spardat.at> writes:

That was with a mind to performance. Checking every INSERT,
UPDATE and DELETE statement to see if they are being done
against a frozen table seems like a waste.

I'd think we would have relminxid in the relcache, so I don't buy the
performance argument :-)

Me either. Further, auto-revoking permissions loses information.
I think that idea is an ugly kluge.

Anyway, the bottom line here seems to be that we should forget about
pg_class_nt and just keep the info in pg_class; there's not sufficient
justification to build the infrastructure needed for a nontransactional
auxiliary catalog. This implies the following conclusions:

* template0 has to be vacuumed against wraparound, same as any other
database.

* To support frozen tables, "VACUUM FREEZE" and "ALTER TABLE UNFREEZE"
would need to be explicit commands taking ExclusiveLock, and can't be
nested inside transaction blocks either. Automatic unfreeze upon an
updating command isn't possible.

Neither of these are bad enough to justify pg_class_nt --- in fact,
I'd argue that explicit unfreeze is better than automatic anyway.
So it was a cute idea, but its time hasn't come.

regards, tom lane

#28

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Alvaro Herrera (#22)

Re: [HACKERS] Non-transactional pg_class, try 2

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

What I'm after is not freezing for read-only media, nor archive, nor
read-only tables. What I'm after is removing the requirement that all
databases must be vacuumed wholly every 2 billion transactions.

Well, if that's the only goal then I hardly think we need to have a
discussion, because your alternative #1 is *obviously* the winner:

1. Remove the special case, i.e., process frozen databases in VACUUM
like every other database.
This is the easiest, because no extra logic is needed. Just make
sure they are vacuumed in time. The only problem would be that we'd
need to uselessly vacuum tables that we know are frozen, from time to
time. But then, those tables are probably small, so what's the
problem with that?

I'm happy to do at least this for 8.2. We can still try to do the
non-transactional catalog later, either in this release or the next; the
code is almost there, and it'll be easier to discuss/design because
we'll have taken the relminxid stuff out of the way.

I attach a patch pursuant to this idea (lacking doc patches for the
maintenance section.)

This patch has the nasty side effect mentioned above -- people will have
to set template0 as connectable and manually run vacuum on it
periodically, unless they run autovacuum.

A future improvement in this area would be to allow frozen tables and
frozen databases, removing this requirement. But I'm inclined to apply
it as is, in the spirit of incremental improvement. Objectors please
speak up!

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#29

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Alvaro Herrera (#28)

1 attachment(s)

Re: [HACKERS] Non-transactional pg_class, try 2

Alvaro Herrera wrote:

I attach a patch pursuant to this idea (lacking doc patches for the
maintenance section.)

Huh, really attached, sorry :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachments:

relminxid-5.patchtext/plain; charset=us-asciiDownload

Index: doc/src/sgml/catalogs.sgml
===================================================================
RCS file: /home/alvherre/cvs/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.124
diff -c -r2.124 catalogs.sgml
*** doc/src/sgml/catalogs.sgml	3 Jun 2006 02:53:04 -0000	2.124
--- doc/src/sgml/catalogs.sgml	28 Jun 2006 18:02:59 -0000
***************
*** 1633,1638 ****
--- 1633,1662 ----
       </row>
  
       <row>
+       <entry><structfield>relminxid</structfield></entry>
+       <entry><type>xid</type></entry>
+       <entry></entry>
+       <entry>
+        The minimum transaction ID present in all rows in this table.  This
+        value is used to determine the database-global
+        <structname>pg_database</>.<structfield>datminxid</> value.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>relvacuumxid</structfield></entry>
+       <entry><type>xid</type></entry>
+       <entry></entry>
+       <entry>
+        The transaction ID that was used as cleaning point as of the last vacuum
+        operation.  All rows inserted, updated or deleted in this table by
+        transactions whose IDs are below this one have been marked as known good
+        or deleted.  This is used to determine the database-global
+        <structname>pg_database</>.<structfield>datvacuumxid</> value.
+       </entry>
+      </row>
+ 
+      <row>
        <entry><structfield>relacl</structfield></entry>
        <entry><type>aclitem[]</type></entry>
        <entry></entry>
***************
*** 2006,2026 ****
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        All rows inserted or deleted by transaction IDs before this one
!        have been marked as known committed or known aborted in this database.
!        This is used to determine when commit-log space can be recycled.
        </entry>
       </row>
  
       <row>
!       <entry><structfield>datfrozenxid</structfield></entry>
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
         All rows inserted by transaction IDs before this one have been
         relabeled with a permanent (<quote>frozen</>) transaction ID in this
!        database.  This is useful to check whether a database must be vacuumed
!        soon to avoid transaction ID wrap-around problems.
        </entry>
       </row>
  
--- 2030,2056 ----
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        The transaction ID that was used as cleaning point as of the last vacuum
!        operation.  All rows inserted or deleted by transaction IDs before this one
!        have been marked as known good or deleted.  This
!        is used to determine when commit-log space can be recycled.
!        If InvalidTransactionId, then the minimum is unknown and can be
!        determined by scanning <structname>pg_class</>.<structfield>relvacuumxid</>.
        </entry>
       </row>
  
       <row>
!       <entry><structfield>datminxid</structfield></entry>
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
+        The minimum transaction ID present in all tables in this database.
         All rows inserted by transaction IDs before this one have been
         relabeled with a permanent (<quote>frozen</>) transaction ID in this
!        database.  This is useful to check whether a database must be
!        vacuumed soon to avoid transaction ID wrap-around problems.
!        If InvalidTransactionId, then the minimum is unknown and can be
!        determined by scanning <structname>pg_class</>.<structfield>relminxid</>.
        </entry>
       </row>
  
Index: src/backend/access/transam/varsup.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/access/transam/varsup.c,v
retrieving revision 1.70
diff -c -r1.70 varsup.c
*** src/backend/access/transam/varsup.c	5 Mar 2006 15:58:22 -0000	1.70
--- src/backend/access/transam/varsup.c	28 Jun 2006 18:02:59 -0000
***************
*** 168,178 ****
  
  /*
   * Determine the last safe XID to allocate given the currently oldest
!  * datfrozenxid (ie, the oldest XID that might exist in any database
   * of our cluster).
   */
  void
! SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
  					  Name oldest_datname)
  {
  	TransactionId xidWarnLimit;
--- 168,178 ----
  
  /*
   * Determine the last safe XID to allocate given the currently oldest
!  * datminxid (ie, the oldest XID that might exist in any database
   * of our cluster).
   */
  void
! SetTransactionIdLimit(TransactionId oldest_datminxid,
  					  Name oldest_datname)
  {
  	TransactionId xidWarnLimit;
***************
*** 180,195 ****
  	TransactionId xidWrapLimit;
  	TransactionId curXid;
  
! 	Assert(TransactionIdIsValid(oldest_datfrozenxid));
  
  	/*
  	 * The place where we actually get into deep trouble is halfway around
! 	 * from the oldest potentially-existing XID.  (This calculation is
! 	 * probably off by one or two counts, because the special XIDs reduce the
! 	 * size of the loop a little bit.  But we throw in plenty of slop below,
! 	 * so it doesn't matter.)
  	 */
! 	xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
  	if (xidWrapLimit < FirstNormalTransactionId)
  		xidWrapLimit += FirstNormalTransactionId;
  
--- 180,195 ----
  	TransactionId xidWrapLimit;
  	TransactionId curXid;
  
! 	Assert(TransactionIdIsValid(oldest_datminxid));
  
  	/*
  	 * The place where we actually get into deep trouble is halfway around
! 	 * from the oldest existing XID.  (This calculation is probably off by one
! 	 * or two counts, because the special XIDs reduce the size of the loop a
! 	 * little bit.  But we throw in plenty of slop below, so it doesn't
! 	 * matter.)
  	 */
! 	xidWrapLimit = oldest_datminxid + (MaxTransactionId >> 1);
  	if (xidWrapLimit < FirstNormalTransactionId)
  		xidWrapLimit += FirstNormalTransactionId;
  
Index: src/backend/catalog/heap.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/catalog/heap.c,v
retrieving revision 1.299
diff -c -r1.299 heap.c
*** src/backend/catalog/heap.c	10 May 2006 23:18:39 -0000	1.299
--- src/backend/catalog/heap.c	28 Jun 2006 22:04:56 -0000
***************
*** 568,573 ****
--- 568,602 ----
  	 */
  	new_rel_reltup = new_rel_desc->rd_rel;
  
+ 	/* Initialize relminxid and relvacuumxid */
+ 	if (relkind == RELKIND_RELATION ||
+ 		relkind == RELKIND_TOASTVALUE)
+ 	{
+ 		/*
+ 		 * Only real tables have Xids stored in them; initialize our known
+ 		 * value to the minimum Xid that could put tuples in the new table.
+ 		 */
+ 		if (!IsBootstrapProcessingMode())
+ 		{
+ 			new_rel_reltup->relminxid = RecentXmin;
+ 			new_rel_reltup->relvacuumxid = RecentXmin;
+ 		}
+ 		else
+ 		{
+ 			new_rel_reltup->relminxid = FirstNormalTransactionId;
+ 			new_rel_reltup->relvacuumxid = FirstNormalTransactionId;
+ 		}
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 * Other relations will not have Xids in them, so set the initial value
+ 		 * to InvalidTransactionId.
+ 		 */
+ 		new_rel_reltup->relminxid = InvalidTransactionId;
+ 		new_rel_reltup->relvacuumxid = InvalidTransactionId;
+ 	}
+ 
  	switch (relkind)
  	{
  		case RELKIND_RELATION:
Index: src/backend/commands/analyze.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.93
diff -c -r1.93 analyze.c
*** src/backend/commands/analyze.c	23 Mar 2006 00:19:28 -0000	1.93
--- src/backend/commands/analyze.c	28 Jun 2006 18:02:59 -0000
***************
*** 424,431 ****
  	{
  		vac_update_relstats(RelationGetRelid(onerel),
  							RelationGetNumberOfBlocks(onerel),
! 							totalrows,
! 							hasindex);
  		for (ind = 0; ind < nindexes; ind++)
  		{
  			AnlIndexData *thisdata = &indexdata[ind];
--- 424,432 ----
  	{
  		vac_update_relstats(RelationGetRelid(onerel),
  							RelationGetNumberOfBlocks(onerel),
! 							totalrows, hasindex,
! 							InvalidTransactionId, InvalidTransactionId);
! 
  		for (ind = 0; ind < nindexes; ind++)
  		{
  			AnlIndexData *thisdata = &indexdata[ind];
***************
*** 434,441 ****
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
  			vac_update_relstats(RelationGetRelid(Irel[ind]),
  								RelationGetNumberOfBlocks(Irel[ind]),
! 								totalindexrows,
! 								false);
  		}
  
  		/* report results to the stats collector, too */
--- 435,442 ----
  			totalindexrows = ceil(thisdata->tupleFract * totalrows);
  			vac_update_relstats(RelationGetRelid(Irel[ind]),
  								RelationGetNumberOfBlocks(Irel[ind]),
! 								totalindexrows, false,
! 								InvalidTransactionId, InvalidTransactionId);
  		}
  
  		/* report results to the stats collector, too */
Index: src/backend/commands/dbcommands.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.181
diff -c -r1.181 dbcommands.c
*** src/backend/commands/dbcommands.c	4 May 2006 16:07:29 -0000	1.181
--- src/backend/commands/dbcommands.c	28 Jun 2006 18:24:18 -0000
***************
*** 55,61 ****
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbFrozenXidP,
  			Oid *dbTablespace);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
--- 55,61 ----
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbMinXidP,
  			Oid *dbTablespace);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
***************
*** 76,82 ****
  	bool		src_allowconn;
  	Oid			src_lastsysoid;
  	TransactionId src_vacuumxid;
! 	TransactionId src_frozenxid;
  	Oid			src_deftablespace;
  	volatile Oid dst_deftablespace;
  	Relation	pg_database_rel;
--- 76,82 ----
  	bool		src_allowconn;
  	Oid			src_lastsysoid;
  	TransactionId src_vacuumxid;
! 	TransactionId src_minxid;
  	Oid			src_deftablespace;
  	volatile Oid dst_deftablespace;
  	Relation	pg_database_rel;
***************
*** 228,234 ****
  	if (!get_db_info(dbtemplate, ShareLock,
  					 &src_dboid, &src_owner, &src_encoding,
  					 &src_istemplate, &src_allowconn, &src_lastsysoid,
! 					 &src_vacuumxid, &src_frozenxid, &src_deftablespace))
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_DATABASE),
  				 errmsg("template database \"%s\" does not exist",
--- 228,234 ----
  	if (!get_db_info(dbtemplate, ShareLock,
  					 &src_dboid, &src_owner, &src_encoding,
  					 &src_istemplate, &src_allowconn, &src_lastsysoid,
! 					 &src_vacuumxid, &src_minxid, &src_deftablespace))
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_DATABASE),
  				 errmsg("template database \"%s\" does not exist",
***************
*** 327,342 ****
  	}
  
  	/*
- 	 * Normally we mark the new database with the same datvacuumxid and
- 	 * datfrozenxid as the source.	However, if the source is not allowing
- 	 * connections then we assume it is fully frozen, and we can set the
- 	 * current transaction ID as the xid limits.  This avoids immediately
- 	 * starting to generate warnings after cloning template0.
- 	 */
- 	if (!src_allowconn)
- 		src_vacuumxid = src_frozenxid = GetCurrentTransactionId();
- 
- 	/*
  	 * Check for db name conflict.	This is just to give a more friendly
  	 * error message than "unique index violation".  There's a race condition
  	 * but we're willing to accept the less friendly message in that case.
--- 327,332 ----
***************
*** 367,373 ****
  	new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
  	new_record[Anum_pg_database_datlastsysoid - 1] = ObjectIdGetDatum(src_lastsysoid);
  	new_record[Anum_pg_database_datvacuumxid - 1] = TransactionIdGetDatum(src_vacuumxid);
! 	new_record[Anum_pg_database_datfrozenxid - 1] = TransactionIdGetDatum(src_frozenxid);
  	new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_deftablespace);
  
  	/*
--- 357,363 ----
  	new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
  	new_record[Anum_pg_database_datlastsysoid - 1] = ObjectIdGetDatum(src_lastsysoid);
  	new_record[Anum_pg_database_datvacuumxid - 1] = TransactionIdGetDatum(src_vacuumxid);
! 	new_record[Anum_pg_database_datminxid - 1] = TransactionIdGetDatum(src_minxid);
  	new_record[Anum_pg_database_dattablespace - 1] = ObjectIdGetDatum(dst_deftablespace);
  
  	/*
***************
*** 1066,1072 ****
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbFrozenXidP,
  			Oid *dbTablespace)
  {
  	bool		result = false;
--- 1056,1062 ----
  			Oid *dbIdP, Oid *ownerIdP,
  			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
  			Oid *dbLastSysOidP,
! 			TransactionId *dbVacuumXidP, TransactionId *dbMinXidP,
  			Oid *dbTablespace)
  {
  	bool		result = false;
***************
*** 1155,1163 ****
  				/* limit of vacuumed XIDs */
  				if (dbVacuumXidP)
  					*dbVacuumXidP = dbform->datvacuumxid;
! 				/* limit of frozen XIDs */
! 				if (dbFrozenXidP)
! 					*dbFrozenXidP = dbform->datfrozenxid;
  				/* default tablespace for this database */
  				if (dbTablespace)
  					*dbTablespace = dbform->dattablespace;
--- 1145,1153 ----
  				/* limit of vacuumed XIDs */
  				if (dbVacuumXidP)
  					*dbVacuumXidP = dbform->datvacuumxid;
! 				/* limit of min XIDs */
! 				if (dbMinXidP)
! 					*dbMinXidP = dbform->datminxid;
  				/* default tablespace for this database */
  				if (dbTablespace)
  					*dbTablespace = dbform->dattablespace;
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.330
diff -c -r1.330 vacuum.c
*** src/backend/commands/vacuum.c	10 May 2006 23:18:39 -0000	1.330
--- src/backend/commands/vacuum.c	28 Jun 2006 23:28:37 -0000
***************
*** 25,30 ****
--- 25,31 ----
  #include "access/clog.h"
  #include "access/genam.h"
  #include "access/heapam.h"
+ #include "access/multixact.h"
  #include "access/subtrans.h"
  #include "access/xlog.h"
  #include "catalog/catalog.h"
***************
*** 127,132 ****
--- 128,134 ----
  	Size		min_tlen;
  	Size		max_tlen;
  	bool		hasindex;
+ 	TransactionId minxid;	/* Minimum Xid present anywhere on table */
  	/* vtlinks array for tuple chain following - sorted by new_tid */
  	int			num_vtlinks;
  	VTupleLink	vtlinks;
***************
*** 194,218 ****
  
  static int	elevel = -1;
  
- static TransactionId OldestXmin;
- static TransactionId FreezeLimit;
- 
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
  			 const char *stmttype);
! static void vac_update_dbstats(Oid dbid,
! 				   TransactionId vacuumXID,
! 				   TransactionId frozenXID);
! static void vac_truncate_clog(TransactionId vacuumXID,
! 				  TransactionId frozenXID);
! static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages);
  static void repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel);
  static void move_chain_tuple(Relation rel,
  				 Buffer old_buf, Page old_page, HeapTuple old_tup,
  				 Buffer dst_buf, Page dst_page, VacPage dst_vacpage,
--- 196,217 ----
  
  static int	elevel = -1;
  
  
  /* non-export function prototypes */
  static List *get_rel_oids(List *relids, const RangeVar *vacrel,
  			 const char *stmttype);
! static void vac_update_dbminxid(Oid dbid,
! 					TransactionId *minxid,
! 					TransactionId *vacuumxid);
! static void vac_truncate_clog(TransactionId myminxid, TransactionId myvacxid);
! static void vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind);
  static void full_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
  static void scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages,
! 		  TransactionId FreezeLimit, TransactionId OldestXmin);
  static void repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel, TransactionId OldestXmin);
  static void move_chain_tuple(Relation rel,
  				 Buffer old_buf, Page old_page, HeapTuple old_tup,
  				 Buffer dst_buf, Page dst_page, VacPage dst_vacpage,
***************
*** 268,275 ****
  vacuum(VacuumStmt *vacstmt, List *relids)
  {
  	const char *stmttype = vacstmt->vacuum ? "VACUUM" : "ANALYZE";
- 	TransactionId initialOldestXmin = InvalidTransactionId;
- 	TransactionId initialFreezeLimit = InvalidTransactionId;
  	volatile MemoryContext anl_context = NULL;
  	volatile bool all_rels,
  				in_outer_xact,
--- 267,272 ----
***************
*** 352,383 ****
  	 */
  	relations = get_rel_oids(relids, vacstmt->relation, stmttype);
  
- 	if (vacstmt->vacuum && all_rels)
- 	{
- 		/*
- 		 * It's a database-wide VACUUM.
- 		 *
- 		 * Compute the initially applicable OldestXmin and FreezeLimit XIDs,
- 		 * so that we can record these values at the end of the VACUUM. Note
- 		 * that individual tables may well be processed with newer values, but
- 		 * we can guarantee that no (non-shared) relations are processed with
- 		 * older ones.
- 		 *
- 		 * It is okay to record non-shared values in pg_database, even though
- 		 * we may vacuum shared relations with older cutoffs, because only the
- 		 * minimum of the values present in pg_database matters.  We can be
- 		 * sure that shared relations have at some time been vacuumed with
- 		 * cutoffs no worse than the global minimum; for, if there is a
- 		 * backend in some other DB with xmin = OLDXMIN that's determining the
- 		 * cutoff with which we vacuum shared relations, it is not possible
- 		 * for that database to have a cutoff newer than OLDXMIN recorded in
- 		 * pg_database.
- 		 */
- 		vacuum_set_xid_limits(vacstmt, false,
- 							  &initialOldestXmin,
- 							  &initialFreezeLimit);
- 	}
- 
  	/*
  	 * Decide whether we need to start/commit our own transactions.
  	 *
--- 349,354 ----
***************
*** 445,454 ****
  			Oid			relid = lfirst_oid(cur);
  
  			if (vacstmt->vacuum)
! 			{
! 				if (!vacuum_rel(relid, vacstmt, RELKIND_RELATION))
! 					all_rels = false;	/* forget about updating dbstats */
! 			}
  			if (vacstmt->analyze)
  			{
  				MemoryContext old_context = NULL;
--- 416,423 ----
  			Oid			relid = lfirst_oid(cur);
  
  			if (vacstmt->vacuum)
! 				vacuum_rel(relid, vacstmt, RELKIND_RELATION);
! 
  			if (vacstmt->analyze)
  			{
  				MemoryContext old_context = NULL;
***************
*** 524,529 ****
--- 493,501 ----
  
  	if (vacstmt->vacuum)
  	{
+ 		TransactionId	minxid,
+ 						vacuumxid;
+ 
  		/*
  		 * If it was a database-wide VACUUM, print FSM usage statistics (we
  		 * don't make you be superuser to see these).
***************
*** 531,547 ****
  		if (all_rels)
  			PrintFreeSpaceMapStatistics(elevel);
  
! 		/*
! 		 * If we completed a database-wide VACUUM without skipping any
! 		 * relations, update the database's pg_database row with info about
! 		 * the transaction IDs used, and try to truncate pg_clog.
! 		 */
! 		if (all_rels)
! 		{
! 			vac_update_dbstats(MyDatabaseId,
! 							   initialOldestXmin, initialFreezeLimit);
! 			vac_truncate_clog(initialOldestXmin, initialFreezeLimit);
! 		}
  	}
  
  	/*
--- 503,513 ----
  		if (all_rels)
  			PrintFreeSpaceMapStatistics(elevel);
  
! 		/* Update pg_database.datminxid and datvacuumxid */
! 		vac_update_dbminxid(MyDatabaseId, &minxid, &vacuumxid);
! 
! 		/* Try to truncate pg_clog. */
! 		vac_truncate_clog(minxid, vacuumxid);
  	}
  
  	/*
***************
*** 687,693 ****
   */
  void
  vac_update_relstats(Oid relid, BlockNumber num_pages, double num_tuples,
! 					bool hasindex)
  {
  	Relation	rd;
  	HeapTuple	ctup;
--- 653,660 ----
   */
  void
  vac_update_relstats(Oid relid, BlockNumber num_pages, double num_tuples,
! 					bool hasindex, TransactionId minxid,
! 					TransactionId vacuumxid)
  {
  	Relation	rd;
  	HeapTuple	ctup;
***************
*** 735,740 ****
--- 702,717 ----
  			dirty = true;
  		}
  	}
+ 	if (TransactionIdIsValid(minxid) && pgcform->relminxid != minxid)
+ 	{
+ 		pgcform->relminxid = minxid;
+ 		dirty = true;
+ 	}
+ 	if (TransactionIdIsValid(vacuumxid) && pgcform->relvacuumxid != vacuumxid)
+ 	{
+ 		pgcform->relvacuumxid = vacuumxid;
+ 		dirty = true;
+ 	}
  
  	/*
  	 * If anything changed, write out the tuple
***************
*** 747,756 ****
  
  
  /*
!  *	vac_update_dbstats() -- update statistics for one database
   *
!  *		Update the whole-database statistics that are kept in its pg_database
!  *		row, and the flat-file copy of pg_database.
   *
   *		We violate transaction semantics here by overwriting the database's
   *		existing pg_database tuple with the new values.  This is reasonably
--- 724,736 ----
  
  
  /*
!  *	vac_update_dbminxid() -- update the minimum Xid present in one database
   *
!  * 		Update pg_database's datminxid and datvacuumxid, and the flat-file copy
!  * 		of it.  datminxid is updated to the minimum of all relminxid found in
!  * 		pg_class.  datvacuumxid is updated to the minimum of all relvacuumxid
!  * 		found in pg_class.  The values are also returned in minxid and
!  * 		vacuumxid, respectively.
   *
   *		We violate transaction semantics here by overwriting the database's
   *		existing pg_database tuple with the new values.  This is reasonably
***************
*** 758,775 ****
   *		commits.  As with vac_update_relstats, this avoids leaving dead tuples
   *		behind after a VACUUM.
   *
!  *		This routine is shared by full and lazy VACUUM.  Note that it is only
!  *		applied after a database-wide VACUUM operation.
   */
  static void
! vac_update_dbstats(Oid dbid,
! 				   TransactionId vacuumXID,
! 				   TransactionId frozenXID)
  {
- 	Relation	relation;
  	HeapTuple	tuple;
  	Form_pg_database dbform;
  
  	relation = heap_open(DatabaseRelationId, RowExclusiveLock);
  
  	/* Fetch a copy of the tuple to scribble on */
--- 738,804 ----
   *		commits.  As with vac_update_relstats, this avoids leaving dead tuples
   *		behind after a VACUUM.
   *
!  *		This routine is shared by full and lazy VACUUM.
   */
  static void
! vac_update_dbminxid(Oid dbid, TransactionId *minxid, TransactionId *vacuumxid)
  {
  	HeapTuple	tuple;
  	Form_pg_database dbform;
+ 	Relation	relation;
+ 	SysScanDesc	scan;
+ 	HeapTuple	classTup;
+ 	TransactionId	newMinXid = InvalidTransactionId;
+ 	TransactionId	newVacXid = InvalidTransactionId;
+ 	bool		dirty = false;
+ 
+ 	/* 
+ 	 * We must seqscan pg_class to find the minimum Xid, because there
+ 	 * is no index that can help us here.
+ 	 */
+ 	relation = heap_open(RelationRelationId, AccessShareLock);
+ 
+ 	scan = systable_beginscan(relation, InvalidOid, false,
+ 							  SnapshotNow, 0, NULL);
+ 
+ 	while ((classTup = systable_getnext(scan)) != NULL)
+ 	{
+ 		Form_pg_class classForm;
+ 
+ 		classForm = (Form_pg_class) GETSTRUCT(classTup);
+ 
+ 		/*
+ 		 * Only consider heap and TOAST tables (anything else should have
+ 		 * InvalidTransactionId in both fields anyway.)
+ 		 */
+ 		if (classForm->relkind != RELKIND_RELATION &&
+ 			classForm->relkind != RELKIND_TOASTVALUE)
+ 			continue;
+ 
+ 		Assert(TransactionIdIsNormal(classForm->relminxid));
+ 		Assert(TransactionIdIsNormal(classForm->relvacuumxid));
+ 
+ 		/*
+ 		 * Compute the minimum relminxid in all the tables in the database.
+ 		 */
+ 		if ((!TransactionIdIsValid(newMinXid) ||
+ 			 TransactionIdPrecedes(classForm->relminxid, newMinXid)))
+ 			newMinXid = classForm->relminxid;
+ 
+ 		/* ditto, for relvacuumxid */
+ 		if ((!TransactionIdIsValid(newVacXid) ||
+ 			 TransactionIdPrecedes(classForm->relvacuumxid, newVacXid)))
+ 			newVacXid = classForm->relvacuumxid;
+ 	}
  
+ 	/* we're done with pg_class */
+ 	systable_endscan(scan);
+ 	heap_close(relation, AccessShareLock);
+ 
+ 	Assert(TransactionIdIsNormal(newMinXid));
+ 	Assert(TransactionIdIsNormal(newVacXid));
+ 
+ 	/* Now fetch the pg_database tuple we need to update. */
  	relation = heap_open(DatabaseRelationId, RowExclusiveLock);
  
  	/* Fetch a copy of the tuple to scribble on */
***************
*** 780,795 ****
  		elog(ERROR, "could not find tuple for database %u", dbid);
  	dbform = (Form_pg_database) GETSTRUCT(tuple);
  
! 	/* overwrite the existing statistics in the tuple */
! 	dbform->datvacuumxid = vacuumXID;
! 	dbform->datfrozenxid = frozenXID;
  
! 	heap_inplace_update(relation, tuple);
  
  	heap_close(relation, RowExclusiveLock);
  
! 	/* Mark the flat-file copy of pg_database for update at commit */
! 	database_file_update_needed();
  }
  
  
--- 809,834 ----
  		elog(ERROR, "could not find tuple for database %u", dbid);
  	dbform = (Form_pg_database) GETSTRUCT(tuple);
  
! 	if (TransactionIdPrecedes(dbform->datminxid, newMinXid))
! 	{
! 		dbform->datminxid = newMinXid;
! 		dirty = true;
! 	}
! 	if (TransactionIdPrecedes(dbform->datvacuumxid, newVacXid))
! 	{
! 		dbform->datvacuumxid = newVacXid;
! 		dirty = true;
! 	}
  
! 	if (dirty)
! 		heap_inplace_update(relation, tuple);
  
+ 	heap_freetuple(tuple);
  	heap_close(relation, RowExclusiveLock);
  
! 	/* set return values */
! 	*minxid = newMinXid;
! 	*vacuumxid = newVacXid;
  }
  
  
***************
*** 813,830 ****
   *		applied after a database-wide VACUUM operation.
   */
  static void
! vac_truncate_clog(TransactionId vacuumXID, TransactionId frozenXID)
  {
  	TransactionId myXID = GetCurrentTransactionId();
  	Relation	relation;
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	int32		age;
  	NameData	oldest_datname;
  	bool		vacuumAlreadyWrapped = false;
! 	bool		frozenAlreadyWrapped = false;
  
! 	/* init oldest_datname to sync with my frozenXID */
  	namestrcpy(&oldest_datname, get_database_name(MyDatabaseId));
  
  	/*
--- 852,873 ----
   *		applied after a database-wide VACUUM operation.
   */
  static void
! vac_truncate_clog(TransactionId myminxid, TransactionId myvacxid)
  {
  	TransactionId myXID = GetCurrentTransactionId();
+ 	TransactionId minXID;
+ 	TransactionId vacuumXID;
  	Relation	relation;
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	int32		age;
  	NameData	oldest_datname;
  	bool		vacuumAlreadyWrapped = false;
! 	bool		minAlreadyWrapped = false;
  
! 	/* Initialize the minimum values. */
! 	minXID = myminxid;
! 	vacuumXID = myvacxid;
  	namestrcpy(&oldest_datname, get_database_name(MyDatabaseId));
  
  	/*
***************
*** 839,865 ****
  	{
  		Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);
  
! 		/* Ignore non-connectable databases (eg, template0) */
! 		/* It's assumed that these have been frozen correctly */
! 		if (!dbform->datallowconn)
! 			continue;
  
! 		if (TransactionIdIsNormal(dbform->datvacuumxid))
  		{
! 			if (TransactionIdPrecedes(myXID, dbform->datvacuumxid))
! 				vacuumAlreadyWrapped = true;
! 			else if (TransactionIdPrecedes(dbform->datvacuumxid, vacuumXID))
! 				vacuumXID = dbform->datvacuumxid;
! 		}
! 		if (TransactionIdIsNormal(dbform->datfrozenxid))
! 		{
! 			if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))
! 				frozenAlreadyWrapped = true;
! 			else if (TransactionIdPrecedes(dbform->datfrozenxid, frozenXID))
! 			{
! 				frozenXID = dbform->datfrozenxid;
! 				namecpy(&oldest_datname, &dbform->datname);
! 			}
  		}
  	}
  
--- 882,901 ----
  	{
  		Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);
  
! 		Assert(TransactionIdIsNormal(dbform->datvacuumxid));
! 		Assert(TransactionIdIsNormal(dbform->datminxid));
  
! 		if (TransactionIdPrecedes(myXID, dbform->datvacuumxid))
! 			vacuumAlreadyWrapped = true;
! 		else if (TransactionIdPrecedes(dbform->datvacuumxid, vacuumXID))
! 			vacuumXID = dbform->datvacuumxid;
! 
! 		if (TransactionIdPrecedes(myXID, dbform->datminxid))
! 			minAlreadyWrapped = true;
! 		else if (TransactionIdPrecedes(dbform->datminxid, minXID))
  		{
! 			minXID = dbform->datminxid;
! 			namecpy(&oldest_datname, &dbform->datname);
  		}
  	}
  
***************
*** 886,892 ****
  	 * Do not update varsup.c if we seem to have suffered wraparound already;
  	 * the computed XID might be bogus.
  	 */
! 	if (frozenAlreadyWrapped)
  	{
  		ereport(WARNING,
  				(errmsg("some databases have not been vacuumed in over 1 billion transactions"),
--- 922,928 ----
  	 * Do not update varsup.c if we seem to have suffered wraparound already;
  	 * the computed XID might be bogus.
  	 */
! 	if (minAlreadyWrapped)
  	{
  		ereport(WARNING,
  				(errmsg("some databases have not been vacuumed in over 1 billion transactions"),
***************
*** 895,904 ****
  	}
  
  	/* Update the wrap limit for GetNewTransactionId */
! 	SetTransactionIdLimit(frozenXID, &oldest_datname);
  
  	/* Give warning about impending wraparound problems */
! 	age = (int32) (myXID - frozenXID);
  	if (age > (int32) ((MaxTransactionId >> 3) * 3))
  		ereport(WARNING,
  		   (errmsg("database \"%s\" must be vacuumed within %u transactions",
--- 931,940 ----
  	}
  
  	/* Update the wrap limit for GetNewTransactionId */
! 	SetTransactionIdLimit(minXID, &oldest_datname);
  
  	/* Give warning about impending wraparound problems */
! 	age = (int32) (myXID - minXID);
  	if (age > (int32) ((MaxTransactionId >> 3) * 3))
  		ereport(WARNING,
  		   (errmsg("database \"%s\" must be vacuumed within %u transactions",
***************
*** 920,930 ****
  /*
   *	vacuum_rel() -- vacuum one heap relation
   *
-  *		Returns TRUE if we actually processed the relation (or can ignore it
-  *		for some reason), FALSE if we failed to process it due to permissions
-  *		or other reasons.  (A FALSE result really means that some data
-  *		may have been left unvacuumed, so we can't update XID stats.)
-  *
   *		Doing one heap at a time incurs extra overhead, since we need to
   *		check that the heap exists again just before we vacuum it.	The
   *		reason that we do this is so that vacuuming can be spread across
--- 956,961 ----
***************
*** 933,952 ****
   *
   *		At entry and exit, we are not inside a transaction.
   */
! static bool
  vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
  	LockRelId	onerelid;
  	Oid			toast_relid;
- 	bool		result;
  
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  	/* functions in indexes may want a snapshot set */
  	ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  
  	/*
  	 * Tell the cache replacement strategy that vacuum is causing all
  	 * following IO
--- 964,983 ----
   *
   *		At entry and exit, we are not inside a transaction.
   */
! static void
  vacuum_rel(Oid relid, VacuumStmt *vacstmt, char expected_relkind)
  {
  	LOCKMODE	lmode;
  	Relation	onerel;
  	LockRelId	onerelid;
  	Oid			toast_relid;
  
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  	/* functions in indexes may want a snapshot set */
  	ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
  
+ 
  	/*
  	 * Tell the cache replacement strategy that vacuum is causing all
  	 * following IO
***************
*** 969,975 ****
  	{
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return true;			/* okay 'cause no data there */
  	}
  
  	/*
--- 1000,1006 ----
  	{
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
***************
*** 1000,1006 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return false;
  	}
  
  	/*
--- 1031,1037 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
***************
*** 1015,1021 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return false;
  	}
  
  	/*
--- 1046,1052 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;
  	}
  
  	/*
***************
*** 1030,1036 ****
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return true;			/* assume no long-lived data in temp tables */
  	}
  
  	/*
--- 1061,1067 ----
  		relation_close(onerel, lmode);
  		StrategyHintVacuum(false);
  		CommitTransactionCommand();
! 		return;			/* assume no long-lived data in temp tables */
  	}
  
  	/*
***************
*** 1059,1066 ****
  	else
  		lazy_vacuum_rel(onerel, vacstmt);
  
- 	result = true;				/* did the vacuum */
- 
  	/* all done with this class, but hold lock until commit */
  	relation_close(onerel, NoLock);
  
--- 1090,1095 ----
***************
*** 1078,1094 ****
  	 * totally unimportant for toast relations.
  	 */
  	if (toast_relid != InvalidOid)
! 	{
! 		if (!vacuum_rel(toast_relid, vacstmt, RELKIND_TOASTVALUE))
! 			result = false;		/* failed to vacuum the TOAST table? */
! 	}
  
  	/*
  	 * Now release the session-level lock on the master table.
  	 */
  	UnlockRelationForSession(&onerelid, lmode);
  
! 	return result;
  }
  
  
--- 1107,1120 ----
  	 * totally unimportant for toast relations.
  	 */
  	if (toast_relid != InvalidOid)
! 		vacuum_rel(toast_relid, vacstmt, RELKIND_TOASTVALUE);
  
  	/*
  	 * Now release the session-level lock on the master table.
  	 */
  	UnlockRelationForSession(&onerelid, lmode);
  
! 	return;
  }
  
  
***************
*** 1120,1125 ****
--- 1146,1153 ----
  	int			nindexes,
  				i;
  	VRelStats  *vacrelstats;
+ 	TransactionId FreezeLimit,
+ 				  OldestXmin;
  
  	vacuum_set_xid_limits(vacstmt, onerel->rd_rel->relisshared,
  						  &OldestXmin, &FreezeLimit);
***************
*** 1132,1140 ****
  	vacrelstats->rel_tuples = 0;
  	vacrelstats->hasindex = false;
  
  	/* scan the heap */
  	vacuum_pages.num_pages = fraged_pages.num_pages = 0;
! 	scan_heap(vacrelstats, onerel, &vacuum_pages, &fraged_pages);
  
  	/* Now open all indexes of the relation */
  	vac_open_indexes(onerel, AccessExclusiveLock, &nindexes, &Irel);
--- 1160,1180 ----
  	vacrelstats->rel_tuples = 0;
  	vacrelstats->hasindex = false;
  
+ 	/*
+ 	 * Set initial minimum Xid, which will be updated if a smaller Xid is found
+ 	 * in the relation by scan_heap.
+ 	 *
+ 	 * We use RecentXmin here (the minimum Xid that belongs to a transaction
+ 	 * that is still open according to our snapshot), because it is the
+ 	 * earliest transaction that could insert new tuples in the table after our
+ 	 * VACUUM is done.
+ 	 */
+ 	vacrelstats->minxid = RecentXmin;
+ 
  	/* scan the heap */
  	vacuum_pages.num_pages = fraged_pages.num_pages = 0;
! 	scan_heap(vacrelstats, onerel, &vacuum_pages, &fraged_pages, FreezeLimit,
! 			  OldestXmin);
  
  	/* Now open all indexes of the relation */
  	vac_open_indexes(onerel, AccessExclusiveLock, &nindexes, &Irel);
***************
*** 1162,1168 ****
  	{
  		/* Try to shrink heap */
  		repair_frag(vacrelstats, onerel, &vacuum_pages, &fraged_pages,
! 					nindexes, Irel);
  		vac_close_indexes(nindexes, Irel, NoLock);
  	}
  	else
--- 1202,1208 ----
  	{
  		/* Try to shrink heap */
  		repair_frag(vacrelstats, onerel, &vacuum_pages, &fraged_pages,
! 					nindexes, Irel, OldestXmin);
  		vac_close_indexes(nindexes, Irel, NoLock);
  	}
  	else
***************
*** 1180,1186 ****
  
  	/* update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(onerel), vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples, vacrelstats->hasindex);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
--- 1220,1227 ----
  
  	/* update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(onerel), vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples, vacrelstats->hasindex,
! 						vacrelstats->minxid, OldestXmin);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
***************
*** 1196,1209 ****
   *		deleted tuples), constructs fraged_pages (list of pages with free
   *		space that tuples could be moved into), and calculates statistics
   *		on the number of live tuples in the heap.
   */
  static void
  scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages)
  {
  	BlockNumber nblocks,
  				blkno;
- 	HeapTupleData tuple;
  	char	   *relname;
  	VacPage		vacpage;
  	BlockNumber empty_pages,
--- 1237,1253 ----
   *		deleted tuples), constructs fraged_pages (list of pages with free
   *		space that tuples could be moved into), and calculates statistics
   *		on the number of live tuples in the heap.
+  *
+  *		It also updates the minimum Xid found anywhere on the table in
+  *		vacrelstats->minxid, for later storing it in pg_class.relminxid.
   */
  static void
  scan_heap(VRelStats *vacrelstats, Relation onerel,
! 		  VacPageList vacuum_pages, VacPageList fraged_pages,
! 		  TransactionId FreezeLimit, TransactionId OldestXmin)
  {
  	BlockNumber nblocks,
  				blkno;
  	char	   *relname;
  	VacPage		vacpage;
  	BlockNumber empty_pages,
***************
*** 1317,1322 ****
--- 1361,1367 ----
  		{
  			ItemId		itemid = PageGetItemId(page, offnum);
  			bool		tupgone = false;
+ 			HeapTupleData tuple;
  
  			/*
  			 * Collect un-used items too - it's possible to have indexes
***************
*** 1452,1463 ****
--- 1497,1519 ----
  			}
  			else
  			{
+ 				TransactionId	min;
+ 
  				num_tuples += 1;
  				notup = false;
  				if (tuple.t_len < min_tlen)
  					min_tlen = tuple.t_len;
  				if (tuple.t_len > max_tlen)
  					max_tlen = tuple.t_len;
+ 
+ 				/* 
+ 				 * If the tuple is alive, we consider it for the "minxid"
+ 				 * calculations.
+ 				 */
+ 				min = vactuple_get_minxid(&tuple);
+ 				if (TransactionIdIsValid(min) &&
+ 					TransactionIdPrecedes(min, vacrelstats->minxid))
+ 					vacrelstats->minxid = min;
  			}
  		}						/* scan along page */
  
***************
*** 1583,1588 ****
--- 1639,1701 ----
  					   pg_rusage_show(&ru0))));
  }
  
+ /*
+  * vactuple_get_minxid
+  *
+  * Get the minimum relevant Xid for a tuple, not considering FrozenXid.
+  * Return InvalidXid if none (i.e., xmin=FrozenXid, xmax=InvalidXid).
+  * This is for the purpose of calculating pg_class.relminxid for a table
+  * we're vacuuming.
+  */
+ TransactionId
+ vactuple_get_minxid(HeapTuple tuple)
+ {
+ 	TransactionId	min = InvalidTransactionId;
+ 
+ 	/* 
+ 	 * Initialize calculations with Xmin.  NB -- may be FrozenXid and
+ 	 * we don't want that one.
+ 	 */
+ 	if (TransactionIdIsNormal(HeapTupleHeaderGetXmin(tuple->t_data)))
+ 		min = HeapTupleHeaderGetXmin(tuple->t_data);
+ 
+ 	/*
+ 	 * If Xmax is not marked INVALID, we assume it's valid without making
+ 	 * further checks on it --- it must be recently obsoleted or still running,
+ 	 * else HeapTupleSatisfiesVacuum would have deemed it removable.
+ 	 */
+ 	if (!(tuple->t_data->t_infomask | HEAP_XMAX_INVALID))
+ 	{
+ 		TransactionId	 xmax = HeapTupleHeaderGetXmax(tuple->t_data);
+ 
+ 		/* If xmax is a plain Xid, consider it by itself */
+ 		if (!(tuple->t_data->t_infomask | HEAP_XMAX_IS_MULTI))
+ 		{
+ 			if (!TransactionIdIsValid(min) ||
+ 				(TransactionIdIsNormal(xmax) &&
+ 				 TransactionIdPrecedes(xmax, min)))
+ 				min = xmax;
+ 		}
+ 		else
+ 		{
+ 			/* If it's a MultiXactId, consider each of its members */
+ 			TransactionId *members;
+ 			int			nmembers,
+ 						membno;
+ 
+ 			nmembers = GetMultiXactIdMembers(xmax, &members);
+ 
+ 			for (membno = 0; membno < nmembers; membno++)
+ 			{
+ 				if (!TransactionIdIsValid(min) ||
+ 					TransactionIdPrecedes(members[membno], min))
+ 					min = members[membno];
+ 			}
+ 		}
+ 	}
+ 
+ 	return min;
+ }
  
  /*
   *	repair_frag() -- try to repair relation's fragmentation
***************
*** 1597,1603 ****
  static void
  repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel)
  {
  	TransactionId myXID = GetCurrentTransactionId();
  	Buffer		dst_buffer = InvalidBuffer;
--- 1710,1716 ----
  static void
  repair_frag(VRelStats *vacrelstats, Relation onerel,
  			VacPageList vacuum_pages, VacPageList fraged_pages,
! 			int nindexes, Relation *Irel, TransactionId OldestXmin)
  {
  	TransactionId myXID = GetCurrentTransactionId();
  	Buffer		dst_buffer = InvalidBuffer;
***************
*** 2940,2946 ****
  	/* now update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
! 						false);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
--- 3053,3059 ----
  	/* now update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
! 						false, InvalidTransactionId, InvalidTransactionId);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
***************
*** 3009,3015 ****
  	/* now update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
! 						false);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
--- 3122,3128 ----
  	/* now update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(indrel),
  						stats->num_pages, stats->num_index_tuples,
! 						false, InvalidTransactionId, InvalidTransactionId);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
Index: src/backend/commands/vacuumlazy.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.70
diff -c -r1.70 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c	2 May 2006 22:25:10 -0000	1.70
--- src/backend/commands/vacuumlazy.c	28 Jun 2006 21:43:37 -0000
***************
*** 42,47 ****
--- 42,48 ----
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
+ #include "catalog/catalog.h"
  #include "commands/vacuum.h"
  #include "miscadmin.h"
  #include "pgstat.h"
***************
*** 72,77 ****
--- 73,79 ----
  	double		tuples_deleted;
  	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
  	Size		threshold;		/* minimum interesting free space */
+ 	TransactionId minxid;		/* minimum Xid present anywhere in table */
  	/* List of TIDs of tuples we intend to delete */
  	/* NB: this list is ordered by TID address */
  	int			num_dead_tuples;	/* current # of entries */
***************
*** 88,100 ****
  
  static int	elevel = -1;
  
- static TransactionId OldestXmin;
- static TransactionId FreezeLimit;
- 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes);
  static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
  static void lazy_vacuum_index(Relation indrel,
  							  IndexBulkDeleteResult **stats,
--- 90,100 ----
  
  static int	elevel = -1;
  
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes, TransactionId FreezeLimit,
! 			   TransactionId OldestXmin);
  static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
  static void lazy_vacuum_index(Relation indrel,
  							  IndexBulkDeleteResult **stats,
***************
*** 104,112 ****
  							   LVRelStats *vacrelstats);
  static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
  				 int tupindex, LVRelStats *vacrelstats);
! static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
  static BlockNumber count_nondeletable_pages(Relation onerel,
! 						 LVRelStats *vacrelstats);
  static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
  static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
  					   ItemPointer itemptr);
--- 104,113 ----
  							   LVRelStats *vacrelstats);
  static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
  				 int tupindex, LVRelStats *vacrelstats);
! static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats,
! 							   TransactionId OldestXmin);
  static BlockNumber count_nondeletable_pages(Relation onerel,
! 						 LVRelStats *vacrelstats, TransactionId OldestXmin);
  static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
  static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
  					   ItemPointer itemptr);
***************
*** 122,128 ****
   *	lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
   *
   *		This routine vacuums a single heap, cleans out its indexes, and
!  *		updates its num_pages and num_tuples statistics.
   *
   *		At entry, we have already established a transaction and opened
   *		and locked the relation.
--- 123,130 ----
   *	lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
   *
   *		This routine vacuums a single heap, cleans out its indexes, and
!  *		updates its relpages and reltuples statistics, as well as the
!  *		relminxid and relvacuumxid information.
   *
   *		At entry, we have already established a transaction and opened
   *		and locked the relation.
***************
*** 135,140 ****
--- 137,144 ----
  	int			nindexes;
  	bool		hasindex;
  	BlockNumber possibly_freeable;
+ 	TransactionId OldestXmin,
+ 				  FreezeLimit;
  
  	if (vacstmt->verbose)
  		elevel = INFO;
***************
*** 150,161 ****
  	/* XXX should we scale it up or down?  Adjust vacuum.c too, if so */
  	vacrelstats->threshold = GetAvgFSMRequestSize(&onerel->rd_node);
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, ShareUpdateExclusiveLock, &nindexes, &Irel);
  	hasindex = (nindexes > 0);
  
  	/* Do the vacuuming */
! 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes);
  
  	/* Done with indexes */
  	vac_close_indexes(nindexes, Irel, NoLock);
--- 154,176 ----
  	/* XXX should we scale it up or down?  Adjust vacuum.c too, if so */
  	vacrelstats->threshold = GetAvgFSMRequestSize(&onerel->rd_node);
  
+ 	/*
+ 	 * Set initial minimum Xid, which will be updated if a smaller Xid is found
+ 	 * in the relation by lazy_scan_heap.
+ 	 *
+ 	 * We use RecentXmin here (the minimum Xid that belongs to a transaction
+ 	 * that is still open according to our snapshot), because it is the
+ 	 * earliest transaction that could concurrently insert new tuples in the
+ 	 * table.
+ 	 */
+ 	vacrelstats->minxid = RecentXmin;
+ 
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, ShareUpdateExclusiveLock, &nindexes, &Irel);
  	hasindex = (nindexes > 0);
  
  	/* Do the vacuuming */
! 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, FreezeLimit, OldestXmin);
  
  	/* Done with indexes */
  	vac_close_indexes(nindexes, Irel, NoLock);
***************
*** 169,184 ****
  	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
  	if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
  		possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)
! 		lazy_truncate_heap(onerel, vacrelstats);
  
  	/* Update shared free space map with final free space info */
  	lazy_update_fsm(onerel, vacrelstats);
  
  	/* Update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(onerel),
! 						vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples,
! 						hasindex);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
--- 184,198 ----
  	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
  	if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
  		possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)
! 		lazy_truncate_heap(onerel, vacrelstats, OldestXmin);
  
  	/* Update shared free space map with final free space info */
  	lazy_update_fsm(onerel, vacrelstats);
  
  	/* Update statistics in pg_class */
! 	vac_update_relstats(RelationGetRelid(onerel), vacrelstats->rel_pages,
! 						vacrelstats->rel_tuples, hasindex,
! 						vacrelstats->minxid, OldestXmin);
  
  	/* report results to the stats collector, too */
  	pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared,
***************
*** 193,202 ****
   *		and pages with free space, and calculates statistics on the number
   *		of live tuples in the heap.  When done, or when we run low on space
   *		for dead-tuple TIDs, invoke vacuuming of indexes and heap.
   */
  static void
  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes)
  {
  	BlockNumber nblocks,
  				blkno;
--- 207,220 ----
   *		and pages with free space, and calculates statistics on the number
   *		of live tuples in the heap.  When done, or when we run low on space
   *		for dead-tuple TIDs, invoke vacuuming of indexes and heap.
+  *
+  *		It also updates the minimum Xid found anywhere on the table in
+  *		vacrelstats->minxid, for later storing it in pg_class.relminxid.
   */
  static void
  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
! 			   Relation *Irel, int nindexes, TransactionId FreezeLimit,
! 			   TransactionId OldestXmin)
  {
  	BlockNumber nblocks,
  				blkno;
***************
*** 406,413 ****
--- 424,442 ----
  			}
  			else
  			{
+ 				TransactionId min;
+ 
  				num_tuples += 1;
  				hastup = true;
+ 
+ 				/* 
+ 				 * If the tuple is alive, we consider it for the "minxid"
+ 				 * calculations.
+ 				 */
+ 				min = vactuple_get_minxid(&tuple);
+ 				if (TransactionIdIsValid(min) &&
+ 					TransactionIdPrecedes(min, vacrelstats->minxid))
+ 					vacrelstats->minxid = min;
  			}
  		}						/* scan along page */
  
***************
*** 668,676 ****
  
  	/* now update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(indrel),
! 						stats->num_pages,
! 						stats->num_index_tuples,
! 						false);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
--- 697,704 ----
  
  	/* now update statistics in pg_class */
  	vac_update_relstats(RelationGetRelid(indrel),
! 						stats->num_pages, stats->num_index_tuples,
! 						false, InvalidTransactionId, InvalidTransactionId);
  
  	ereport(elevel,
  			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
***************
*** 691,697 ****
   * lazy_truncate_heap - try to truncate off any empty pages at the end
   */
  static void
! lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
  {
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
--- 719,726 ----
   * lazy_truncate_heap - try to truncate off any empty pages at the end
   */
  static void
! lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats,
! 				   TransactionId OldestXmin)
  {
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
***************
*** 732,738 ****
  	 * because other backends could have added tuples to these pages whilst we
  	 * were vacuuming.
  	 */
! 	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
  
  	if (new_rel_pages >= old_rel_pages)
  	{
--- 761,767 ----
  	 * because other backends could have added tuples to these pages whilst we
  	 * were vacuuming.
  	 */
! 	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats, OldestXmin);
  
  	if (new_rel_pages >= old_rel_pages)
  	{
***************
*** 787,793 ****
   * Returns number of nondeletable pages (last nonempty page + 1).
   */
  static BlockNumber
! count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
  {
  	BlockNumber blkno;
  	HeapTupleData tuple;
--- 816,823 ----
   * Returns number of nondeletable pages (last nonempty page + 1).
   */
  static BlockNumber
! count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats,
! 						 TransactionId OldestXmin)
  {
  	BlockNumber blkno;
  	HeapTupleData tuple;
Index: src/backend/libpq/hba.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/libpq/hba.c,v
retrieving revision 1.151
diff -c -r1.151 hba.c
*** src/backend/libpq/hba.c	6 Mar 2006 17:41:43 -0000	1.151
--- src/backend/libpq/hba.c	28 Jun 2006 18:02:59 -0000
***************
*** 1005,1011 ****
   *	dbname: gets database name (must be of size NAMEDATALEN bytes)
   *	dboid: gets database OID
   *	dbtablespace: gets database's default tablespace's OID
!  *	dbfrozenxid: gets database's frozen XID
   *	dbvacuumxid: gets database's vacuum XID
   *
   * This is not much related to the other functions in hba.c, but we put it
--- 1005,1011 ----
   *	dbname: gets database name (must be of size NAMEDATALEN bytes)
   *	dboid: gets database OID
   *	dbtablespace: gets database's default tablespace's OID
!  *	dbminxid: gets database's minimum XID
   *	dbvacuumxid: gets database's vacuum XID
   *
   * This is not much related to the other functions in hba.c, but we put it
***************
*** 1013,1019 ****
   */
  bool
  read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbfrozenxid,
  					  TransactionId *dbvacuumxid)
  {
  	char		buf[MAX_TOKEN];
--- 1013,1019 ----
   */
  bool
  read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbminxid,
  					  TransactionId *dbvacuumxid)
  {
  	char		buf[MAX_TOKEN];
***************
*** 1036,1042 ****
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
! 	*dbfrozenxid = atoxid(buf);
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
--- 1036,1042 ----
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
! 	*dbminxid = atoxid(buf);
  	next_token(fp, buf, sizeof(buf));
  	if (!isdigit((unsigned char) buf[0]))
  		elog(FATAL, "bad data in flat pg_database file");
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.19
diff -c -r1.19 autovacuum.c
*** src/backend/postmaster/autovacuum.c	19 May 2006 15:15:37 -0000	1.19
--- src/backend/postmaster/autovacuum.c	28 Jun 2006 18:02:59 -0000
***************
*** 79,85 ****
  {
  	Oid			oid;
  	char	   *name;
! 	TransactionId frozenxid;
  	TransactionId vacuumxid;
  	PgStat_StatDBEntry *entry;
  	int32		age;
--- 79,85 ----
  {
  	Oid			oid;
  	char	   *name;
! 	TransactionId minxid;
  	TransactionId vacuumxid;
  	PgStat_StatDBEntry *entry;
  	int32		age;
***************
*** 349,355 ****
  	{
  		autovac_dbase *tmp = lfirst(cell);
  		bool		this_whole_db;
! 		int32		freeze_age,
  					vacuum_age;
  
  		/*
--- 349,355 ----
  	{
  		autovac_dbase *tmp = lfirst(cell);
  		bool		this_whole_db;
! 		int32		true_age,
  					vacuum_age;
  
  		/*
***************
*** 362,370 ****
  		 * Unlike vacuum.c, we also look at vacuumxid.	This is so that
  		 * pg_clog can be kept trimmed to a reasonable size.
  		 */
! 		freeze_age = (int32) (nextXid - tmp->frozenxid);
  		vacuum_age = (int32) (nextXid - tmp->vacuumxid);
! 		tmp->age = Max(freeze_age, vacuum_age);
  
  		this_whole_db = (tmp->age >
  						 (int32) ((MaxTransactionId >> 3) * 3 - 100000));
--- 362,370 ----
  		 * Unlike vacuum.c, we also look at vacuumxid.	This is so that
  		 * pg_clog can be kept trimmed to a reasonable size.
  		 */
! 		true_age = (int32) (nextXid - tmp->minxid);
  		vacuum_age = (int32) (nextXid - tmp->vacuumxid);
! 		tmp->age = Max(true_age, vacuum_age);
  
  		this_whole_db = (tmp->age >
  						 (int32) ((MaxTransactionId >> 3) * 3 - 100000));
***************
*** 455,461 ****
  	FILE	   *db_file;
  	Oid			db_id;
  	Oid			db_tablespace;
! 	TransactionId db_frozenxid;
  	TransactionId db_vacuumxid;
  
  	filename = database_getflatfilename();
--- 455,461 ----
  	FILE	   *db_file;
  	Oid			db_id;
  	Oid			db_tablespace;
! 	TransactionId db_minxid;
  	TransactionId db_vacuumxid;
  
  	filename = database_getflatfilename();
***************
*** 466,472 ****
  				 errmsg("could not open file \"%s\": %m", filename)));
  
  	while (read_pg_database_line(db_file, thisname, &db_id,
! 								 &db_tablespace, &db_frozenxid,
  								 &db_vacuumxid))
  	{
  		autovac_dbase *db;
--- 466,472 ----
  				 errmsg("could not open file \"%s\": %m", filename)));
  
  	while (read_pg_database_line(db_file, thisname, &db_id,
! 								 &db_tablespace, &db_minxid,
  								 &db_vacuumxid))
  	{
  		autovac_dbase *db;
***************
*** 475,481 ****
  
  		db->oid = db_id;
  		db->name = pstrdup(thisname);
! 		db->frozenxid = db_frozenxid;
  		db->vacuumxid = db_vacuumxid;
  		/* these get set later: */
  		db->entry = NULL;
--- 475,481 ----
  
  		db->oid = db_id;
  		db->name = pstrdup(thisname);
! 		db->minxid = db_minxid;
  		db->vacuumxid = db_vacuumxid;
  		/* these get set later: */
  		db->entry = NULL;
Index: src/backend/utils/init/flatfiles.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/utils/init/flatfiles.c,v
retrieving revision 1.18
diff -c -r1.18 flatfiles.c
*** src/backend/utils/init/flatfiles.c	4 May 2006 16:07:29 -0000	1.18
--- src/backend/utils/init/flatfiles.c	28 Jun 2006 18:02:59 -0000
***************
*** 163,169 ****
  /*
   * write_database_file: update the flat database file
   *
!  * A side effect is to determine the oldest database's datfrozenxid
   * so we can set or update the XID wrap limit.
   */
  static void
--- 163,169 ----
  /*
   * write_database_file: update the flat database file
   *
!  * A side effect is to determine the oldest database's datminxid
   * so we can set or update the XID wrap limit.
   */
  static void
***************
*** 177,183 ****
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	NameData	oldest_datname;
! 	TransactionId oldest_datfrozenxid = InvalidTransactionId;
  
  	/*
  	 * Create a temporary filename to be renamed later.  This prevents the
--- 177,183 ----
  	HeapScanDesc scan;
  	HeapTuple	tuple;
  	NameData	oldest_datname;
! 	TransactionId oldest_datminxid = InvalidTransactionId;
  
  	/*
  	 * Create a temporary filename to be renamed later.  This prevents the
***************
*** 208,234 ****
  		char	   *datname;
  		Oid			datoid;
  		Oid			dattablespace;
! 		TransactionId datfrozenxid,
  					datvacuumxid;
  
  		datname = NameStr(dbform->datname);
  		datoid = HeapTupleGetOid(tuple);
  		dattablespace = dbform->dattablespace;
! 		datfrozenxid = dbform->datfrozenxid;
  		datvacuumxid = dbform->datvacuumxid;
  
  		/*
! 		 * Identify the oldest datfrozenxid, ignoring databases that are not
  		 * connectable (we assume they are safely frozen).	This must match
  		 * the logic in vac_truncate_clog() in vacuum.c.
  		 */
  		if (dbform->datallowconn &&
! 			TransactionIdIsNormal(datfrozenxid))
  		{
! 			if (oldest_datfrozenxid == InvalidTransactionId ||
! 				TransactionIdPrecedes(datfrozenxid, oldest_datfrozenxid))
  			{
! 				oldest_datfrozenxid = datfrozenxid;
  				namestrcpy(&oldest_datname, datname);
  			}
  		}
--- 208,234 ----
  		char	   *datname;
  		Oid			datoid;
  		Oid			dattablespace;
! 		TransactionId datminxid,
  					datvacuumxid;
  
  		datname = NameStr(dbform->datname);
  		datoid = HeapTupleGetOid(tuple);
  		dattablespace = dbform->dattablespace;
! 		datminxid = dbform->datminxid;
  		datvacuumxid = dbform->datvacuumxid;
  
  		/*
! 		 * Identify the oldest datminxid, ignoring databases that are not
  		 * connectable (we assume they are safely frozen).	This must match
  		 * the logic in vac_truncate_clog() in vacuum.c.
  		 */
  		if (dbform->datallowconn &&
! 			TransactionIdIsNormal(datminxid))
  		{
! 			if (oldest_datminxid == InvalidTransactionId ||
! 				TransactionIdPrecedes(datminxid, oldest_datminxid))
  			{
! 				oldest_datminxid = datminxid;
  				namestrcpy(&oldest_datname, datname);
  			}
  		}
***************
*** 244,257 ****
  		}
  
  		/*
! 		 * The file format is: "dbname" oid tablespace frozenxid vacuumxid
  		 *
  		 * The xids are not needed for backend startup, but are of use to
  		 * autovacuum, and might also be helpful for forensic purposes.
  		 */
  		fputs_quote(datname, fp);
  		fprintf(fp, " %u %u %u %u\n",
! 				datoid, dattablespace, datfrozenxid, datvacuumxid);
  	}
  	heap_endscan(scan);
  
--- 244,257 ----
  		}
  
  		/*
! 		 * The file format is: "dbname" oid tablespace minxid vacuumxid
  		 *
  		 * The xids are not needed for backend startup, but are of use to
  		 * autovacuum, and might also be helpful for forensic purposes.
  		 */
  		fputs_quote(datname, fp);
  		fprintf(fp, " %u %u %u %u\n",
! 				datoid, dattablespace, datminxid, datvacuumxid);
  	}
  	heap_endscan(scan);
  
***************
*** 272,281 ****
  						tempname, filename)));
  
  	/*
! 	 * Set the transaction ID wrap limit using the oldest datfrozenxid
  	 */
! 	if (oldest_datfrozenxid != InvalidTransactionId)
! 		SetTransactionIdLimit(oldest_datfrozenxid, &oldest_datname);
  }
  
  
--- 272,281 ----
  						tempname, filename)));
  
  	/*
! 	 * Set the transaction ID wrap limit using the oldest datminxid
  	 */
! 	if (oldest_datminxid != InvalidTransactionId)
! 		SetTransactionIdLimit(oldest_datminxid, &oldest_datname);
  }
  
  
Index: src/include/access/transam.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/access/transam.h,v
retrieving revision 1.57
diff -c -r1.57 transam.h
*** src/include/access/transam.h	5 Mar 2006 15:58:53 -0000	1.57
--- src/include/access/transam.h	28 Jun 2006 21:49:51 -0000
***************
*** 23,28 ****
--- 23,29 ----
   * always be considered valid.
   *
   * FirstNormalTransactionId is the first "normal" transaction id.
+  * Note: if you need to change it, you must change it in pg_class.h as well.
   * ----------------
   */
  #define InvalidTransactionId		((TransactionId) 0)
***************
*** 123,129 ****
  /* in transam/varsup.c */
  extern TransactionId GetNewTransactionId(bool isSubXact);
  extern TransactionId ReadNewTransactionId(void);
! extern void SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
  					  Name oldest_datname);
  extern Oid	GetNewObjectId(void);
  
--- 124,130 ----
  /* in transam/varsup.c */
  extern TransactionId GetNewTransactionId(bool isSubXact);
  extern TransactionId ReadNewTransactionId(void);
! extern void SetTransactionIdLimit(TransactionId oldest_datminxid,
  					  Name oldest_datname);
  extern Oid	GetNewObjectId(void);
  
Index: src/include/catalog/pg_attribute.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_attribute.h,v
retrieving revision 1.120
diff -c -r1.120 pg_attribute.h
*** src/include/catalog/pg_attribute.h	5 Mar 2006 15:58:54 -0000	1.120
--- src/include/catalog/pg_attribute.h	28 Jun 2006 18:02:59 -0000
***************
*** 404,410 ****
  { 1259, {"relhaspkey"},    16, -1,	1, 22, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 1259, {"relhasrules"},   16, -1,	1, 23, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 1259, {"relhassubclass"},16, -1,	1, 24, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relacl"},		 1034, -1, -1, 25, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  DATA(insert ( 1259 relname			19 -1 NAMEDATALEN	1 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1259 relnamespace		26 -1 4   2 0 -1 -1 t p i t f f t 0));
--- 404,412 ----
  { 1259, {"relhaspkey"},    16, -1,	1, 22, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 1259, {"relhasrules"},   16, -1,	1, 23, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 1259, {"relhassubclass"},16, -1,	1, 24, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 1259, {"relminxid"},     28, -1,	4, 25, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relvacuumxid"},  28, -1,	4, 26, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \
! { 1259, {"relacl"},		 1034, -1, -1, 27, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  DATA(insert ( 1259 relname			19 -1 NAMEDATALEN	1 0 -1 -1 f p i t f f t 0));
  DATA(insert ( 1259 relnamespace		26 -1 4   2 0 -1 -1 t p i t f f t 0));
***************
*** 430,436 ****
  DATA(insert ( 1259 relhaspkey		16 -1 1  22 0 -1 -1 t p c t f f t 0));
  DATA(insert ( 1259 relhasrules		16 -1 1  23 0 -1 -1 t p c t f f t 0));
  DATA(insert ( 1259 relhassubclass	16 -1 1  24 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relacl		  1034 -1 -1 25 1 -1 -1 f x i f f f t 0));
  DATA(insert ( 1259 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  DATA(insert ( 1259 oid				26 0  4  -2 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
--- 432,440 ----
  DATA(insert ( 1259 relhaspkey		16 -1 1  22 0 -1 -1 t p c t f f t 0));
  DATA(insert ( 1259 relhasrules		16 -1 1  23 0 -1 -1 t p c t f f t 0));
  DATA(insert ( 1259 relhassubclass	16 -1 1  24 0 -1 -1 t p c t f f t 0));
! DATA(insert ( 1259 relminxid		28 -1 4  25 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relvacuumxid		28 -1 4  26 0 -1 -1 t p i t f f t 0));
! DATA(insert ( 1259 relacl		  1034 -1 -1 27 1 -1 -1 f x i f f f t 0));
  DATA(insert ( 1259 ctid				27 0  6  -1 0 -1 -1 f p s t f f t 0));
  DATA(insert ( 1259 oid				26 0  4  -2 0 -1 -1 t p i t f f t 0));
  DATA(insert ( 1259 xmin				28 0  4  -3 0 -1 -1 t p i t f f t 0));
Index: src/include/catalog/pg_class.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_class.h,v
retrieving revision 1.92
diff -c -r1.92 pg_class.h
*** src/include/catalog/pg_class.h	28 May 2006 02:27:08 -0000	1.92
--- src/include/catalog/pg_class.h	28 Jun 2006 21:49:26 -0000
***************
*** 74,79 ****
--- 74,81 ----
  	bool		relhaspkey;		/* has PRIMARY KEY index */
  	bool		relhasrules;	/* has associated rules */
  	bool		relhassubclass; /* has derived classes */
+ 	TransactionId relminxid;	/* minimum Xid present in table */
+ 	TransactionId relvacuumxid;	/* Xid used as last vacuum OldestXmin */
  
  	/*
  	 * relacl may or may not be present, see note above!
***************
*** 83,89 ****
  
  /* Size of fixed part of pg_class tuples, not counting relacl or padding */
  #define CLASS_TUPLE_SIZE \
! 	 (offsetof(FormData_pg_class,relhassubclass) + sizeof(bool))
  
  /* ----------------
   *		Form_pg_class corresponds to a pointer to a tuple with
--- 85,91 ----
  
  /* Size of fixed part of pg_class tuples, not counting relacl or padding */
  #define CLASS_TUPLE_SIZE \
! 	 (offsetof(FormData_pg_class,relvacuumxid) + sizeof(TransactionId))
  
  /* ----------------
   *		Form_pg_class corresponds to a pointer to a tuple with
***************
*** 103,110 ****
   *		relacl field.  This is a kluge.
   * ----------------
   */
! #define Natts_pg_class_fixed			24
! #define Natts_pg_class					25
  #define Anum_pg_class_relname			1
  #define Anum_pg_class_relnamespace		2
  #define Anum_pg_class_reltype			3
--- 105,112 ----
   *		relacl field.  This is a kluge.
   * ----------------
   */
! #define Natts_pg_class_fixed			26
! #define Natts_pg_class					27
  #define Anum_pg_class_relname			1
  #define Anum_pg_class_relnamespace		2
  #define Anum_pg_class_reltype			3
***************
*** 129,135 ****
  #define Anum_pg_class_relhaspkey		22
  #define Anum_pg_class_relhasrules		23
  #define Anum_pg_class_relhassubclass	24
! #define Anum_pg_class_relacl			25
  
  /* ----------------
   *		initial contents of pg_class
--- 131,139 ----
  #define Anum_pg_class_relhaspkey		22
  #define Anum_pg_class_relhasrules		23
  #define Anum_pg_class_relhassubclass	24
! #define Anum_pg_class_relminxid			25
! #define Anum_pg_class_relvacuumxid		26
! #define Anum_pg_class_relacl			27
  
  /* ----------------
   *		initial contents of pg_class
***************
*** 139,151 ****
   * ----------------
   */
  
! DATA(insert OID = 1247 (  pg_type		PGNSP 71 PGUID 0 1247 0 0 0 0 0 f f r 23 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 PGUID 0 1249 0 0 0 0 0 f f r 17 0 0 0 0 0 f f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 PGUID 0 1255 0 0 0 0 0 f f r 18 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
! DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 0 0 f f r 25 0 0 0 0 0 t f f f _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
--- 143,156 ----
   * ----------------
   */
  
! /* Note: the "3" here stands for FirstNormalTransactionId */
! DATA(insert OID = 1247 (  pg_type		PGNSP 71 PGUID 0 1247 0 0 0 0 0 f f r 23 0 0 0 0 0 t f f f 3 3 _null_ ));
  DESCR("");
! DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 PGUID 0 1249 0 0 0 0 0 f f r 17 0 0 0 0 0 f f f f 3 3 _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 PGUID 0 1255 0 0 0 0 0 f f r 18 0 0 0 0 0 t f f f 3 3 _null_ ));
  DESCR("");
! DATA(insert OID = 1259 (  pg_class		PGNSP 83 PGUID 0 1259 0 0 0 0 0 f f r 27 0 0 0 0 0 t f f f 3 3 _null_ ));
  DESCR("");
  
  #define		  RELKIND_INDEX			  'i'		/* secondary index */
Index: src/include/catalog/pg_database.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/catalog/pg_database.h,v
retrieving revision 1.40
diff -c -r1.40 pg_database.h
*** src/include/catalog/pg_database.h	5 Mar 2006 15:58:54 -0000	1.40
--- src/include/catalog/pg_database.h	28 Jun 2006 21:33:43 -0000
***************
*** 43,49 ****
  	int4		datconnlimit;	/* max connections allowed (-1=no limit) */
  	Oid			datlastsysoid;	/* highest OID to consider a system OID */
  	TransactionId datvacuumxid; /* all XIDs before this are vacuumed */
! 	TransactionId datfrozenxid; /* all XIDs before this are frozen */
  	Oid			dattablespace;	/* default table space for this DB */
  	text		datconfig[1];	/* database-specific GUC (VAR LENGTH) */
  	aclitem		datacl[1];		/* access permissions (VAR LENGTH) */
--- 43,49 ----
  	int4		datconnlimit;	/* max connections allowed (-1=no limit) */
  	Oid			datlastsysoid;	/* highest OID to consider a system OID */
  	TransactionId datvacuumxid; /* all XIDs before this are vacuumed */
! 	TransactionId datminxid;	/* minimum XID present anywhere in the DB */
  	Oid			dattablespace;	/* default table space for this DB */
  	text		datconfig[1];	/* database-specific GUC (VAR LENGTH) */
  	aclitem		datacl[1];		/* access permissions (VAR LENGTH) */
***************
*** 69,75 ****
  #define Anum_pg_database_datconnlimit	6
  #define Anum_pg_database_datlastsysoid	7
  #define Anum_pg_database_datvacuumxid	8
! #define Anum_pg_database_datfrozenxid	9
  #define Anum_pg_database_dattablespace	10
  #define Anum_pg_database_datconfig		11
  #define Anum_pg_database_datacl			12
--- 69,75 ----
  #define Anum_pg_database_datconnlimit	6
  #define Anum_pg_database_datlastsysoid	7
  #define Anum_pg_database_datvacuumxid	8
! #define Anum_pg_database_datminxid		9
  #define Anum_pg_database_dattablespace	10
  #define Anum_pg_database_datconfig		11
  #define Anum_pg_database_datacl			12
Index: src/include/commands/vacuum.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/commands/vacuum.h,v
retrieving revision 1.63
diff -c -r1.63 vacuum.h
*** src/include/commands/vacuum.h	5 Mar 2006 15:58:55 -0000	1.63
--- src/include/commands/vacuum.h	28 Jun 2006 21:28:14 -0000
***************
*** 114,128 ****
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Oid relid,
! 					BlockNumber num_pages,
! 					double num_tuples,
! 					bool hasindex);
  extern void vacuum_set_xid_limits(VacuumStmt *vacstmt, bool sharedRel,
  					  TransactionId *oldestXmin,
  					  TransactionId *freezeLimit);
  extern bool vac_is_partial_index(Relation indrel);
  extern void vacuum_delay_point(void);
  
  /* in commands/vacuumlazy.c */
  extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
--- 114,128 ----
  extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
  				 int *nindexes, Relation **Irel);
  extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
! extern void vac_update_relstats(Oid relid, BlockNumber num_pages,
! 					double num_tuples, bool hasindex,
! 					TransactionId minxid, TransactionId vacuumxid);
  extern void vacuum_set_xid_limits(VacuumStmt *vacstmt, bool sharedRel,
  					  TransactionId *oldestXmin,
  					  TransactionId *freezeLimit);
  extern bool vac_is_partial_index(Relation indrel);
  extern void vacuum_delay_point(void);
+ extern TransactionId vactuple_get_minxid(HeapTuple tuple);
  
  /* in commands/vacuumlazy.c */
  extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt);
Index: src/include/libpq/hba.h
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/include/libpq/hba.h,v
retrieving revision 1.42
diff -c -r1.42 hba.h
*** src/include/libpq/hba.h	6 Mar 2006 17:41:44 -0000	1.42
--- src/include/libpq/hba.h	28 Jun 2006 18:02:59 -0000
***************
*** 40,46 ****
  extern int	hba_getauthmethod(hbaPort *port);
  extern int	authident(hbaPort *port);
  extern bool read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbfrozenxid,
  					  TransactionId *dbvacuumxid);
  
  #endif   /* HBA_H */
--- 40,46 ----
  extern int	hba_getauthmethod(hbaPort *port);
  extern int	authident(hbaPort *port);
  extern bool read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
! 					  Oid *dbtablespace, TransactionId *dbminxid,
  					  TransactionId *dbvacuumxid);
  
  #endif   /* HBA_H */

#30

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#28)

Re: [HACKERS] Non-transactional pg_class, try 2

Alvaro Herrera <alvherre@commandprompt.com> writes:

This patch has the nasty side effect mentioned above -- people will have
to set template0 as connectable and manually run vacuum on it
periodically, unless they run autovacuum.

That's pretty messy --- making template0 connectable is a great way to
shoot yourself in the foot. What I'd propose instead is that even if
autovacuum is nominally off, the system forces autovacuum when it
notices that a non-connectable database is approaching wraparound.
In this mode the autovac daemon would be restricted to processing
non-connectable databases. (The subtext here is that autovac is the
wave of the future anyway.)

In fact, maybe we should just force an autovac cycle for any DB that
appears to be approaching wraparound, rather than waiting for the
shutdown-before-wraparound code to kick in. Getting into that state
amounts to whacking DBAs upside the head for being stupid, which
doesn't really win us any friends ...

Implementation-wise, I'd propose that we add another PostmasterSignal
event type whereby a backend could request the postmaster to launch
an autovac process even if autovac is off. The end-of-VACUUM code that
scans pg_database.datminxid would issue the signal if it finds anything
seriously old.

regards, tom lane

#31

Simon Riggs

simon@2ndquadrant.com

over 19 years ago

In reply to: Tom Lane (#30)

Re: [HACKERS] Non-transactional pg_class, try 2

On Wed, 2006-06-28 at 20:08 -0400, Tom Lane wrote:

In fact, maybe we should just force an autovac cycle for any DB that
appears to be approaching wraparound, rather than waiting for the
shutdown-before-wraparound code to kick in. Getting into that state
amounts to whacking DBAs upside the head for being stupid, which
doesn't really win us any friends ...

Yes, please can we have the auto autovacuum cut in rather than the
wraparound message? I'd rather have them complain that we did this, than
complain that we didn't.

Normally, I wouldn't support automatically starting admin tasks without
thr sysadmins knowledge.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#32

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Tom Lane (#30)

Re: [HACKERS] Non-transactional pg_class, try 2

Tom Lane wrote:

In fact, maybe we should just force an autovac cycle for any DB that
appears to be approaching wraparound, rather than waiting for the
shutdown-before-wraparound code to kick in. Getting into that state
amounts to whacking DBAs upside the head for being stupid, which
doesn't really win us any friends ...

Sounds fine. How far back should we allow databases to go? If we wait
too long, pg_clog won't be truncated regularly, so I think we should do
it rather early than wait until it's close to wraparound.

Implementation-wise, I'd propose that we add another PostmasterSignal
event type whereby a backend could request the postmaster to launch
an autovac process even if autovac is off. The end-of-VACUUM code that
scans pg_database.datminxid would issue the signal if it finds anything
seriously old.

I think we could give autovac a "reason for being started", which would
normally be the periodic stuff, but if the postmaster got the signal
from a backend, pass that info to autovac and it could use a different
database selection algorithm -- say, just select the oldest database,
even if it's not in danger of Xid wraparound. So this would allow early
database-wide vacuums for non-connectable databases (template0), and
normal per-table vacuuming for database that are in actual use.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#33

Jim C. Nasby

jnasby@pervasive.com

over 19 years ago

In reply to: Simon Riggs (#31)

Re: [HACKERS] Non-transactional pg_class, try 2

On Thu, Jun 29, 2006 at 09:39:27AM +0100, Simon Riggs wrote:

On Wed, 2006-06-28 at 20:08 -0400, Tom Lane wrote:

In fact, maybe we should just force an autovac cycle for any DB that
appears to be approaching wraparound, rather than waiting for the
shutdown-before-wraparound code to kick in. Getting into that state
amounts to whacking DBAs upside the head for being stupid, which
doesn't really win us any friends ...

Yes, please can we have the auto autovacuum cut in rather than the
wraparound message? I'd rather have them complain that we did this, than
complain that we didn't.

Normally, I wouldn't support automatically starting admin tasks without
thr sysadmins knowledge.

I think it'd be good to put a big, fat WARNING in the log if we fire up
an autovac to avoid an XID wrap, since it's an indication that the
vacuuming scheme that's in place probably isn't good enough.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#34

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Jim C. Nasby (#33)

Re: [HACKERS] Non-transactional pg_class, try 2

"Jim C. Nasby" <jnasby@pervasive.com> writes:

I think it'd be good to put a big, fat WARNING in the log if we fire up
an autovac to avoid an XID wrap, since it's an indication that the
vacuuming scheme that's in place probably isn't good enough.

No, for nonconnectable databases it'd be expected behavior (given the
proposed patch). Warn away if the db is connectable, though ...

regards, tom lane

#35

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#32)

Re: [HACKERS] Non-transactional pg_class, try 2

Alvaro Herrera <alvherre@commandprompt.com> writes:

I think we could give autovac a "reason for being started", which would
normally be the periodic stuff, but if the postmaster got the signal
from a backend, pass that info to autovac and it could use a different
database selection algorithm -- say, just select the oldest database,
even if it's not in danger of Xid wraparound.

I don't think it's worth the trouble to provide such a signaling
mechanism (it'd be a bit of a PITA to make it work in both fork and
EXEC_BACKEND cases). ISTM it's sufficient for autovac to look at the
GUC state and notice whether it's nominally enabled or not. If not,
it should only consider anti-wraparound vacuum operations.

If the signal is given just when VACUUM sees a datvacuumxid value that's
old enough to cause autovac to decide it had better go prevent
wraparound, then this will work without any additional data transfer.
We could negotiate exactly how old DBs need to be to cause this; if you
want to make it less than a billion xacts so that we can keep pg_clog
cut down to size, that's fine with me.

regards, tom lane

#36

Alvaro Herrera

alvherre@commandprompt.com

over 19 years ago

In reply to: Tom Lane (#35)

Re: [HACKERS] Non-transactional pg_class, try 2

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I think we could give autovac a "reason for being started", which would
normally be the periodic stuff, but if the postmaster got the signal
from a backend, pass that info to autovac and it could use a different
database selection algorithm -- say, just select the oldest database,
even if it's not in danger of Xid wraparound.

I don't think it's worth the trouble to provide such a signaling
mechanism (it'd be a bit of a PITA to make it work in both fork and
EXEC_BACKEND cases). ISTM it's sufficient for autovac to look at the
GUC state and notice whether it's nominally enabled or not. If not,
it should only consider anti-wraparound vacuum operations.

If the signal is given just when VACUUM sees a datvacuumxid value that's
old enough to cause autovac to decide it had better go prevent
wraparound, then this will work without any additional data transfer.
We could negotiate exactly how old DBs need to be to cause this; if you
want to make it less than a billion xacts so that we can keep pg_clog
cut down to size, that's fine with me.

Committed. I didn't do anything but the simplest stuff; autovacuum
checks whether it's enabled in GUC, and if it doesn't then it just picks
the oldest database and issues a database-wide vacuum. The
vac_truncate_clog routine will send a signal at the same time as the
WARNING about nearing Xid wraparound would be emitted.

One funny thing I noticed is that if there is more than one database
needing db-wide vacuum, vacuum will send a signal; autovacuum will
process one database; and autovacuum will send a signal as well when
it's done because of the other databases. But autovacuum will get the
second signal and do nothing, because of the code to check for frequent
autovacuum starts. This kind of suprised me at first but it's really to
be expected. I considered disabling that timing code in the case of
getting the signal, but I don't think it's worth it.

One important improvement we'd may want to do is changing vacuum so that
it only calls vac_truncate_clog once if invoked by autovacuum (currently
it'll be called every time a table is vacuumed). Also I think we could
change the OldestXmin stuff so that it only takes database-local
transactions into account for non-shared tables.

But in the spirit of incremental improvement I think we're much better
now.

Happy sprinting to everyone,

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.