WIP partial replication patch

Started by Boszormenyi Zoltan over 15 years ago, 9 messages
#1 Boszormenyi Zoltan
zb@cybertec.at
1 attachment(s)

Hi,

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

CREATE REPLICA CLASS classname
[ EXCLUDING RELATION ( relname [ , ... ] ) ]
[ EXCLUDING DATABASE ( dbname [ , ... ] ) ]

ALTER REPLICA CLASS classname
[ { INCLUDING | EXCLUDING } RELATION ( relname [ , ... ] ) ]
[ { INCLUDING | EXCLUDING } DATABASE ( dbname [ , ... ] ) ]

The use case is to have a secondary server where read-only access
is allowed but (maybe for space reasons) some tables and databases
are excluded from the replication. The standby server keeps those tables
at the state of the last full backup but no further modification is done
to them.

The current patch adds two new global system tables, pg_replica and
pg_replicaitem, and three new indexes to maintain the classes and their
contents.
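
To make the role of pg_replicaitem more concrete, here is a simplified,
standalone model of the check the redo routines perform. It is only a
sketch: the ReplicaItem struct, the sample OIDs and the function names
item_exists and should_skip_redo are made up for illustration and do not
appear in the patch, which goes through the syscache instead of a plain
array. The exact-match lookup on the (classoid, dboid, reloid) triplet
and the use of InvalidOid in reloid for a whole excluded database follow
what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Simplified stand-in for one pg_replicaitem row (illustrative only). */
typedef struct ReplicaItem
{
	Oid		classoid;	/* replica class the row belongs to */
	Oid		dboid;		/* excluded database, or database of the relation */
	Oid		reloid;		/* excluded relation; InvalidOid for a whole database */
} ReplicaItem;

/* Hypothetical contents: class 16500 excludes one table and one database. */
const ReplicaItem sample_items[] = {
	{16500, 16384, 16390},
	{16500, 16385, InvalidOid},
};
const size_t sample_nitems = sizeof(sample_items) / sizeof(sample_items[0]);

/* Exact-match lookup on the (classoid, dboid, reloid) triplet. */
bool
item_exists(const ReplicaItem *items, size_t nitems,
			Oid classoid, Oid dboid, Oid reloid)
{
	size_t	i;

	assert(items != NULL || nitems == 0);

	for (i = 0; i < nitems; i++)
	{
		if (items[i].classoid == classoid &&
			items[i].dboid == dboid &&
			items[i].reloid == reloid)
			return true;
	}
	return false;
}

/* True if the redo of a record touching (dboid, reloid) should be skipped. */
bool
should_skip_redo(const ReplicaItem *items, size_t nitems,
				 Oid classoid, Oid dboid, Oid reloid)
{
	/* No filtering class configured: replay everything. */
	if (classoid == InvalidOid)
		return false;

	return item_exists(items, nitems, classoid, dboid, reloid);
}
```

With the sample data, should_skip_redo(sample_items, sample_nitems, 16500,
16384, 16390) is true, while the same call with an invalid class OID is
false, so a standby without a filtering class replays every record.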

The startup process in standby mode connects to a new database called
"replication" which is created at initdb time. This is needed because
transaction context is needed for accessing the syscache for the new tables.

There is one nasty detail in the patch as it stands. The RelFileNode
triplet is currently treated as if it carried the relation Oid, but that
is not actually true: the RelFileNode contains the relfilenode ID.
Initially, before any table-rewriting DDL, the Oid equals the relfilenode,
which was enough for a proof-of-concept patch. I will need to extend the
relmapper so it can carry more than one "database-local" mapping, so the
filter can work in all databases at once. To be able to do this, all
databases' pg_class should be read initially and re-read during relmapper
cache invalidation. As a side note, this work may serve as a basis for
full cross-database relation access, too.
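
The relfilenode problem can be illustrated with a small standalone
sketch. Everything in it is hypothetical (the RelNodeMapEntry struct, the
function names and the OID values); it is not the relmapper API and not
part of the patch. It only shows why a filter keyed on the relation Oid
needs a (database, relfilenode) -> Oid mapping that is refreshed after a
table rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical mapping entry: on-disk file id -> stable relation OID. */
typedef struct RelNodeMapEntry
{
	Oid		dbNode;		/* database OID */
	Oid		relNode;	/* relfilenode, changes when the table is rewritten */
	Oid		relOid;		/* pg_class OID, stays the same */
} RelNodeMapEntry;

/* Resolve a (dbNode, relNode) pair from a WAL record to a relation OID. */
Oid
relnode_to_reloid(const RelNodeMapEntry *map, size_t n,
				  Oid dbNode, Oid relNode)
{
	size_t	i;

	assert(map != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		if (map[i].dbNode == dbNode && map[i].relNode == relNode)
			return map[i].relOid;
	}
	return InvalidOid;
}

/*
 * Model a table-rewriting DDL: the relation keeps its OID but gets a new
 * relfilenode. Returns false if the relation is not in the map.
 */
bool
remap_after_rewrite(RelNodeMapEntry *map, size_t n,
					Oid dbNode, Oid relOid, Oid newRelNode)
{
	size_t	i;

	assert(map != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		if (map[i].dbNode == dbNode && map[i].relOid == relOid)
		{
			map[i].relNode = newRelNode;
			return true;
		}
	}
	return false;
}
```

Before a rewrite the entry maps relfilenode 16390 to Oid 16390, so
comparing the two directly happens to work. After remap_after_rewrite()
assigns a new relfilenode, only the lookup through the map still finds
the excluded relation.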

Best regards,
Zoltán Böszörményi

Attachments:

partial-rep-ctxdiff.patch (text/x-patch)
diff -dcrpN pgsql/src/backend/access/heap/heapam.c pgsql-partial/src/backend/access/heap/heapam.c
*** pgsql/src/backend/access/heap/heapam.c	2010-07-29 21:08:05.000000000 +0200
--- pgsql-partial/src/backend/access/heap/heapam.c	2010-08-12 20:12:45.000000000 +0200
***************
*** 51,57 ****
--- 51,61 ----
  #include "access/xact.h"
  #include "access/xlogutils.h"
  #include "catalog/catalog.h"
+ #include "catalog/indexing.h"
  #include "catalog/namespace.h"
+ #include "catalog/pg_replica.h"
+ #include "catalog/pg_replicaitem.h"
+ #include "catalog/pg_replica_fn.h"
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/bufmgr.h"
*************** heap_xlog_newpage(XLogRecPtr lsn, XLogRe
*** 4264,4269 ****
--- 4268,4284 ----
  	 * Note: the NEWPAGE log record is used for both heaps and indexes, so do
  	 * not do anything that assumes we are touching a heap.
  	 */
+ 	if (OidIsValid(standby_replica_classoid))
+ 	{
+ 		bool    exists;
+ 
+ 		exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->node.dbNode,
+ 									xlrec->node.relNode);
+ 		if (exists)
+ 			return;
+ 	}
+ 
  	buffer = XLogReadBufferExtended(xlrec->node, xlrec->forknum, xlrec->blkno,
  									RBM_ZERO);
  	Assert(BufferIsValid(buffer));
*************** heap_xlog_delete(XLogRecPtr lsn, XLogRec
*** 4298,4303 ****
--- 4313,4337 ----
  	HeapTupleHeader htup;
  	BlockNumber blkno;
  
+ 	if (OidIsValid(standby_replica_classoid))
+ 	{
+ 		bool    exists;
+ 
+ 		/* Don't allow deleting from pg_replica or pg_replicaitem or their indexes */
+ 		if (xlrec->target.node.relNode == ReplicaRelationId ||
+ 			xlrec->target.node.relNode == ReplicaOidIndexId ||
+ 			xlrec->target.node.relNode == ReplicaClassnameIndexId ||
+ 			xlrec->target.node.relNode == ReplicaItemRelationId ||
+ 			xlrec->target.node.relNode == ReplicaItemIndexId)
+ 			return;
+ 
+ 		exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->target.node.dbNode,
+ 									xlrec->target.node.relNode);
+ 		if (exists)
+ 			return;
+ 	}
+ 
  	blkno = ItemPointerGetBlockNumber(&(xlrec->target.tid));
  
  	/*
*************** heap_xlog_insert(XLogRecPtr lsn, XLogRec
*** 4376,4381 ****
--- 4410,4426 ----
  	Size		freespace;
  	BlockNumber blkno;
  
+ 	if (OidIsValid(standby_replica_classoid))
+ 	{
+ 		bool    exists;
+ 
+ 		exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->target.node.dbNode,
+ 									xlrec->target.node.relNode);
+ 		if (exists)
+ 			return;
+ 	}
+ 
  	blkno = ItemPointerGetBlockNumber(&(xlrec->target.tid));
  
  	/*
*************** heap_xlog_update(XLogRecPtr lsn, XLogRec
*** 4490,4495 ****
--- 4535,4551 ----
  	uint32		newlen;
  	Size		freespace;
  
+ 	if (OidIsValid(standby_replica_classoid))
+ 	{
+ 		bool    exists;
+ 
+ 		exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->target.node.dbNode,
+ 									xlrec->target.node.relNode);
+ 		if (exists)
+ 			return;
+ 	}
+ 
  	/*
  	 * The visibility map may need to be fixed even if the heap page is
  	 * already up-to-date.
*************** heap_xlog_lock(XLogRecPtr lsn, XLogRecor
*** 4685,4690 ****
--- 4741,4757 ----
  	ItemId		lp = NULL;
  	HeapTupleHeader htup;
  
+ 	if (OidIsValid(standby_replica_classoid))
+ 	{
+ 		bool    exists;
+ 
+ 		exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->target.node.dbNode,
+ 									xlrec->target.node.relNode);
+ 		if (exists)
+ 			return;
+ 	}
+ 
  	if (record->xl_info & XLR_BKP_BLOCK_1)
  		return;
  
*************** heap_xlog_inplace(XLogRecPtr lsn, XLogRe
*** 4744,4749 ****
--- 4811,4827 ----
  	uint32		oldlen;
  	uint32		newlen;
  
+ 	if (OidIsValid(standby_replica_classoid))
+ 	{
+ 		bool    exists;
+ 
+ 		exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->target.node.dbNode,
+ 									xlrec->target.node.relNode);
+ 		if (exists)
+ 			return;
+ 	}
+ 
  	if (record->xl_info & XLR_BKP_BLOCK_1)
  		return;
  
diff -dcrpN pgsql/src/backend/access/transam/xact.c pgsql-partial/src/backend/access/transam/xact.c
*** pgsql/src/backend/access/transam/xact.c	2010-08-11 07:21:54.000000000 +0200
--- pgsql-partial/src/backend/access/transam/xact.c	2010-08-12 19:54:52.000000000 +0200
***************
*** 28,33 ****
--- 28,34 ----
  #include "access/xlogutils.h"
  #include "catalog/catalog.h"
  #include "catalog/namespace.h"
+ #include "catalog/pg_replica_fn.h"
  #include "catalog/storage.h"
  #include "commands/async.h"
  #include "commands/tablecmds.h"
*************** xact_redo_commit(xl_xact_commit *xlrec, 
*** 4498,4503 ****
--- 4499,4515 ----
  
  		for (fork = 0; fork <= MAX_FORKNUM; fork++)
  		{
+ 			if (OidIsValid(standby_replica_classoid))
+ 			{
+ 				bool	exists;
+ 
+ 				exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->xnodes[i].dbNode,
+ 									xlrec->xnodes[i].relNode);
+ 				if (exists)
+ 					continue;
+ 			}
+ 
  			if (smgrexists(srel, fork))
  			{
  				XLogDropRelation(xlrec->xnodes[i], fork);
*************** xact_redo_abort(xl_xact_abort *xlrec, Tr
*** 4603,4608 ****
--- 4615,4631 ----
  
  		for (fork = 0; fork <= MAX_FORKNUM; fork++)
  		{
+ 			if (OidIsValid(standby_replica_classoid))
+ 			{
+ 				bool	exists;
+ 
+ 				exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->xnodes[i].dbNode,
+ 									xlrec->xnodes[i].relNode);
+ 				if (exists)
+ 					continue;
+ 			}
+ 
  			if (smgrexists(srel, fork))
  			{
  				XLogDropRelation(xlrec->xnodes[i], fork);
diff -dcrpN pgsql/src/backend/access/transam/xlog.c pgsql-partial/src/backend/access/transam/xlog.c
*** pgsql/src/backend/access/transam/xlog.c	2010-08-11 07:19:26.000000000 +0200
--- pgsql-partial/src/backend/access/transam/xlog.c	2010-08-12 19:30:58.000000000 +0200
***************
*** 35,40 ****
--- 35,41 ----
  #include "catalog/catversion.h"
  #include "catalog/pg_control.h"
  #include "catalog/pg_database.h"
+ #include "catalog/pg_replica_fn.h"
  #include "catalog/pg_type.h"
  #include "funcapi.h"
  #include "libpq/pqsignal.h"
***************
*** 53,59 ****
--- 54,62 ----
  #include "utils/builtins.h"
  #include "utils/guc.h"
  #include "utils/ps_status.h"
+ #include "utils/relcache.h"
  #include "utils/relmapper.h"
+ #include "utils/snapmgr.h"
  #include "pg_trace.h"
  
  
*************** TimeLineID	ThisTimeLineID = 0;
*** 147,152 ****
--- 150,161 ----
   */
  bool		InRecovery = false;
  
+ /*
+  * Replication filtering class set in recovery.conf
+  */
+ char	   *standby_replica_classname = NULL;
+ Oid		standby_replica_classoid = InvalidOid;
+ 
  /* Are we in Hot Standby mode? Only valid in startup process, see xlog.h */
  HotStandbyState standbyState = STANDBY_DISABLED;
  
*************** readRecoveryCommandFile(void)
*** 5286,5291 ****
--- 5295,5307 ----
  			ereport(DEBUG2,
  					(errmsg("replication_mode = '%s'", tok2)));
  		}
+ 		else if (strcmp(tok1, "filtering_class") == 0)
+ 		{
+ 			standby_replica_classname = pstrdup(tok2);
+ 			ereport(DEBUG2,
+ 					(errmsg("filtering_class name = '%s'",
+ 							standby_replica_classname)));
+ 		}
  		else
  			ereport(FATAL,
  					(errmsg("unrecognized recovery parameter \"%s\"",
*************** StartupXLOG(void)
*** 6145,6150 ****
--- 6161,6185 ----
  			SetForwardFsyncRequests();
  			SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
  			bgwriterLaunched = true;
+ 
+ 			/*
+ 			 * If we have a filtering class name then set up the connection
+ 			 * to the replication database and read the Oid of the class from
+ 			 * pg_replica. Later, in the rm_redo() calls, the pg_replicaitem rows
+ 			 * will be checked.
+ 			 */
+ 			if (standby_replica_classname)
+ 			{
+ 				ereport(LOG, (errmsg("activating partial replication")));
+ 
+ 				/* Set up the connection to our private hardcoded database */
+ 				InitPostgresForPartialReplication();
+ 
+ 				StartTransactionCommand();
+ 				(void) GetTransactionSnapshot();
+ 				standby_replica_classoid = get_replica_class_oid(standby_replica_classname);
+ 				CommitTransactionCommand();
+ 			}
  		}
  
  		/*
diff -dcrpN pgsql/src/backend/catalog/catalog.c pgsql-partial/src/backend/catalog/catalog.c
*** pgsql/src/backend/catalog/catalog.c	2010-04-21 08:43:41.000000000 +0200
--- pgsql-partial/src/backend/catalog/catalog.c	2010-08-11 07:39:21.000000000 +0200
***************
*** 32,37 ****
--- 32,39 ----
  #include "catalog/pg_namespace.h"
  #include "catalog/pg_pltemplate.h"
  #include "catalog/pg_db_role_setting.h"
+ #include "catalog/pg_replica.h"
+ #include "catalog/pg_replicaitem.h"
  #include "catalog/pg_shdepend.h"
  #include "catalog/pg_shdescription.h"
  #include "catalog/pg_tablespace.h"
*************** IsSharedRelation(Oid relationId)
*** 309,315 ****
  		relationId == SharedDescriptionRelationId ||
  		relationId == SharedDependRelationId ||
  		relationId == TableSpaceRelationId ||
! 		relationId == DbRoleSettingRelationId)
  		return true;
  	/* These are their indexes (see indexing.h) */
  	if (relationId == AuthIdRolnameIndexId ||
--- 311,319 ----
  		relationId == SharedDescriptionRelationId ||
  		relationId == SharedDependRelationId ||
  		relationId == TableSpaceRelationId ||
! 		relationId == DbRoleSettingRelationId ||
! 		relationId == ReplicaRelationId ||
! 		relationId == ReplicaItemRelationId)
  		return true;
  	/* These are their indexes (see indexing.h) */
  	if (relationId == AuthIdRolnameIndexId ||
*************** IsSharedRelation(Oid relationId)
*** 324,330 ****
  		relationId == SharedDependReferenceIndexId ||
  		relationId == TablespaceOidIndexId ||
  		relationId == TablespaceNameIndexId ||
! 		relationId == DbRoleSettingDatidRolidIndexId)
  		return true;
  	/* These are their toast tables and toast indexes (see toasting.h) */
  	if (relationId == PgDatabaseToastTable ||
--- 328,337 ----
  		relationId == SharedDependReferenceIndexId ||
  		relationId == TablespaceOidIndexId ||
  		relationId == TablespaceNameIndexId ||
! 		relationId == DbRoleSettingDatidRolidIndexId ||
! 		relationId == ReplicaOidIndexId ||
! 		relationId == ReplicaClassnameIndexId ||
! 		relationId == ReplicaItemIndexId)
  		return true;
  	/* These are their toast tables and toast indexes (see toasting.h) */
  	if (relationId == PgDatabaseToastTable ||
diff -dcrpN pgsql/src/backend/catalog/Makefile pgsql-partial/src/backend/catalog/Makefile
*** pgsql/src/backend/catalog/Makefile	2010-05-14 11:05:06.000000000 +0200
--- pgsql-partial/src/backend/catalog/Makefile	2010-08-11 07:39:21.000000000 +0200
*************** include $(top_builddir)/src/Makefile.glo
*** 13,19 ****
  OBJS = catalog.o dependency.o heap.o index.o indexing.o namespace.o aclchk.o \
         pg_aggregate.o pg_constraint.o pg_conversion.o pg_depend.o pg_enum.o \
         pg_inherits.o pg_largeobject.o pg_namespace.o pg_operator.o pg_proc.o \
!        pg_db_role_setting.o pg_shdepend.o pg_type.o storage.o toasting.o
  
  BKIFILES = postgres.bki postgres.description postgres.shdescription
  
--- 13,19 ----
  OBJS = catalog.o dependency.o heap.o index.o indexing.o namespace.o aclchk.o \
         pg_aggregate.o pg_constraint.o pg_conversion.o pg_depend.o pg_enum.o \
         pg_inherits.o pg_largeobject.o pg_namespace.o pg_operator.o pg_proc.o \
!        pg_replica.o pg_db_role_setting.o pg_shdepend.o pg_type.o storage.o toasting.o
  
  BKIFILES = postgres.bki postgres.description postgres.shdescription
  
*************** POSTGRES_BKI_SRCS = $(addprefix $(top_sr
*** 34,39 ****
--- 34,40 ----
  	pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
  	pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
  	pg_authid.h pg_auth_members.h pg_shdepend.h pg_shdescription.h \
+ 	pg_replica.h pg_replicaitem.h \
  	pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
  	pg_ts_parser.h pg_ts_template.h \
  	pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
diff -dcrpN pgsql/src/backend/catalog/pg_replica.c pgsql-partial/src/backend/catalog/pg_replica.c
*** pgsql/src/backend/catalog/pg_replica.c	1970-01-01 01:00:00.000000000 +0100
--- pgsql-partial/src/backend/catalog/pg_replica.c	2010-08-12 20:14:11.000000000 +0200
***************
*** 0 ****
--- 1,141 ----
+ /*-------------------------------------------------------------------------
+  *
+  * replica.c
+  *	  Common functions for replication classes
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+ 
+ #include "access/genam.h"
+ #include "access/heapam.h"
+ #include "access/skey.h"
+ #include "access/xact.h"
+ #include "catalog/indexing.h"
+ #include "catalog/pg_replica.h"
+ #include "catalog/pg_replica_fn.h"
+ #include "catalog/pg_replicaitem.h"   
+ #include "catalog/pg_type.h"
+ #include "executor/spi.h"
+ #include "utils/fmgroids.h"
+ #include "utils/rel.h"
+ #include "utils/snapmgr.h"
+ #include "utils/syscache.h"
+ #include "utils/tqual.h"
+ 
+ /*
+  * get_replica_class_oid - given a replication class name, look up the OID
+  *
+  * Returns InvalidOid if replication class name not found.
+  */
+ Oid
+ get_replica_class_oid(const char *classname)
+ {
+ 	Relation	pg_replica;
+ 	ScanKeyData entry[1];
+ 	SysScanDesc scan;
+ 	HeapTuple	rpltuple;
+ 	Oid		oid;
+ 
+ 	pg_replica = heap_open(ReplicaRelationId, AccessShareLock);
+ 	ScanKeyInit(&entry[0],
+ 					Anum_pg_replica_classname,
+ 					BTEqualStrategyNumber, F_NAMEEQ,
+ 					CStringGetDatum(classname));
+ 	scan = systable_beginscan(pg_replica, ReplicaClassnameIndexId, true,
+ 					SnapshotNow, 1, entry);
+ 	rpltuple = systable_getnext(scan);
+ 	/* We assume that there can be at most one matching tuple */
+ 	if (HeapTupleIsValid(rpltuple))
+ 		oid = HeapTupleGetOid(rpltuple);
+ 	else
+ 		oid = InvalidOid;
+ 
+ 	systable_endscan(scan);
+ 	heap_close(pg_replica, AccessShareLock);
+ 
+ 	elog(LOG, "get_replica_class_oid was called. class name '%s' oid '%u'", classname, oid);
+ 
+ 	return oid;
+ }
+ 
+ bool
+ replica_item_exists(Oid classoid, Oid dboid, Oid reloid)
+ {
+ 	bool	result;
+ 
+ 	StartTransactionCommand();
+ 	(void) GetTransactionSnapshot();
+ 
+ 	result = SearchSysCacheExists(REPLICAITEMTRIPLET,
+ 					ObjectIdGetDatum(classoid),
+ 					ObjectIdGetDatum(dboid),
+ 					ObjectIdGetDatum(reloid),
+ 					0);
+ 
+ 	CommitTransactionCommand();
+ 
+ 	elog(LOG, "replica_item_exists was called (%u,%u,%u)=%d", classoid, dboid, reloid, result);
+ 
+ 	return result;
+ 
+ #if 0
+ 	Relation	pg_replicaitem;
+ 	bool		result;
+ 
+ 	StartTransactionCommand();
+ 	(void) GetTransactionSnapshot();
+ 
+ 	pg_replicaitem = heap_open(ReplicaItemRelationId, RowShareLock);
+ 	result = replica_item_exists2(pg_replicaitem, classoid, dboid, InvalidOid);
+ 	heap_close(pg_replicaitem, RowShareLock);
+ 
+ 	CommitTransactionCommand();
+ 
+ 	elog(LOG, "replica_item_exists was called (%d,%d,%d)=%d", classoid, dboid, reloid, result);
+ 
+ 	return result;
+ #endif
+ }
+ 
+ bool
+ replica_item_exists2(Relation pg_replicaitem,
+ 				Oid classoid, Oid dboid, Oid reloid)
+ {
+ 	ScanKeyData entry[3];
+ 	SysScanDesc scan;
+ 	HeapTuple	itemtuple;
+ 	bool		result;
+ 
+ 	if (!OidIsValid(classoid))
+ 		return false;
+ 	if (!OidIsValid(dboid))
+ 		return false;
+ 
+ 	ScanKeyInit(&entry[0],
+ 					Anum_pg_replicaitem_classoid,
+ 					BTEqualStrategyNumber, F_OIDEQ,
+ 					ObjectIdGetDatum(classoid));
+ 	ScanKeyInit(&entry[1],
+ 					Anum_pg_replicaitem_dboid,
+ 					BTEqualStrategyNumber, F_OIDEQ,
+ 					ObjectIdGetDatum(dboid));
+ 	ScanKeyInit(&entry[2],
+ 					Anum_pg_replicaitem_reloid,
+ 					BTEqualStrategyNumber, F_OIDEQ,
+ 					ObjectIdGetDatum(reloid));
+ 
+ 	scan = systable_beginscan(pg_replicaitem, ReplicaItemIndexId, true,
+ 					SnapshotNow, 3, entry);
+ 
+ 	itemtuple = systable_getnext(scan);
+ 
+ 	result = HeapTupleIsValid(itemtuple);
+ 
+ 	systable_endscan(scan);
+ 
+ 	return result;
+ }
diff -dcrpN pgsql/src/backend/catalog/storage.c pgsql-partial/src/backend/catalog/storage.c
*** pgsql/src/backend/catalog/storage.c	2010-02-09 22:43:30.000000000 +0100
--- pgsql-partial/src/backend/catalog/storage.c	2010-08-12 19:55:18.000000000 +0200
***************
*** 23,28 ****
--- 23,29 ----
  #include "access/xact.h"
  #include "access/xlogutils.h"
  #include "catalog/catalog.h"
+ #include "catalog/pg_replica_fn.h"
  #include "catalog/storage.h"
  #include "storage/freespace.h"
  #include "storage/smgr.h"
*************** smgr_redo(XLogRecPtr lsn, XLogRecord *re
*** 456,461 ****
--- 457,473 ----
  		xl_smgr_create *xlrec = (xl_smgr_create *) XLogRecGetData(record);
  		SMgrRelation reln;
  
+ 		if (OidIsValid(standby_replica_classoid))
+ 		{
+ 			bool exists;
+ 
+ 			exists = replica_item_exists(standby_replica_classoid,
+ 								xlrec->rnode.dbNode,
+ 								xlrec->rnode.relNode);
+ 			if (exists)
+ 				return;
+ 		}
+ 
  		reln = smgropen(xlrec->rnode);
  		smgrcreate(reln, MAIN_FORKNUM, true);
  	}
*************** smgr_redo(XLogRecPtr lsn, XLogRecord *re
*** 465,470 ****
--- 477,493 ----
  		SMgrRelation reln;
  		Relation	rel;
  
+ 		if (OidIsValid(standby_replica_classoid))
+ 		{
+ 			bool exists;
+ 
+ 			exists = replica_item_exists(standby_replica_classoid,
+ 								xlrec->rnode.dbNode,
+ 								xlrec->rnode.relNode);
+ 			if (exists)
+ 				return;
+ 		}
+ 
  		reln = smgropen(xlrec->rnode);
  
  		/*
diff -dcrpN pgsql/src/backend/commands/dbcommands.c pgsql-partial/src/backend/commands/dbcommands.c
*** pgsql/src/backend/commands/dbcommands.c	2010-08-07 10:40:10.000000000 +0200
--- pgsql-partial/src/backend/commands/dbcommands.c	2010-08-12 17:08:10.000000000 +0200
***************
*** 35,40 ****
--- 35,41 ----
  #include "catalog/pg_authid.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_db_role_setting.h"
+ #include "catalog/pg_replica_fn.h"
  #include "catalog/pg_tablespace.h"
  #include "commands/comment.h"
  #include "commands/dbcommands.h"
*************** dbase_redo(XLogRecPtr lsn, XLogRecord *r
*** 1925,1930 ****
--- 1926,1944 ----
  		xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
  		char	   *dst_path;
  
+ 		if (OidIsValid(standby_replica_classoid))
+ 		{
+ 			bool	exists;
+ 
+ 			StartTransactionCommand();
+ 			exists = replica_item_exists(standby_replica_classoid,
+ 									xlrec->db_id,
+ 									InvalidOid);
+ 			CommitTransactionCommand();
+ 			if (exists)
+ 				return;
+ 		}
+ 
  		dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
  
  		if (InHotStandby)
diff -dcrpN pgsql/src/backend/commands/Makefile pgsql-partial/src/backend/commands/Makefile
*** pgsql/src/backend/commands/Makefile	2009-07-29 22:56:18.000000000 +0200
--- pgsql-partial/src/backend/commands/Makefile	2010-08-11 07:39:21.000000000 +0200
*************** OBJS = aggregatecmds.o alter.o analyze.o
*** 16,22 ****
  	constraint.o conversioncmds.o copy.o \
  	dbcommands.o define.o discard.o explain.o foreigncmds.o functioncmds.o \
  	indexcmds.o lockcmds.o operatorcmds.o opclasscmds.o \
! 	portalcmds.o prepare.o proclang.o \
  	schemacmds.o sequence.o tablecmds.o tablespace.o trigger.o \
  	tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
  	variable.o view.o
--- 16,22 ----
  	constraint.o conversioncmds.o copy.o \
  	dbcommands.o define.o discard.o explain.o foreigncmds.o functioncmds.o \
  	indexcmds.o lockcmds.o operatorcmds.o opclasscmds.o \
! 	portalcmds.o prepare.o proclang.o replica.o \
  	schemacmds.o sequence.o tablecmds.o tablespace.o trigger.o \
  	tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
  	variable.o view.o
diff -dcrpN pgsql/src/backend/commands/replica.c pgsql-partial/src/backend/commands/replica.c
*** pgsql/src/backend/commands/replica.c	1970-01-01 01:00:00.000000000 +0100
--- pgsql-partial/src/backend/commands/replica.c	2010-08-11 07:44:52.000000000 +0200
***************
*** 0 ****
--- 1,549 ----
+ /*-------------------------------------------------------------------------
+  *
+  * replica.c
+  *	  Commands to manipulate replication classes
+  *
+  * Replication classes are filters for replication slaves indicating
+  * what NOT to replicate. This is for saving space on slaves where
+  * certain data is not needed. Possible excluded objects are
+  * whole databases and relations (tables, indexes, sequences) inside
+  * databases.
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+ 
+ #include "access/genam.h"
+ #include "access/heapam.h"
+ #include "access/skey.h"
+ #include "catalog/catalog.h"
+ #include "catalog/dependency.h"
+ #include "catalog/indexing.h"
+ #include "catalog/namespace.h"
+ #include "catalog/pg_database.h"
+ #include "catalog/pg_replica.h"
+ #include "catalog/pg_replica_fn.h"
+ #include "catalog/pg_replicaitem.h"
+ #include "commands/dbcommands.h"
+ #include "commands/replica.h"
+ #include "miscadmin.h"
+ #include "nodes/parsenodes.h"
+ #include "postmaster/bgwriter.h"
+ #include "utils/builtins.h"
+ #include "utils/fmgroids.h"
+ #include "utils/lsyscache.h"
+ #include "utils/rel.h"
+ #include "utils/syscache.h"
+ #include "utils/tqual.h"
+ 
+ typedef struct replica_item {
+ 	Oid		dboid;
+ 	Oid		reloid;
+ 	ReplicaElem	   *elem;
+ 	bool		warned;
+ 	struct replica_item *next;
+ } replica_item;
+ 
+ static Oid
+ get_lastsysoid(Oid dboid)
+ {
+ 	HeapTuple	tuple;
+ 	Oid		result = InvalidOid;
+ 
+ 	tuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dboid));
+ 
+ 	if (HeapTupleIsValid(tuple))
+ 	{
+ 		Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);
+ 
+ 		result = dbform->datlastsysoid;
+ 		/* Release only a valid tuple; ReleaseSysCache must not be given NULL */
+ 		ReleaseSysCache(tuple);
+ 	}
+ 
+ 	return result;
+ }
+ 
+ static HeapTuple
+ get_replica_item(Relation pg_replicaitem, Oid classoid, Oid dboid, Oid reloid)
+ {
+ 	ScanKeyData entry[3];
+ 	SysScanDesc scan;
+ 	HeapTuple	itemtuple;
+ 
+ 	ScanKeyInit(&entry[0],
+ 					Anum_pg_replicaitem_classoid,
+ 					BTEqualStrategyNumber, F_OIDEQ,
+ 					ObjectIdGetDatum(classoid));
+ 	ScanKeyInit(&entry[1],
+ 					Anum_pg_replicaitem_dboid,
+ 					BTEqualStrategyNumber, F_OIDEQ,
+ 					ObjectIdGetDatum(dboid));
+ 	ScanKeyInit(&entry[2],
+ 					Anum_pg_replicaitem_reloid,
+ 					BTEqualStrategyNumber, F_OIDEQ,
+ 					ObjectIdGetDatum(reloid));
+ 
+ 	scan = systable_beginscan(pg_replicaitem, ReplicaItemIndexId, true,
+ 					SnapshotNow, 3, entry);
+ 
+ 	itemtuple = systable_getnext(scan);
+ 
+ 	systable_endscan(scan);
+ 
+ 	return itemtuple;
+ }
+ 
+ static void
+ add_excluded_objects(Oid classoid, replica_item *objs)
+ {
+ 	Relation	pg_replicaitem_rel;
+ 	Datum		new_item[Natts_pg_replicaitem];
+ 	bool		new_item_nulls[Natts_pg_replicaitem];
+ 	HeapTuple	tuple;
+ 
+ 	pg_replicaitem_rel = heap_open(ReplicaItemRelationId, RowExclusiveLock);
+ 
+ 	while (objs)
+ 	{
+ 		if (replica_item_exists2(pg_replicaitem_rel, classoid, objs->dboid, objs->reloid))
+ 		{
+ 			if (objs->elem->kind == 'd')
+ 			{
+ 				ereport(WARNING,
+ 						(errcode(ERRCODE_DUPLICATE_DATABASE),
+ 						 errmsg("database \"%s\" already excluded in the replication class",
+ 						 objs->elem->dbname)));
+ 			}
+ 			else
+ 			{
+ 				if (objs->elem->range->catalogname)
+ 					ereport(WARNING,
+ 							(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 							 errmsg("relation \"%s.%s.%s\" already excluded in the replication class",
+ 							 objs->elem->range->catalogname, objs->elem->range->schemaname, objs->elem->range->relname)));
+ 				else if (objs->elem->range->schemaname)
+ 					ereport(WARNING,
+ 							(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 							 errmsg("relation \"%s.%s\" already excluded in the replication class",
+ 							 objs->elem->range->schemaname, objs->elem->range->relname)));
+ 				else
+ 					ereport(WARNING,
+ 							(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 							 errmsg("relation \"%s\" already excluded in the replication class",
+ 							 objs->elem->range->relname)));
+ 			}
+ 
+ 			objs = objs->next;
+ 			continue;
+ 		}
+ 
+ 		/* Form tuple */
+ 		MemSet(new_item, 0, sizeof(new_item));
+ 		MemSet(new_item_nulls, false, sizeof(new_item_nulls));
+ 
+ 		new_item[Anum_pg_replicaitem_classoid - 1] = ObjectIdGetDatum(classoid);
+ 		new_item[Anum_pg_replicaitem_dboid - 1] = ObjectIdGetDatum(objs->dboid);
+ 		new_item[Anum_pg_replicaitem_reloid - 1] = ObjectIdGetDatum(objs->reloid);
+ 
+ 		tuple = heap_form_tuple(RelationGetDescr(pg_replicaitem_rel),
+ 								new_item, new_item_nulls);
+ 
+ 		simple_heap_insert(pg_replicaitem_rel, tuple);
+ 
+ 		/* Update indexes */
+ 		CatalogUpdateIndexes(pg_replicaitem_rel, tuple);
+ 
+ 		objs = objs->next;
+ 	}
+ 
+ 	heap_close(pg_replicaitem_rel, NoLock);
+ }
+ 
+ static void
+ del_excluded_objects(Oid classoid, replica_item *objs)
+ {
+ 	Relation	pg_replicaitem_rel;
+ 	HeapTuple	tuple;
+ 
+ 	pg_replicaitem_rel = heap_open(ReplicaItemRelationId, RowExclusiveLock);
+ 
+ 	while (objs)
+ 	{
+ 		tuple = get_replica_item(pg_replicaitem_rel, classoid, objs->dboid, objs->reloid);
+ 		if (!HeapTupleIsValid(tuple))
+ 		{
+ 			if (objs->elem->kind == 'd')
+ 			{
+ 				ereport(WARNING,
+ 						(errcode(ERRCODE_DUPLICATE_DATABASE),
+ 						 errmsg("database \"%s\" not excluded in the replication class",
+ 						 objs->elem->dbname)));
+ 			}
+ 			else
+ 			{
+ 				if (objs->elem->range->catalogname)
+ 					ereport(WARNING,
+ 							(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 							 errmsg("relation \"%s.%s.%s\" not excluded in the replication class",
+ 							 objs->elem->range->catalogname, objs->elem->range->schemaname, objs->elem->range->relname)));
+ 				else if (objs->elem->range->schemaname)
+ 					ereport(WARNING,
+ 							(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 							 errmsg("relation \"%s.%s\" not excluded in the replication class",
+ 							 objs->elem->range->schemaname, objs->elem->range->relname)));
+ 				else
+ 					ereport(WARNING,
+ 							(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 							 errmsg("relation \"%s\" not excluded in the replication class",
+ 							 objs->elem->range->relname)));
+ 			}
+ 
+ 			objs = objs->next;
+ 			continue;
+ 		}
+ 
+ 		simple_heap_delete(pg_replicaitem_rel, &tuple->t_self);
+ 
+ 		objs = objs->next;
+ 	}
+ 
+ 	heap_close(pg_replicaitem_rel, NoLock);
+ }
+ 
+ static void
+ insert_item(replica_item **objs, replica_item *item)
+ {
+ 	replica_item	*cur, *prev = NULL;
+ 
+ 	cur = *objs;
+ 
+ 	while (cur)
+ 	{
+ 		if (item->dboid < cur->dboid)
+ 			break;
+ 		if (item->dboid == cur->dboid && item->reloid <= cur->reloid)
+ 			break;
+ 
+ 		prev = cur;
+ 		cur = cur->next;
+ 	}
+ 
+ 
+ 	if (cur && cur->dboid == item->dboid && cur->reloid == item->reloid)
+ 	{
+ 		if (!cur->warned)
+ 		{
+ 			cur->warned = true;
+ 			if (cur->elem->kind == 'd')
+ 				ereport(WARNING,
+ 						(errcode(ERRCODE_DUPLICATE_DATABASE),
+ 						 errmsg("database \"%s\" specified more than once", cur->elem->dbname)));
+ 			else
+ 			{
+ 				if (cur->elem->range->catalogname)
+ 					ereport(WARNING,
+ 						(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 						 errmsg("relation \"%s.%s.%s\" specified more than once",
+ 						 cur->elem->range->catalogname, cur->elem->range->schemaname, cur->elem->range->relname)));
+ 				else if (cur->elem->range->schemaname)
+ 					ereport(WARNING,
+ 						(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 						 errmsg("relation \"%s.%s\" specified more than once",
+ 						 cur->elem->range->schemaname, cur->elem->range->relname)));
+ 				else
+ 					ereport(WARNING,
+ 						(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 						 errmsg("relation \"%s\" specified more than once",
+ 						 cur->elem->range->relname)));
+ 			}
+ 		}
+ 		return;
+ 	}
+ 
+ 	if (prev)
+ 		prev->next = item;
+ 
+ 	item->next = cur;
+ 
+ 	if (cur == *objs)
+ 		*objs = item;
+ }
+ 
+ static void
+ sort_objects(List *objects, replica_item **add_objs, replica_item **del_objs)
+ {
+ 	ListCell	   *cell;
+ 	char		   *current_db;
+ 	Oid		   *search_path_oids;
+ 	int		n_search_path;
+ 
+ 	if (add_objs)
+ 		*add_objs = NULL;
+ 	if (del_objs)
+ 		*del_objs = NULL;
+ 
+ 	current_db = get_database_name(MyDatabaseId);
+ 
+ 	/*
+ 	 * Try to fetch the search path elements into an array.
+ 	 */
+ 	search_path_oids = (Oid *)palloc(256 * sizeof(Oid));
+ 	n_search_path = fetch_search_path_array(search_path_oids, 256);
+ 	if (n_search_path > 256)
+ 	{
+ 		search_path_oids = repalloc(search_path_oids, n_search_path * sizeof(Oid));
+ 		n_search_path = fetch_search_path_array(search_path_oids, n_search_path);
+ 	}
+ 
+ 	foreach(cell, objects)
+ 	{
+ 		
+ 		ReplicaElem	*elem = (ReplicaElem *)lfirst(cell);
+ 		replica_item	*item = (replica_item *)palloc0(sizeof(replica_item));
+ 
+ 		switch (elem->kind)
+ 		{
+ 			case 'd':
+ 			{
+ 				Oid	dbid = get_database_oid(elem->dbname, true);
+ 				Oid	lastsysoid = get_lastsysoid(dbid);
+ 
+ 				if (!OidIsValid(dbid))
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_DATABASE_DEFINITION),
+ 							 errmsg("database \"%s\" doesn't exist", elem->dbname)));
+ 
+ 				if (dbid < lastsysoid)
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_DATABASE_DEFINITION),
+ 							 errmsg("system database \"%s\" cannot be excluded from replication",
+ 							 elem->dbname)));
+ 
+ 				item->dboid = dbid;
+ 				item->reloid = InvalidOid;
+ 				item->elem = elem;
+ 				break;
+ 			}
+ 			case 'r':
+ 			{
+ 				Oid	nspid;
+ 				Oid	relid = InvalidOid;
+ 				Oid	lastsysoid = get_lastsysoid(MyDatabaseId);
+ 
+ 				if (MyDatabaseId < lastsysoid)
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 							 errmsg("relation in system database cannot be excluded from replication")));
+ 
+ 				if (elem->range->catalogname && strcmp(elem->range->catalogname, current_db))
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 							 errmsg("cross-database relation checking for \"%s.%s.%s\" is not possible",
+ 								elem->range->catalogname, elem->range->schemaname, elem->range->relname)));
+ 
+ 				if (elem->range->schemaname)
+ 				{
+ 					nspid = GetSysCacheOid1(NAMESPACENAME, PointerGetDatum(elem->range->schemaname));
+ 					if (!OidIsValid(nspid))
+ 					{
+ 						if (elem->range->catalogname)
+ 							ereport(ERROR,
+ 									(errcode(ERRCODE_INVALID_SCHEMA_DEFINITION),
+ 									 errmsg("invalid schema \"%s\" for relation \"%s.%s.%s\"",
+ 									 elem->range->schemaname,
+ 									 elem->range->catalogname, elem->range->schemaname, elem->range->relname)));
+ 						else
+ 							ereport(ERROR,
+ 									(errcode(ERRCODE_INVALID_SCHEMA_DEFINITION),
+ 									 errmsg("invalid schema \"%s\" for relation \"%s.%s\"",
+ 									 elem->range->schemaname,
+ 									 elem->range->schemaname, elem->range->relname)));
+ 					}
+ 
+ 					relid = get_relname_relid(elem->range->relname, nspid);
+ 					if (!OidIsValid(relid))
+ 					{
+ 						if (elem->range->catalogname)
+ 							ereport(ERROR,
+ 									(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 									 errmsg("invalid relation \"%s.%s.%s\"",
+ 									 elem->range->catalogname, elem->range->schemaname, elem->range->relname)));
+ 						else
+ 							ereport(ERROR,
+ 									(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 									 errmsg("invalid relation \"%s.%s\"",
+ 									 elem->range->schemaname, elem->range->relname)));
+ 					}
+ 				}
+ 				else
+ 				{
+ 					int	i;
+ 					bool	found = false;
+ 
+ 					for (i = 0; i < n_search_path; i++)
+ 					{
+ 						relid = get_relname_relid(elem->range->relname, search_path_oids[i]);
+ 						if (OidIsValid(relid))
+ 						{
+ 							found = true;
+ 							break;
+ 						}
+ 					}
+ 					if (!found)
+ 						ereport(ERROR,
+ 								(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 								 errmsg("invalid relation \"%s\"",
+ 								 elem->range->relname)));
+ 				}
+ 
+ 				if (relid < lastsysoid)
+ 				{
+ 					if (elem->range->catalogname)
+ 						ereport(ERROR,
+ 								(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 								 errmsg("system relation \"%s.%s.%s\" cannot be excluded from replication",
+ 								 elem->range->catalogname, elem->range->schemaname, elem->range->relname)));
+ 					else if (elem->range->schemaname)
+ 						ereport(ERROR,
+ 								(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 								 errmsg("system relation \"%s.%s\" cannot be excluded from replication",
+ 								 elem->range->schemaname, elem->range->relname)));
+ 					else
+ 						ereport(ERROR,
+ 								(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ 								 errmsg("system relation \"%s\" cannot be excluded from replication",
+ 								 elem->range->relname)));
+ 				}
+ 
+ 				item->dboid = MyDatabaseId;
+ 				item->reloid = relid;
+ 				item->elem = elem;
+ 				break;
+ 			}
+ 			default:
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INTERNAL_ERROR),
+ 						 errmsg("invalid object kind \"%c\"", elem->kind)));
+ 				break;
+ 		}
+ 
+ 		if (elem->excluded)
+ 		{
+ 			if (!add_objs)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INTERNAL_ERROR),
+ 						 errmsg("objects were specified for exclusion but no list was provided to store them")));
+ 
+ 			insert_item(add_objs, item);
+ 		}
+ 		else
+ 		{
+ 			if (!del_objs)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INTERNAL_ERROR),
+ 						 errmsg("objects were specified for re-inclusion but no list was provided to store them")));
+ 
+ 			insert_item(del_objs, item);
+ 		}
+ 	}
+ }
+ 
+ void
+ CreateReplicaClass(CreateReplicaStmt *stmt)
+ {
+ 	Relation	pg_replica_rel;
+ 	Oid		classoid;
+ 	Oid		datdba;
+ 	HeapTuple	tuple;
+ 	Datum		new_class[Natts_pg_replica];
+ 	bool		new_class_nulls[Natts_pg_replica];
+ 	replica_item	   *add_objs;
+ 
+ 	if (!superuser())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 errmsg("permission denied to create a replication class")));
+ 
+ 	datdba = GetUserId();
+ 
+ 	sort_objects(stmt->excluded_objects, &add_objs, NULL);
+ 
+ 	/*
+ 	 * Check for replication class name conflict.
+ 	 * This is just to give a more friendly error message than
+ 	 * "unique index violation".  There's a race condition but
+ 	 * we're willing to accept the less friendly message in that case.
+ 	 */
+ 	if (OidIsValid(get_replica_class_oid(stmt->name)))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_DUPLICATE_OBJECT),
+ 				 errmsg("replication class \"%s\" already exists", stmt->name)));
+ 
+ 	/*
+ 	 * Select an OID for the new class.
+ 	 */
+ 	pg_replica_rel = heap_open(ReplicaRelationId, RowExclusiveLock);
+ 
+ 	classoid = GetNewOid(pg_replica_rel);
+ 
+ 	/* Form tuple */
+ 	MemSet(new_class, 0, sizeof(new_class));
+ 	MemSet(new_class_nulls, false, sizeof(new_class_nulls));
+ 
+ 	new_class[Anum_pg_replica_classname - 1] =
+ 							DirectFunctionCall1(namein, CStringGetDatum(stmt->name));
+ 
+ 	tuple = heap_form_tuple(RelationGetDescr(pg_replica_rel),
+ 								new_class, new_class_nulls);
+ 
+ 	HeapTupleSetOid(tuple, classoid);
+ 
+ 	simple_heap_insert(pg_replica_rel, tuple);
+ 
+ 	/* Update indexes */
+ 	CatalogUpdateIndexes(pg_replica_rel, tuple);
+ 
+ 	/* Register owner dependency */
+ 	recordDependencyOnOwner(ReplicaRelationId, classoid, datdba);
+ 
+ 	add_excluded_objects(classoid, add_objs);
+ 
+ 	/*
+ 	 * Force a checkpoint.
+ 	 */
+ 	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+ 
+ 	heap_close(pg_replica_rel, NoLock);
+ }
+ 
+ void
+ AlterReplicaClass(AlterReplicaStmt *stmt)
+ {
+ 	Oid	classoid;
+ 	Oid	datdba;
+ 	replica_item	   *add_objs;
+ 	replica_item	   *del_objs;
+ 
+ 	if (!superuser())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 errmsg("permission denied to alter a replication class")));
+ 
+ 	datdba = GetUserId();
+ 
+ 	classoid = get_replica_class_oid(stmt->name);
+ 	if (!OidIsValid(classoid))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_UNDEFINED_OBJECT),
+ 				 errmsg("replication class \"%s\" does not exist", stmt->name)));
+ 
+ 	sort_objects(stmt->modified_objects, &add_objs, &del_objs);
+ 
+ 	add_excluded_objects(classoid, add_objs);
+ 
+ 	del_excluded_objects(classoid, del_objs);
+ }
diff -dcrpN pgsql/src/backend/parser/gram.y pgsql-partial/src/backend/parser/gram.y
*** pgsql/src/backend/parser/gram.y	2010-08-07 10:40:11.000000000 +0200
--- pgsql-partial/src/backend/parser/gram.y	2010-08-11 07:40:35.000000000 +0200
*************** static TypeName *TableFuncTypeName(List 
*** 185,196 ****
  		AlterForeignServerStmt AlterGroupStmt
  		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterTableStmt
  		AlterUserStmt AlterUserMappingStmt AlterUserSetStmt
! 		AlterRoleStmt AlterRoleSetStmt
  		AlterDefaultPrivilegesStmt DefACLAction
  		AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
  		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
  		CreateDomainStmt CreateGroupStmt CreateOpClassStmt
! 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
  		CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
  		CreateFdwStmt CreateForeignServerStmt CreateAssertStmt CreateTrigStmt
  		CreateUserStmt CreateUserMappingStmt CreateRoleStmt
--- 185,196 ----
  		AlterForeignServerStmt AlterGroupStmt
  		AlterObjectSchemaStmt AlterOwnerStmt AlterSeqStmt AlterTableStmt
  		AlterUserStmt AlterUserMappingStmt AlterUserSetStmt
! 		AlterRoleStmt AlterRoleSetStmt AlterReplicaStmt
  		AlterDefaultPrivilegesStmt DefACLAction
  		AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
  		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
  		CreateDomainStmt CreateGroupStmt CreateOpClassStmt
! 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt CreateReplicaStmt
  		CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
  		CreateFdwStmt CreateForeignServerStmt CreateAssertStmt CreateTrigStmt
  		CreateUserStmt CreateUserMappingStmt CreateRoleStmt
*************** static TypeName *TableFuncTypeName(List 
*** 438,443 ****
--- 438,446 ----
  				opt_frame_clause frame_extent frame_bound
  %type <str>		opt_existing_window_name
  
+ %type <list>	ReplicaExclElem ReplicaInclExclElem
+ %type <list>	ExclDbList ExclRelList InclDbList InclRelList
+ %type <list>	ReplicaExclList ReplicaInclExclList
  
  /*
   * Non-keyword token types.  These are hard-wired into the "flex" lexer.
*************** static TypeName *TableFuncTypeName(List 
*** 519,525 ****
  	QUOTE
  
  	RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REINDEX
! 	RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART
  	RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE
  
  	SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE SEQUENCES
--- 522,528 ----
  	QUOTE
  
  	RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REINDEX
! 	RELATIVE_P RELATION RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART
  	RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE
  
  	SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE SEQUENCES
*************** stmt :
*** 657,662 ****
--- 660,666 ----
  			| AlterOwnerStmt
  			| AlterSeqStmt
  			| AlterTableStmt
+ 			| AlterReplicaStmt
  			| AlterRoleSetStmt
  			| AlterRoleStmt
  			| AlterTSConfigurationStmt
*************** stmt :
*** 684,689 ****
--- 688,694 ----
  			| CreateOpFamilyStmt
  			| AlterOpFamilyStmt
  			| CreatePLangStmt
+ 			| CreateReplicaStmt
  			| CreateSchemaStmt
  			| CreateSeqStmt
  			| CreateStmt
*************** copy_generic_opt_arg_list_item:
*** 2197,2202 ****
--- 2202,2335 ----
  /*****************************************************************************
   *
   *		QUERY :
+  *				CREATE REPLICA CLASS classname
+  *
+  *****************************************************************************/
+ 
+ CreateReplicaStmt:
+ 		CREATE REPLICA CLASS name ReplicaExclList
+ 				{
+ 					CreateReplicaStmt *n = makeNode(CreateReplicaStmt);
+ 					n->name = $4;
+ 					n->excluded_objects = $5;
+ 					$$ = (Node *)n;
+ 				}
+ 		;
+ 
+ ReplicaExclList:
+ 		ReplicaExclElem					{ $$ = $1; }
+ 		| ReplicaExclList ',' ReplicaExclElem		{ $$ = list_concat($1, $3); }
+ 		;
+ 
+ ReplicaExclElem:
+ 		EXCLUDING DATABASE '(' ExclDbList ')'		{ $$ = $4; }
+ 		| EXCLUDING RELATION '(' ExclRelList ')'	{ $$ = $4; }
+ 		;
+ 
+ ExclDbList:	database_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'd';
+ 					n->excluded = true;
+ 					n->dbname = $1;
+ 					$$ = list_make1(n);
+ 				}
+ 		| ExclDbList ',' database_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'd';
+ 					n->excluded = true;
+ 					n->dbname = $3;
+ 					$$ = lappend($1, n);
+ 				}
+ 		;
+ 
+ ExclRelList:	qualified_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'r';
+ 					n->excluded = true;
+ 					n->range = $1;
+ 					$$ = list_make1(n);
+ 				}
+ 		| ExclRelList ',' qualified_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'r';
+ 					n->excluded = true;
+ 					n->range = $3;
+ 					$$ = lappend($1, n);
+ 				}
+ 		;
+ 
+ /*****************************************************************************
+  *
+  *		QUERY :
+  *				ALTER REPLICA CLASS classname
+  *
+  *****************************************************************************/
+ 
+ AlterReplicaStmt:
+ 		ALTER REPLICA CLASS name ReplicaInclExclList
+ 				{
+ 					AlterReplicaStmt *n = makeNode(AlterReplicaStmt);
+ 					n->name = $4;
+ 					n->modified_objects = $5;
+ 					$$ = (Node *)n;
+ 				}
+ 		;
+ 
+ ReplicaInclExclList:
+ 		ReplicaInclExclElem				{ $$ = $1; }
+ 		| ReplicaInclExclList ',' ReplicaInclExclElem	{ $$ = list_concat($1, $3); }
+ 		;
+ 
+ ReplicaInclExclElem:
+ 		EXCLUDING DATABASE '(' ExclDbList ')'		{ $$ = $4; }
+ 		| EXCLUDING RELATION '(' ExclRelList ')'	{ $$ = $4; }
+ 		| INCLUDING DATABASE '(' InclDbList ')'		{ $$ = $4; }
+ 		| INCLUDING RELATION '(' InclRelList ')'	{ $$ = $4; }
+ 		;
+ 
+ InclDbList:	database_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'd';
+ 					n->excluded = false;
+ 					n->dbname = $1;
+ 					$$ = list_make1(n);
+ 				}
+ 		| InclDbList ',' database_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'd';
+ 					n->excluded = false;
+ 					n->dbname = $3;
+ 					$$ = lappend($1, n);
+ 				}
+ 		;
+ 
+ InclRelList:	qualified_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'r';
+ 					n->excluded = false;
+ 					n->range = $1;
+ 					$$ = list_make1(n);
+ 				}
+ 		| InclRelList ',' qualified_name
+ 				{
+ 					ReplicaElem *n = makeNode(ReplicaElem);
+ 					n->kind = 'r';
+ 					n->excluded = false;
+ 					n->range = $3;
+ 					$$ = lappend($1, n);
+ 				}
+ 		;
+ 
+ /*****************************************************************************
+  *
+  *		QUERY :
   *				CREATE TABLE relname
   *
   *****************************************************************************/
*************** unreserved_keyword:
*** 11056,11061 ****
--- 11189,11195 ----
  			| REF
  			| REINDEX
  			| RELATIVE_P
+ 			| RELATION
  			| RELEASE
  			| RENAME
  			| REPEATABLE
diff -dcrpN pgsql/src/backend/tcop/utility.c pgsql-partial/src/backend/tcop/utility.c
*** pgsql/src/backend/tcop/utility.c	2010-07-26 10:05:50.000000000 +0200
--- pgsql-partial/src/backend/tcop/utility.c	2010-08-11 07:39:21.000000000 +0200
***************
*** 36,41 ****
--- 36,42 ----
  #include "commands/portalcmds.h"
  #include "commands/prepare.h"
  #include "commands/proclang.h"
+ #include "commands/replica.h"
  #include "commands/schemacmds.h"
  #include "commands/sequence.h"
  #include "commands/tablecmds.h"
*************** check_xact_readonly(Node *parsetree)
*** 218,223 ****
--- 219,226 ----
  		case T_AlterUserMappingStmt:
  		case T_DropUserMappingStmt:
  		case T_AlterTableSpaceOptionsStmt:
+ 		case T_CreateReplicaStmt:
+ 		case T_AlterReplicaStmt:
  			PreventCommandIfReadOnly(CreateCommandTag(parsetree));
  			break;
  		default:
*************** standard_ProcessUtility(Node *parsetree,
*** 568,573 ****
--- 571,583 ----
  			AlterTableSpaceOptions((AlterTableSpaceOptionsStmt *) parsetree);
  			break;
  
+ 		case T_CreateReplicaStmt:
+ 			CreateReplicaClass((CreateReplicaStmt *) parsetree);
+ 			break;
+ 		case T_AlterReplicaStmt:
+ 			AlterReplicaClass((AlterReplicaStmt *) parsetree);
+ 			break;
+ 
  		case T_CreateFdwStmt:
  			CreateForeignDataWrapper((CreateFdwStmt *) parsetree);
  			break;
*************** CreateCommandTag(Node *parsetree)
*** 1503,1508 ****
--- 1513,1525 ----
  			tag = "ALTER TABLESPACE";
  			break;
  
+ 		case T_CreateReplicaStmt:
+ 			tag = "CREATE REPLICA CLASS";
+ 			break;
+ 		case T_AlterReplicaStmt:
+ 			tag = "ALTER REPLICA CLASS";
+ 			break;
+ 
  		case T_CreateFdwStmt:
  			tag = "CREATE FOREIGN DATA WRAPPER";
  			break;
diff -dcrpN pgsql/src/backend/utils/cache/relmapper.c pgsql-partial/src/backend/utils/cache/relmapper.c
*** pgsql/src/backend/utils/cache/relmapper.c	2010-02-26 03:01:12.000000000 +0100
--- pgsql-partial/src/backend/utils/cache/relmapper.c	2010-08-12 21:55:45.000000000 +0200
*************** RelationMapOidToFilenode(Oid relationId,
*** 182,187 ****
--- 182,236 ----
  }
  
  /*
+  * RelationMapFilenodeToOid
+  *
+  * Given a relation filenode, look up its OID. Reverse of RelationMapOidToFilenode.
+  *
+  * Returns InvalidOid if the filenode is not known (which should never happen,
+  * but the caller is in a better position to report a meaningful error).
+  */
+ Oid
+ RelationMapFilenodeToOid(Oid filenode, bool shared)
+ {
+ 	const RelMapFile *map;
+ 	int32		i;
+ 
+ 	/* If there are active updates, believe those over the main maps */
+ 	if (shared)
+ 	{
+ 		map = &active_shared_updates;
+ 		for (i = 0; i < map->num_mappings; i++)
+ 		{
+ 			if (filenode == map->mappings[i].mapfilenode)
+ 				return map->mappings[i].mapoid;
+ 		}
+ 		map = &shared_map;
+ 		for (i = 0; i < map->num_mappings; i++)
+ 		{
+ 			if (filenode == map->mappings[i].mapfilenode)
+ 				return map->mappings[i].mapoid;
+ 		}
+ 	}
+ 	else
+ 	{
+ 		map = &active_local_updates;
+ 		for (i = 0; i < map->num_mappings; i++)
+ 		{
+ 			if (filenode == map->mappings[i].mapfilenode)
+ 				return map->mappings[i].mapoid;
+ 		}
+ 		map = &local_map;
+ 		for (i = 0; i < map->num_mappings; i++)
+ 		{
+ 			if (filenode == map->mappings[i].mapfilenode)
+ 				return map->mappings[i].mapoid;
+ 		}
+ 	}
+ 
+ 	return InvalidOid;
+ }
+ 
+ /*
   * RelationMapUpdateMap
   *
   * Install a new relfilenode mapping for the specified relation.
diff -dcrpN pgsql/src/backend/utils/cache/syscache.c pgsql-partial/src/backend/utils/cache/syscache.c
*** pgsql/src/backend/utils/cache/syscache.c	2010-08-07 10:40:12.000000000 +0200
--- pgsql-partial/src/backend/utils/cache/syscache.c	2010-08-11 07:39:21.000000000 +0200
***************
*** 41,46 ****
--- 41,48 ----
  #include "catalog/pg_operator.h"
  #include "catalog/pg_opfamily.h"
  #include "catalog/pg_proc.h"
+ #include "catalog/pg_replica.h"
+ #include "catalog/pg_replicaitem.h"
  #include "catalog/pg_rewrite.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_tablespace.h"
*************** static const struct cachedesc cacheinfo[
*** 541,546 ****
--- 543,570 ----
  		},
  		1024
  	},
+ 	{ReplicaItemRelationId,		/* REPLICAITEMTRIPLET */
+ 		ReplicaItemIndexId,
+ 		3,
+ 		{
+ 			Anum_pg_replicaitem_classoid,
+ 			Anum_pg_replicaitem_dboid,
+ 			Anum_pg_replicaitem_reloid,
+ 			0
+ 		},
+ 		1024
+ 	},
+ 	{ReplicaRelationId,		/* REPLICANAME */
+ 		ReplicaClassnameIndexId,
+ 		1,
+ 		{
+ 			Anum_pg_replica_classname,
+ 			0,
+ 			0,
+ 			0
+ 		},
+ 		32
+ 	},
  	{RewriteRelationId,			/* RULERELNAME */
  		RewriteRelRulenameIndexId,
  		2,
diff -dcrpN pgsql/src/backend/utils/init/postinit.c pgsql-partial/src/backend/utils/init/postinit.c
*** pgsql/src/backend/utils/init/postinit.c	2010-07-11 11:14:55.000000000 +0200
--- pgsql-partial/src/backend/utils/init/postinit.c	2010-08-12 15:12:06.000000000 +0200
*************** InitPostgres(const char *in_dbname, Oid 
*** 896,901 ****
--- 896,1162 ----
  		CommitTransactionCommand();
  }
  
+ /* --------------------------------
+  * InitPostgresForPartialReplication
+  *
+  * Simplified version of InitPostgres that always connects to the "replication" database.
+  */
+ void
+ InitPostgresForPartialReplication(void)
+ {
+ 	bool		am_superuser;
+ 	GucContext	gucctx;
+ 	char	   *fullpath;
+ 	const char *dbname = "replication";
+ 	Form_pg_database dbform;
+ 	HeapTuple	tuple;
+ 
+ 	elog(DEBUG3, "InitPostgresForPartialReplication");
+ 
+ 	/*
+ 	 * Initialize my entry in the shared-invalidation manager's array of
+ 	 * per-backend data.
+ 	 *
+ 	 * Sets up MyBackendId, a unique backend identifier.
+ 	 */
+ 	MyBackendId = InvalidBackendId;
+ 
+ 	SharedInvalBackendInit(false);
+ 
+ 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
+ 		elog(FATAL, "bad backend id: %d", MyBackendId);
+ 
+ 	/* Now that we have a BackendId, we can participate in ProcSignal */
+ 	ProcSignalInit(MyBackendId);
+ 
+ 	/*
+ 	 * bufmgr needs another initialization call too
+ 	 */
+ 	InitBufferPoolBackend();
+ 
+ 	/*
+ 	 * Initialize the relation cache and the system catalog caches.  Note that
+ 	 * no catalog access happens here; we only set up the hashtable structure.
+ 	 * We must do this before starting a transaction because transaction abort
+ 	 * would try to touch these hashtables.
+ 	 */
+ 	RelationCacheInitialize();
+ 	InitCatalogCache();
+ 	InitPlanCache();
+ 
+ 	/* Initialize portal manager */
+ 	EnablePortalManager();
+ 
+ 	/* Initialize stats collection --- must happen before first xact */
+ 	pgstat_initialize();
+ 
+ 	/*
+ 	 * Load relcache entries for the shared system catalogs.  This must create
+ 	 * at least entries for pg_database and catalogs used for authentication.
+ 	 */
+ 	RelationCacheInitializePhase2();
+ 
+ 	/*
+ 	 * Set up process-exit callback to do pre-shutdown cleanup.  This has to
+ 	 * be after we've initialized all the low-level modules like the buffer
+ 	 * manager, because during shutdown this has to run before the low-level
+ 	 * modules start to close down.  On the other hand, we want it in place
+ 	 * before we begin our first transaction --- if we fail during the
+ 	 * initialization transaction, as is entirely possible, we need the
+ 	 * AbortTransaction call to clean up.
+ 	 */
+ 	on_shmem_exit(ShutdownPostgres, 0);
+ 
+ 	/*
+ 	 * Start a new transaction here before first access to db, and get a
+ 	 * snapshot.  We don't have a use for the snapshot itself, but we're
+ 	 * interested in the secondary effect that it sets RecentGlobalXmin. (This
+ 	 * is critical for anything that reads heap pages, because HOT may decide
+ 	 * to prune them even if the process doesn't attempt to modify any
+ 	 * tuples.)
+ 	 */
+ 	StartTransactionCommand();
+ 	(void) GetTransactionSnapshot();
+ 
+ 	InitializeSessionUserIdStandalone();
+ 	am_superuser = true;
+ 
+ 	tuple = GetDatabaseTuple(dbname);
+ 	if (!HeapTupleIsValid(tuple))
+ 		ereport(FATAL,
+ 					(errcode(ERRCODE_UNDEFINED_DATABASE),
+ 					 errmsg("database \"%s\" does not exist", dbname)));
+ 	dbform = (Form_pg_database) GETSTRUCT(tuple);
+ 	MyDatabaseId = HeapTupleGetOid(tuple);
+ 	MyDatabaseTableSpace = dbform->dattablespace;
+ 
+ 	/* Now we can mark our PGPROC entry with the database ID */
+ 	/* (We assume this is an atomic store so no lock is needed) */
+ 	MyProc->databaseId = MyDatabaseId;
+ 
+ 	/*
+ 	 * Now, take a writer's lock on the database we are trying to connect to.
+ 	 * If there is a concurrently running DROP DATABASE on that database, this
+ 	 * will block us until it finishes (and has committed its update of
+ 	 * pg_database).
+ 	 *
+ 	 * Note that the lock is not held long, only until the end of this startup
+ 	 * transaction.  This is OK since we are already advertising our use of
+ 	 * the database in the PGPROC array; anyone trying a DROP DATABASE after
+ 	 * this point will see us there.
+ 	 *
+ 	 * Note: use of RowExclusiveLock here is reasonable because we envision
+ 	 * our session as being a concurrent writer of the database.  If we had a
+ 	 * way of declaring a session as being guaranteed-read-only, we could use
+ 	 * AccessShareLock for such sessions and thereby not conflict against
+ 	 * CREATE DATABASE.
+ 	 */
+ 	LockSharedObject(DatabaseRelationId, MyDatabaseId, 0,
+ 						 RowExclusiveLock);
+ 
+ 	/*
+ 	 * Recheck pg_database to make sure the target database hasn't gone away.
+ 	 * If there was a concurrent DROP DATABASE, this ensures we will die
+ 	 * cleanly without creating a mess.
+ 	 */
+ 	tuple = GetDatabaseTuple(dbname);
+ 	if (!HeapTupleIsValid(tuple) ||
+ 		MyDatabaseId != HeapTupleGetOid(tuple) ||
+ 		MyDatabaseTableSpace != ((Form_pg_database) GETSTRUCT(tuple))->dattablespace)
+ 		ereport(FATAL,
+ 					(errcode(ERRCODE_UNDEFINED_DATABASE),
+ 					 errmsg("database \"%s\" does not exist", dbname),
+ 			   errdetail("It seems to have just been dropped or renamed.")));
+ 
+ 	/*
+ 	 * Now we should be able to access the database directory safely. Verify
+ 	 * it's there and looks reasonable.
+ 	 */
+ 	fullpath = GetDatabasePath(MyDatabaseId, MyDatabaseTableSpace);
+ 
+ 	if (access(fullpath, F_OK) == -1)
+ 	{
+ 		if (errno == ENOENT)
+ 			ereport(FATAL,
+ 						(errcode(ERRCODE_UNDEFINED_DATABASE),
+ 						 errmsg("database \"%s\" does not exist",
+ 								dbname),
+ 					errdetail("The database subdirectory \"%s\" is missing.",
+ 							  fullpath)));
+ 		else
+ 			ereport(FATAL,
+ 						(errcode_for_file_access(),
+ 						 errmsg("could not access directory \"%s\": %m",
+ 								fullpath)));
+ 	}
+ 
+ 	ValidatePgVersion(fullpath);
+ 
+ 	SetDatabasePath(fullpath);
+ 
+ 	/*
+ 	 * It's now possible to do real access to the system catalogs.
+ 	 *
+ 	 * Load relcache entries for the system catalogs.  This must create at
+ 	 * least the minimum set of "nailed-in" cache entries.
+ 	 */
+ 	RelationCacheInitializePhase3();
+ 
+ 	/* set up ACL framework (so CheckMyDatabase can check permissions) */
+ 	initialize_acl();
+ 
+ 	/*
+ 	 * Re-read the pg_database row for our database, check permissions and set
+ 	 * up database-specific GUC settings.  We can't do this until all the
+ 	 * database-access infrastructure is up.  (Also, it wants to know if the
+ 	 * user is a superuser, so the above stuff has to happen first.)
+ 	 */
+ 	CheckMyDatabase(dbname, am_superuser);
+ 
+ 	/*
+ 	 * Now process any command-line switches that were included in the startup
+ 	 * packet, if we are in a regular backend.	We couldn't do this before
+ 	 * because we didn't know if client is a superuser.
+ 	 */
+ 	gucctx = am_superuser ? PGC_SUSET : PGC_BACKEND;
+ 
+ 	if (MyProcPort != NULL &&
+ 		MyProcPort->cmdline_options != NULL)
+ 	{
+ 		/*
+ 		 * The maximum possible number of commandline arguments that could
+ 		 * come from MyProcPort->cmdline_options is (strlen + 1) / 2; see
+ 		 * pg_split_opts().
+ 		 */
+ 		char	  **av;
+ 		int			maxac;
+ 		int			ac;
+ 
+ 		maxac = 2 + (strlen(MyProcPort->cmdline_options) + 1) / 2;
+ 
+ 		av = (char **) palloc(maxac * sizeof(char *));
+ 		ac = 0;
+ 
+ 		av[ac++] = "postgres";
+ 
+ 		/* Note this mangles MyProcPort->cmdline_options */
+ 		pg_split_opts(av, &ac, MyProcPort->cmdline_options);
+ 
+ 		av[ac] = NULL;
+ 
+ 		Assert(ac < maxac);
+ 
+ 		(void) process_postgres_switches(ac, av, gucctx);
+ 	}
+ 
+ 	/*
+ 	 * Process any additional GUC variable settings passed in startup packet.
+ 	 * These are handled exactly like command-line variables.
+ 	 */
+ 	if (MyProcPort != NULL)
+ 	{
+ 		ListCell   *gucopts = list_head(MyProcPort->guc_options);
+ 
+ 		while (gucopts)
+ 		{
+ 			char	   *name;
+ 			char	   *value;
+ 
+ 			name = lfirst(gucopts);
+ 			gucopts = lnext(gucopts);
+ 
+ 			value = lfirst(gucopts);
+ 			gucopts = lnext(gucopts);
+ 
+ 			SetConfigOption(name, value, gucctx, PGC_S_CLIENT);
+ 		}
+ 	}
+ 
+ 	/* Process pg_db_role_setting options */
+ 	process_settings(MyDatabaseId, GetSessionUserId());
+ 
+ 	/* Apply PostAuthDelay as soon as we've read all options */
+ 	if (PostAuthDelay > 0)
+ 		pg_usleep(PostAuthDelay * 1000000L);
+ 
+ 	/*
+ 	 * Initialize various default states that can't be set up until we've
+ 	 * selected the active user and gotten the right GUC settings.
+ 	 */
+ 
+ 	/* set default namespace search path */
+ 	InitializeSearchPath();
+ 
+ 	/* initialize client encoding */
+ 	InitializeClientEncoding();
+ 
+ 	/* report this backend in the PgBackendStatus array */
+ 	pgstat_bestart();
+ 
+ 	/* close the transaction we started above */
+ 	CommitTransactionCommand();
+ }
+ 
  /*
   * Load GUC settings from pg_db_role_setting.
   *
diff -dcrpN pgsql/src/bin/initdb/initdb.c pgsql-partial/src/bin/initdb/initdb.c
*** pgsql/src/bin/initdb/initdb.c	2010-02-26 03:01:15.000000000 +0100
--- pgsql-partial/src/bin/initdb/initdb.c	2010-08-12 15:12:59.000000000 +0200
*************** static void load_plpgsql(void);
*** 179,184 ****
--- 179,185 ----
  static void vacuum_db(void);
  static void make_template0(void);
  static void make_postgres(void);
+ static void make_replication(void);
  static void trapsig(int signum);
  static void check_ok(void);
  static char *escape_quotes(const char *src);
*************** make_postgres(void)
*** 1993,1998 ****
--- 1994,2030 ----
  	check_ok();
  }
  
+ /*
+  * copy template1 to replication
+  */
+ static void
+ make_replication(void)
+ {
+ 	PG_CMD_DECL;
+ 	const char **line;
+ 	static const char *replication_setup[] = {
+ 		"CREATE DATABASE replication;\n",
+ 		NULL
+ 	};
+ 
+ 	fputs(_("copying template1 to replication ... "), stdout);
+ 	fflush(stdout);
+ 
+ 	snprintf(cmd, sizeof(cmd),
+ 			 "\"%s\" %s template1 >%s",
+ 			 backend_exec, backend_options,
+ 			 DEVNULL);
+ 
+ 	PG_CMD_OPEN;
+ 
+ 	for (line = replication_setup; *line; line++)
+ 		PG_CMD_PUTS(*line);
+ 
+ 	PG_CMD_CLOSE;
+ 
+ 	check_ok();
+ }
+ 
  
  /*
   * signal handler in case we are interrupted.
*************** main(int argc, char *argv[])
*** 3125,3130 ****
--- 3157,3164 ----
  
  	make_postgres();
  
+ 	make_replication();
+ 
  	if (authwarning != NULL)
  		fprintf(stderr, "%s", authwarning);
  
diff -dcrpN pgsql/src/include/access/xlog.h pgsql-partial/src/include/access/xlog.h
*** pgsql/src/include/access/xlog.h	2010-08-11 07:19:26.000000000 +0200
--- pgsql-partial/src/include/access/xlog.h	2010-08-11 07:39:21.000000000 +0200
*************** extern PGDLLIMPORT TimeLineID ThisTimeLi
*** 145,150 ****
--- 145,158 ----
  extern bool InRecovery;
  
  /*
+  * The standby replays the replication class named below.
+  * The class OID is looked up by, and is valid only in, the
+  * startup process.
+  */
+ extern char	   *standby_replica_classname;
+ extern Oid 		standby_replica_classoid;
+ 
+ /*
   * Like InRecovery, standbyState is only valid in the startup process.
   * In all other processes it will have the value STANDBY_DISABLED (so
   * InHotStandby will read as FALSE).
diff -dcrpN pgsql/src/include/catalog/indexing.h pgsql-partial/src/include/catalog/indexing.h
*** pgsql/src/include/catalog/indexing.h	2010-02-26 03:01:21.000000000 +0100
--- pgsql-partial/src/include/catalog/indexing.h	2010-08-11 07:39:21.000000000 +0200
*************** DECLARE_UNIQUE_INDEX(pg_default_acl_oid_
*** 281,286 ****
--- 281,295 ----
  DECLARE_UNIQUE_INDEX(pg_db_role_setting_databaseid_rol_index, 2965, on pg_db_role_setting using btree(setdatabase oid_ops, setrole oid_ops));
  #define DbRoleSettingDatidRolidIndexId	2965
  
+ DECLARE_UNIQUE_INDEX(pg_replica_oid_index, 3597, on pg_replica using btree (oid oid_ops));
+ #define	ReplicaOidIndexId	3597
+ 
+ DECLARE_UNIQUE_INDEX(pg_replica_classname_index, 3598, on pg_replica using btree(rpl_classname name_ops));
+ #define ReplicaClassnameIndexId	3598
+ 
+ DECLARE_UNIQUE_INDEX(pg_replicaitem_index, 3599, on pg_replicaitem using btree(rpl_classoid oid_ops, rpl_dboid oid_ops, rpl_reloid oid_ops));
+ #define ReplicaItemIndexId	3599
+ 
  /* last step of initialization script: build the indexes declared above */
  BUILD_INDICES
  
diff -dcrpN pgsql/src/include/catalog/pg_replica_fn.h pgsql-partial/src/include/catalog/pg_replica_fn.h
*** pgsql/src/include/catalog/pg_replica_fn.h	1970-01-01 01:00:00.000000000 +0100
--- pgsql-partial/src/include/catalog/pg_replica_fn.h	2010-08-12 19:57:20.000000000 +0200
***************
*** 0 ****
--- 1,25 ----
+ /*-------------------------------------------------------------------------
+  *
+  * pg_replica_fn.h
+  *	support functions for the system replica class
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * $PostgreSQL: $
+  *
+  * NOTES
+  *	this file only declares function prototypes; it contains
+  *	no catalog definition and no DATA() statements.
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef PG_REPLICA_FN_H
+ #define PG_REPLICA_FN_H
+ 
+ extern Oid	get_replica_class_oid(const char *classname);
+ extern bool	replica_item_exists(Oid classoid, Oid dboid, Oid reloid);
+ extern bool	replica_item_exists2(Relation pg_replicaitem_rel,
+ 				    Oid classoid, Oid dboid, Oid reloid);
+ 
+ #endif   /* PG_REPLICA_FN_H */
diff -dcrpN pgsql/src/include/catalog/pg_replica.h pgsql-partial/src/include/catalog/pg_replica.h
*** pgsql/src/include/catalog/pg_replica.h	1970-01-01 01:00:00.000000000 +0100
--- pgsql-partial/src/include/catalog/pg_replica.h	2010-08-12 20:10:48.000000000 +0200
***************
*** 0 ****
--- 1,51 ----
+ /*-------------------------------------------------------------------------
+  *
+  * pg_replica.h
+  *	definition of the system "replica class" relation (pg_replica)
+  *	along with the relation's initial contents.
+  *
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * $PostgreSQL: $
+  *
+  * NOTES
+  *	the genbki.pl script reads this file and generates .bki
+  *	information from the DATA() statements.
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef PG_REPLICA_H
+ #define PG_REPLICA_H
+ 
+ #include "catalog/genbki.h"
+ 
+ /* ----------------
+  *		pg_replica definition.  cpp turns this into
+  *		typedef struct FormData_pg_replica
+  * ----------------
+  */
+ #define ReplicaRelationId		3051
+ 
+ CATALOG(pg_replica,3051) BKI_SHARED_RELATION BKI_ROWTYPE_OID(3052) BKI_SCHEMA_MACRO
+ {
+ 	NameData	rpl_classname;		/* replica class name */
+ } FormData_pg_replica;
+ 
+ /* ----------------
+  *		Form_pg_replica corresponds to a pointer to a tuple with
+  *		the format of pg_replica relation.
+  * ----------------
+  */
+ typedef FormData_pg_replica *Form_pg_replica;
+ 
+ /* ----------------
+  *		compiler constants for pg_replica
+  * ----------------
+  */
+    
+ #define Natts_pg_replica				1
+ #define Anum_pg_replica_classname			1
+ 
+ #endif   /* PG_REPLICA_H */
diff -dcrpN pgsql/src/include/catalog/pg_replicaitem.h pgsql-partial/src/include/catalog/pg_replicaitem.h
*** pgsql/src/include/catalog/pg_replicaitem.h	1970-01-01 01:00:00.000000000 +0100
--- pgsql-partial/src/include/catalog/pg_replicaitem.h	2010-08-12 20:10:55.000000000 +0200
***************
*** 0 ****
--- 1,55 ----
+ /*-------------------------------------------------------------------------
+  *
+  * pg_replicaitem.h
+  *	definition of the system "replica item" relation (pg_replicaitem)
+  *	along with the relation's initial contents.
+  *
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * $PostgreSQL: $
+  *
+  * NOTES
+  *	the genbki.pl script reads this file and generates .bki
+  *	information from the DATA() statements.
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef PG_REPLICAITEM_H
+ #define PG_REPLICAITEM_H
+ 
+ #include "catalog/genbki.h"
+ 
+ /* ----------------
+  *		pg_replicaitem definition.  cpp turns this into
+  *		typedef struct FormData_pg_replicaitem
+  * ----------------
+  */
+ #define ReplicaItemRelationId		3053
+ 
+ CATALOG(pg_replicaitem,3053) BKI_SHARED_RELATION BKI_ROWTYPE_OID(3054) BKI_SCHEMA_MACRO BKI_WITHOUT_OIDS
+ {
+ 	Oid	rpl_classoid;		/* replica class oid */
+ 	Oid	rpl_dboid;		/* database oid */
+ 	Oid	rpl_reloid;		/* relation oid */
+ } FormData_pg_replicaitem;
+ 
+ /* ----------------
+  *		Form_pg_replicaitem corresponds to a pointer to a tuple with
+  *		the format of pg_replicaitem relation.
+  * ----------------
+  */
+ typedef FormData_pg_replicaitem *Form_pg_replicaitem;
+ 
+ /* ----------------
+  *		compiler constants for pg_replicaitem
+  * ----------------
+  */
+    
+ #define Natts_pg_replicaitem					3
+ #define Anum_pg_replicaitem_classoid			1
+ #define Anum_pg_replicaitem_dboid			2
+ #define Anum_pg_replicaitem_reloid			3
+ 
+ #endif   /* PG_REPLICAITEM_H */
diff -dcrpN pgsql/src/include/commands/replica.h pgsql-partial/src/include/commands/replica.h
*** pgsql/src/include/commands/replica.h	1970-01-01 01:00:00.000000000 +0100
--- pgsql-partial/src/include/commands/replica.h	2010-08-11 07:39:21.000000000 +0200
***************
*** 0 ****
--- 1,21 ----
+ /*-------------------------------------------------------------------------
+  *
+  * replica.h
+  *		Partial Replication Class management commands (create/alter replica class).
+  *
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef REPLICA_H
+ #define REPLICA_H
+ 
+ #include "nodes/parsenodes.h"
+ 
+ extern void CreateReplicaClass(CreateReplicaStmt *stmt);
+ extern void AlterReplicaClass(AlterReplicaStmt *stmt);
+ 
+ #endif   /* REPLICA_H */
diff -dcrpN pgsql/src/include/miscadmin.h pgsql-partial/src/include/miscadmin.h
*** pgsql/src/include/miscadmin.h	2010-06-20 13:59:16.000000000 +0200
--- pgsql-partial/src/include/miscadmin.h	2010-08-12 14:22:45.000000000 +0200
*************** extern ProcessingMode Mode;
*** 340,345 ****
--- 340,346 ----
  extern void pg_split_opts(char **argv, int *argcp, char *optstr);
  extern void InitPostgres(const char *in_dbname, Oid dboid, const char *username,
  			 char *out_dbname);
+ extern void InitPostgresForPartialReplication(void);
  extern void BaseInit(void);
  
  /* in utils/init/miscinit.c */
diff -dcrpN pgsql/src/include/nodes/nodes.h pgsql-partial/src/include/nodes/nodes.h
*** pgsql/src/include/nodes/nodes.h	2010-07-13 10:51:07.000000000 +0200
--- pgsql-partial/src/include/nodes/nodes.h	2010-08-11 07:39:21.000000000 +0200
*************** typedef enum NodeTag
*** 347,352 ****
--- 347,355 ----
  	T_AlterUserMappingStmt,
  	T_DropUserMappingStmt,
  	T_AlterTableSpaceOptionsStmt,
+ 	T_ReplicaElem,
+ 	T_CreateReplicaStmt,
+ 	T_AlterReplicaStmt,
  
  	/*
  	 * TAGS FOR PARSE TREE NODES (parsenodes.h)
diff -dcrpN pgsql/src/include/nodes/parsenodes.h pgsql-partial/src/include/nodes/parsenodes.h
*** pgsql/src/include/nodes/parsenodes.h	2010-08-07 10:40:13.000000000 +0200
--- pgsql-partial/src/include/nodes/parsenodes.h	2010-08-11 07:39:21.000000000 +0200
*************** typedef struct VariableShowStmt
*** 1356,1361 ****
--- 1356,1389 ----
  } VariableShowStmt;
  
  /* ----------------------
+  *		Create Replica Class Statement
+  * ----------------------
+  */
+ 
+ typedef struct ReplicaElem
+ {
+ 	NodeTag		type;
+ 	char		kind;			/* 'd' for database, 'r' for relation */
+ 	bool		excluded;		/* included/excluded */
+ 	char	   *dbname;			/* database name */
+ 	RangeVar   *range;			/* relation */
+ } ReplicaElem;
+ 
+ typedef struct CreateReplicaStmt
+ {
+ 	NodeTag		type;
+ 	char	   *name;
+ 	List	   *excluded_objects;
+ } CreateReplicaStmt;
+ 
+ typedef struct AlterReplicaStmt
+ {
+ 	NodeTag		type;
+ 	char	   *name;
+ 	List	   *modified_objects;
+ } AlterReplicaStmt;
+ 
+ /* ----------------------
   *		Create Table Statement
   *
   * NOTE: in the raw gram.y output, ColumnDef and Constraint nodes are
diff -dcrpN pgsql/src/include/parser/kwlist.h pgsql-partial/src/include/parser/kwlist.h
*** pgsql/src/include/parser/kwlist.h	2010-08-07 10:40:13.000000000 +0200
--- pgsql-partial/src/include/parser/kwlist.h	2010-08-11 07:39:21.000000000 +0200
*************** PG_KEYWORD("recursive", RECURSIVE, UNRES
*** 305,310 ****
--- 305,311 ----
  PG_KEYWORD("ref", REF, UNRESERVED_KEYWORD)
  PG_KEYWORD("references", REFERENCES, RESERVED_KEYWORD)
  PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
+ PG_KEYWORD("relation", RELATION, UNRESERVED_KEYWORD)
  PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
  PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
  PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
diff -dcrpN pgsql/src/include/utils/syscache.h pgsql-partial/src/include/utils/syscache.h
*** pgsql/src/include/utils/syscache.h	2010-02-14 19:42:18.000000000 +0100
--- pgsql-partial/src/include/utils/syscache.h	2010-08-11 07:39:21.000000000 +0200
*************** enum SysCacheIdentifier
*** 69,74 ****
--- 69,76 ----
  	PROCOID,
  	RELNAMENSP,
  	RELOID,
+ 	REPLICAITEMTRIPLET,
+ 	REPLICANAME,
  	RULERELNAME,
  	STATRELATTINH,
  	TABLESPACEOID,
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Boszormenyi Zoltan (#1)
Re: WIP partial replication patch

Boszormenyi Zoltan <zb@cybertec.at> writes:

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

This fundamentally cannot work, as it relies on system catalogs to be
valid during recovery. Another rather basic problem is that you've
got to pass system catalog updates downstream (in case they affect
the tables being replicated) but if you want partial replication then
many of those updates will be incorrect for the slave machine.

More generally, though, we are going to have our hands full for the
foreseeable future trying to get the existing style of replication
bug-free and performant. I don't think we want to undertake any large
expansion of the replication feature set, at least not for some time
to come. So you can count on me to vote against committing anything
like this into core.

regards, tom lane

#3Boszormenyi Zoltan
zb@cybertec.at
In reply to: Tom Lane (#2)
Re: WIP partial replication patch

Tom Lane wrote:

Boszormenyi Zoltan <zb@cybertec.at> writes:

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

This fundamentally cannot work, as it relies on system catalogs to be
valid during recovery.

Just like Hot Standby, no? What is the difference here?
Sorry for being ignorant.

Another rather basic problem is that you've
got to pass system catalog updates downstream (in case they affect
the tables being replicated) but if you want partial replication then
many of those updates will be incorrect for the slave machine.

Yes, that's true. But there's an easy solution: querying such
tables can be forbidden. We were also talking about truncating
the excluded relations internally. Currently, querying excluded
tables is allowed only so that one can see that DML indeed doesn't
modify them. As I said, at the moment it's only a proof-of-concept patch.

More generally, though, we are going to have our hands full for the
foreseeable future trying to get the existing style of replication
bug-free and performant. I don't think we want to undertake any large
expansion of the replication feature set, at least not for some time
to come. So you can count on me to vote against committing anything
like this into core.

Understood.

Best regards,
Zoltán Böszörményi

#4Andres Freund
andres@anarazel.de
In reply to: Boszormenyi Zoltan (#3)
Re: WIP partial replication patch

On Fri, Aug 13, 2010 at 09:36:00PM +0200, Boszormenyi Zoltan wrote:

Tom Lane wrote:

Boszormenyi Zoltan <zb@cybertec.at> writes:

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

This fundamentally cannot work, as it relies on system catalogs to be
valid during recovery.

Just like Hot Standby, no? What is the difference here?
Sorry for being ignorant.

In HS you can only connect after you've found a restartpoint; only
after that do you know that you have reached a consistent point for
the system.

I think this is fixable by keeping more WAL on the standbys, but I
need to think more about it.

Andres

#5Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#2)
Re: WIP partial replication patch

Another rather basic problem is that you've
got to pass system catalog updates downstream (in case they affect
the tables being replicated) but if you want partial replication then
many of those updates will be incorrect for the slave machine.

Couldn't this be taken care of by replicating the objects but not any
data for them? That is, the tables and indexes would exist, but be empty?

More generally, though, we are going to have our hands full for the
foreseeable future trying to get the existing style of replication
bug-free and performant. I don't think we want to undertake any large
expansion of the replication feature set, at least not for some time
to come. So you can count on me to vote against committing anything
like this into core.

I imagine it'll take more than a year to get this to work, if we ever
do. It's probably good to put it on a git branch so that those who want
to can continue long-term work on it.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#5)
Re: WIP partial replication patch

Josh Berkus <josh@agliodbs.com> writes:

Another rather basic problem is that you've
got to pass system catalog updates downstream (in case they affect
the tables being replicated) but if you want partial replication then
many of those updates will be incorrect for the slave machine.

Couldn't this be taken care of by replicating the objects but not any
data for them? That is, the tables and indexes would exist, but be empty?

Seems a bit pointless. What exactly is the use-case for a slave whose
system catalogs match the master exactly (as they must) but whose data
does not?

Notice also that you have to shove the entire WAL downstream anyway ---
the proposed patch filters at the point of application, and would have a
hard time doing better because LSNs have to remain consistent.
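
To make the point about filtering at apply time concrete, here is a minimal C sketch. It is not taken from the patch: `WalRecord`, `ReplayState`, `ExcludedItem`, `is_excluded` and `replay_record` are hypothetical, simplified stand-ins rather than PostgreSQL structures. The only assumption is that every record still reaches the standby and that the replay position advances whether or not the record is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real structures. */
typedef uint64_t Lsn;

typedef struct WalRecord
{
	Lsn			end_lsn;		/* position just past this record */
	uint32_t	db_id;			/* database the record touches */
	uint32_t	rel_id;			/* relation the record touches */
} WalRecord;

typedef struct ExcludedItem
{
	uint32_t	db_id;			/* excluded database */
	uint32_t	rel_id;			/* excluded relation, 0 = whole database */
} ExcludedItem;

typedef struct ReplayState
{
	Lsn			replayed_upto;	/* advances for every record seen */
	size_t		applied;		/* records whose changes were applied */
	size_t		skipped;		/* records filtered out at apply time */
} ReplayState;

/* Return true if the record touches an excluded database or relation. */
bool
is_excluded(const ExcludedItem *items, size_t nitems, const WalRecord *rec)
{
	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i].db_id != rec->db_id)
			continue;
		if (items[i].rel_id == 0 || items[i].rel_id == rec->rel_id)
			return true;
	}
	return false;
}

/*
 * Consume one record.  The record is always received and always moves the
 * replay position forward; the filter only decides whether it is applied.
 */
void
replay_record(ReplayState *state, const ExcludedItem *items, size_t nitems,
			  const WalRecord *rec)
{
	/* records are expected in increasing LSN order */
	assert(rec->end_lsn > state->replayed_upto);

	if (is_excluded(items, nitems, rec))
		state->skipped++;
	else
		state->applied++;		/* a real system would redo the change here */

	state->replayed_upto = rec->end_lsn;
}
```

In this sketch, skipping a record saves the work of applying it but none of the transfer: the standby still has to receive the whole stream to keep its positions in step with the master.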

It would also be rather tricky to identify which objects have to have
updates applied, eg, if you replicate a table you'd damn well better
replicate the data for each and every one of its indexes (which is a
non-constant set in general), because queries on the slave will expect
them all to be valid. Maybe it's possible to keep track of that, though
I bet things will be tricky when there are uncommitted DDL changes
(consider data changes associated with a CREATE INDEX on a replicated
table). In any case xlog replay functions are not the place to have
that kind of logic.
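
The index requirement can be sketched as a small decision function in C. This is illustrative only: `RelInfo`, `find_rel` and `must_apply` are hypothetical names, and the static relation table is an assumption made to keep the example self-contained. It deliberately ignores the uncommitted-DDL case mentioned above, which is exactly where a fixed table like this would fall short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a relation known to the filter. */
typedef struct RelInfo
{
	uint32_t	rel_id;			/* relation identifier */
	uint32_t	parent_table;	/* owning table for an index, 0 for a table */
	bool		listed_excluded;	/* named in the exclusion list */
} RelInfo;

/* Linear lookup; returns NULL if the relation is unknown. */
const RelInfo *
find_rel(const RelInfo *rels, size_t nrels, uint32_t rel_id)
{
	for (size_t i = 0; i < nrels; i++)
	{
		if (rels[i].rel_id == rel_id)
			return &rels[i];
	}
	return NULL;
}

/*
 * Decide whether changes to rel_id must be applied on the standby.
 * An index always follows its table: if the table is replicated, the
 * index is replicated too, regardless of its own exclusion flag.
 */
bool
must_apply(const RelInfo *rels, size_t nrels, uint32_t rel_id)
{
	const RelInfo *rel;

	assert(rels != NULL || nrels == 0);

	rel = find_rel(rels, nrels, rel_id);
	if (rel == NULL)
		return true;			/* unknown relation: apply, to be safe */

	if (rel->parent_table != 0)
	{
		const RelInfo *parent = find_rel(rels, nrels, rel->parent_table);

		if (parent == NULL)
			return true;		/* unknown parent: apply, to be safe */
		return !parent->listed_excluded;
	}

	return !rel->listed_excluded;
}
```

Because the set of indexes on a table changes over time, the table passed to `must_apply` would have to be kept current by some other mechanism, which this sketch does not attempt.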

regards, tom lane

#7Boszormenyi Zoltan
zb@cybertec.at
In reply to: Andres Freund (#4)
Re: WIP partial replication patch

Andres Freund wrote:

On Fri, Aug 13, 2010 at 09:36:00PM +0200, Boszormenyi Zoltan wrote:

Tom Lane wrote:

Boszormenyi Zoltan <zb@cybertec.at> writes:

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

This fundamentally cannot work, as it relies on system catalogs to be
valid during recovery.

Just like Hot Standby, no? What is the difference here?
Sorry for being ignorant.

In HS you can only connect after you've found a restartpoint; only
after that do you know that you have reached a consistent point for
the system.

And in this patch, the startup process only tries to connect
after signalling the postmaster that a consistent state is reached.
And the connection has a reasonable timeout built in.

I think this is fixable by keeping more WAL on the standbys, but I
need to think more about it.

Andres

Best regards,
Zoltán Böszörményi

#8Andres Freund
andres@anarazel.de
In reply to: Boszormenyi Zoltan (#7)
Re: WIP partial replication patch

On Sat, Aug 14, 2010 at 08:40:24AM +0200, Boszormenyi Zoltan wrote:

Andres Freund wrote:

On Fri, Aug 13, 2010 at 09:36:00PM +0200, Boszormenyi Zoltan wrote:

Tom Lane wrote:

Boszormenyi Zoltan <zb@cybertec.at> writes:

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

This fundamentally cannot work, as it relies on system catalogs to be
valid during recovery.

Just like Hot Standby, no? What is the difference here?
Sorry for being ignorant.

In HS you can only connect after you've found a restartpoint; only
after that do you know that you have reached a consistent point for
the system.

And in this patch, the startup process only tries to connect
after signalling the postmaster that a consistent state is reached.
And the connection has a reasonable timeout built in.

I don't think you can currently guarantee that you always have enough
local WAL to even reach a consistent point. That is not a problem with
your patch, don't get me wrong...

Andres

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: WIP partial replication patch

Andres Freund <andres@anarazel.de> writes:

On Sat, Aug 14, 2010 at 08:40:24AM +0200, Boszormenyi Zoltan wrote:

And in this patch, the startup process only tries to connect
after signalling the postmaster that a consistent state is reached.
And the connection has a reasonable timeout built in.

I don't think you currently can guarantee you always have enough
local WAL to even reach a consistent point.

Even if you do, the patch will malfunction (and perhaps corrupt the
database) while reading that WAL. Yes, it'd work once you reach a
consistent database state, but bootstrapping a slave into that
condition will be far more painful than it is with the current
replication code.

regards, tom lane