Hot Standby on git

Started by Simon Riggs over 16 years ago (48 messages)
#1 Simon Riggs
simon@2ndQuadrant.com
1 attachment(s)

Just a note to say that Hot Standby patch is now on git repository
git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby

The complete contents of that repository are BSD licenced contributions
to the PostgreSQL project.

Any further changes to that will be by agreement here on hackers. From
now, I will be submitting each individual change as patch-on-patch to
allow people to see and discuss them and to confirm them as open source
contributions. I request anybody else interested to do the same to allow
us to work together. All contributions welcome.

My record of agreed changes is here
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items

You'll notice that I've already completed 8 changes (10 commits); those
are all fairly minor changes, so submitted here as a combined patch.
There are 9 pending changes, so far, none of which appear to be major
obstacles to resolve. Many thanks to Heikki for a thorough review which
has identified nearly all of those change requests.

I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.

Many thanks to Magnus and Aidan for helping me resolve my git-wrestling
contest and apologies for the delay while that bout happened.

--
Simon Riggs www.2ndQuadrant.com

Attachments:

hs.53c1eda..13a0bd6.patch (text/x-patch; charset=UTF-8)
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1934,1941 **** if (!triggered)
     </para>
  
     <para>
! 	Read-only here means "no writes to the permanent database tables". So
! 	there are no problems with queries that make use of temporary sort and
  	work files will be used.  Temporary tables cannot be created and
  	therefore cannot be used at all in recovery mode.
     </para>
--- 1934,1941 ----
     </para>
  
     <para>
! 	Read-only here means "no writes to the permanent database tables". 
! 	There are no problems with queries that make use of temporary sort and
  	work files will be used.  Temporary tables cannot be created and
  	therefore cannot be used at all in recovery mode.
     </para>
***************
*** 1983,1989 **** if (!triggered)
       </listitem>
  	 <listitem>
  	  <para>
!        LOCK, with restrictions, see later
        </para>
       </listitem>
  	 <listitem>
--- 1983,1989 ----
       </listitem>
  	 <listitem>
  	  <para>
!        LOCK TABLE, though only when explicitly IN ACCESS SHARE MODE
        </para>
       </listitem>
  	 <listitem>
***************
*** 2000,2014 **** if (!triggered)
     </para>
  
     <para>
! 	These actions will produce error messages
  
  	<itemizedlist>
  	 <listitem>
  	  <para>
!        DML - Insert, Update, Delete, COPY FROM, Truncate which all write data. 
! 	   Any RULE which generates DML will throw error messages as a result.
! 	   Note that there is no action possible that can result in a trigger
! 	   being executed.
        </para>
       </listitem>
  	 <listitem>
--- 2000,2013 ----
     </para>
  
     <para>
! 	These actions produce error messages
  
  	<itemizedlist>
  	 <listitem>
  	  <para>
!        DML - Insert, Update, Delete, COPY FROM, Truncate. 
! 	   Note that there are no actions that result in a trigger
! 	   being executed during recovery.
        </para>
       </listitem>
  	 <listitem>
***************
*** 2024,2029 **** if (!triggered)
--- 2023,2041 ----
       </listitem>
  	 <listitem>
  	  <para>
+        RULEs on SELECT statements that generate DML commands. RULEs on DML
+ 	   commands that produce only SELECT statements are already disallowed
+ 	   during read-only transactions.
+       </para>
+      </listitem>
+ 	 <listitem>
+ 	  <para>
+        LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
+        LOCK TABLE that explicitly requests a lock other than ACCESS SHARE MODE.
+       </para>
+      </listitem>
+ 	 <listitem>
+ 	  <para>
         Transaction management commands that explicitly set non-read only state
  		<itemizedlist>
  		 <listitem>
***************
*** 2069,2077 **** if (!triggered)
      
     <para>
  	Note that current behaviour of read only transactions when not in
! 	recovery is to allow the last two actions, so there is a small and
! 	subtle difference in behaviour between standby read-only transactions
! 	and read only transactions during normal running.
  	It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
  	temporary tables may be lifted in a future release, if their internal
  	implementation is altered to make this possible.
--- 2081,2089 ----
      
     <para>
  	Note that current behaviour of read only transactions when not in
! 	recovery is to allow the last two actions, so there are small and
! 	subtle differences in behaviour between read-only transactions
! 	run on standby and during normal running.
  	It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
  	temporary tables may be lifted in a future release, if their internal
  	implementation is altered to make this possible.
***************
*** 2082,2088 **** if (!triggered)
  	processing mode. Sessions will remain connected while the server
  	changes mode. Current transactions will continue, though will remain
  	read-only. After this, it will be possible to initiate read-write
! 	transactions, though users must *manually* reset their 
  	default_transaction_read_only setting first, if they want that
  	behaviour.
     </para>
--- 2094,2100 ----
  	processing mode. Sessions will remain connected while the server
  	changes mode. Current transactions will continue, though will remain
  	read-only. After this, it will be possible to initiate read-write
! 	transactions, though users must explicitly reset their 
  	default_transaction_read_only setting first, if they want that
  	behaviour.
     </para>
***************
*** 2098,2107 **** if (!triggered)
     </para>
  
     <para>
! 	In recovery, transactions will not be permitted to take any lock higher
! 	other than AccessShareLock or AccessExclusiveLock. In addition,
! 	transactions may never assign a TransactionId and may never write WAL.
! 	The LOCK TABLE command by default applies an AccessExclusiveLock. 
  	Any LOCK TABLE command that runs on the standby and requests a specific
  	lock type other than AccessShareLock will be rejected.
     </para>
--- 2110,2118 ----
     </para>
  
     <para>
! 	In recovery, transactions will not be permitted to take any table lock
! 	higher than AccessShareLock. In addition, transactions may never assign
! 	a TransactionId and may never write WAL. 
  	Any LOCK TABLE command that runs on the standby and requests a specific
  	lock type other than AccessShareLock will be rejected.
     </para>
***************
*** 2168,2175 **** if (!triggered)
  
     <para>
  	An example of the above would be an Administrator on Primary server
! 	runs a DROP TABLE command that refers to a table currently in use by
! 	a User query on the standby server.
     </para>
  
     <para>
--- 2179,2186 ----
  
     <para>
  	An example of the above would be an Administrator on Primary server
! 	runs a DROP TABLE command on a table that's currently being queried
! 	in the standby server.
     </para>
  
     <para>
***************
*** 2198,2206 **** if (!triggered)
     <para>
  	We have a number of choices for resolving query conflicts.  The default
  	is that we wait and hope the query completes. If the recovery is not paused,
! 	then the server will wait automatically until the server the lag between
  	primary and standby is at most max_standby_delay seconds. Once that grace
! 	period expires, we then take one of the following actions:
  
  	  <itemizedlist>
  	   <listitem>
--- 2209,2217 ----
     <para>
  	We have a number of choices for resolving query conflicts.  The default
  	is that we wait and hope the query completes. If the recovery is not paused,
! 	then the server will wait automatically until the lag between
  	primary and standby is at most max_standby_delay seconds. Once that grace
! 	period expires, we take one of the following actions:
  
  	  <itemizedlist>
  	   <listitem>
***************
*** 2213,2219 **** if (!triggered)
  	    <para>
  		 If the conflict is caused by cleanup records we tell the standby query
  	 	 that a conflict has occurred and that it must cancel itself to avoid the
! 	 	 risk that it attempts to silently fails to read relevant data because
  	 	 that data has been removed. (This is very similar to the much feared
  		 error message "snapshot too old").
  	    </para>
--- 2224,2230 ----
  	    <para>
  		 If the conflict is caused by cleanup records we tell the standby query
  	 	 that a conflict has occurred and that it must cancel itself to avoid the
! 	 	 risk that it silently fails to read relevant data because
  	 	 that data has been removed. (This is very similar to the much feared
  		 error message "snapshot too old").
  	    </para>
***************
*** 2222,2228 **** if (!triggered)
  		 Note also that this means that idle-in-transaction sessions are never
  		 canceled except by locks. Users should be clear that tables that are
  		 regularly and heavily updated on primary server will quickly cause
! 		 cancellation of any longer running queries made against those tables.
  	    </para>
  
  	    <para>
--- 2233,2239 ----
  		 Note also that this means that idle-in-transaction sessions are never
  		 canceled except by locks. Users should be clear that tables that are
  		 regularly and heavily updated on primary server will quickly cause
! 		 cancellation of any longer running queries in the standby.
  	    </para>
  
  	    <para>
***************
*** 2235,2241 **** if (!triggered)
     </para>
  
     <para>
! 	Other remdial actions exist if the number of cancelations is unacceptable.
  	The first option is to connect to primary server and keep a query active
  	for as long as we need to run queries on the standby. This guarantees that
  	a WAL cleanup record is never generated and we don't ever get query
--- 2246,2252 ----
     </para>
  
     <para>
! 	Other remedial actions exist if the number of cancelations is unacceptable.
  	The first option is to connect to primary server and keep a query active
  	for as long as we need to run queries on the standby. This guarantees that
  	a WAL cleanup record is never generated and we don't ever get query
***************
*** 2283,2289 **** if (!triggered)
     <title>Administrator's Overview</title>
  
     <para>
! 	If there is a recovery.conf file present then the will start in Hot Standby
  	mode by default, though this can be disabled by setting
  	"recovery_connections = off" in recovery.conf. The server may take some
  	time to enable recovery connections since the server must first complete
--- 2294,2300 ----
     <title>Administrator's Overview</title>
  
     <para>
! 	If there is a recovery.conf file present the server will start in Hot Standby
  	mode by default, though this can be disabled by setting
  	"recovery_connections = off" in recovery.conf. The server may take some
  	time to enable recovery connections since the server must first complete
***************
*** 2308,2314 **** LOG:  database system is ready to accept read only connections
  	The setting of max_connections on the standby should be equal to or
  	greater than the setting of max_connections on the primary. This is to
  	ensure that standby has sufficient resources to manage incoming
! 	transactions.
     </para>
  
     <para>
--- 2319,2325 ----
  	The setting of max_connections on the standby should be equal to or
  	greater than the setting of max_connections on the primary. This is to
  	ensure that standby has sufficient resources to manage incoming
! 	transactions. max_prepared_transactions already has this restriction.
     </para>
  
     <para>
***************
*** 2329,2335 **** LOG:  database system is ready to accept read only connections
  	A set of functions allow superusers to control the flow of recovery
  	are described in <xref linkend="functions-recovery-control-table">.
  	These functions allow you to pause and continue recovery, as well
! 	as dynamically set new recovery targets wile recovery progresses.
  	Note that when a server is paused the apparent delay between primary
  	and standby will continue to increase.
     </para>
--- 2340,2346 ----
  	A set of functions allow superusers to control the flow of recovery
  	are described in <xref linkend="functions-recovery-control-table">.
  	These functions allow you to pause and continue recovery, as well
! 	as dynamically set new recovery targets while recovery progresses.
  	Note that when a server is paused the apparent delay between primary
  	and standby will continue to increase.
     </para>
***************
*** 2342,2348 **** LOG:  database system is ready to accept read only connections
  	themselves.  Users will be able to write large sort temp files and
  	re-generate relcache info files, so there is no part of the database
  	that is truly read-only during hot standby mode. There is no restriction
! 	on use of set returning functions, or other users of tuplestore/tuplesort
  	code. Note also that writes to remote databases will still be possible,
  	even though the transaction is read-only locally.
     </para>
--- 2353,2359 ----
  	themselves.  Users will be able to write large sort temp files and
  	re-generate relcache info files, so there is no part of the database
  	that is truly read-only during hot standby mode. There is no restriction
! 	on the use of set returning functions, or other users of tuplestore/tuplesort
  	code. Note also that writes to remote databases will still be possible,
  	even though the transaction is read-only locally.
     </para>
***************
*** 2354,2360 **** LOG:  database system is ready to accept read only connections
     </para>
  
     <para>
! 	The following types of administrator command will not be accepted
  	during recovery mode
  
  	  <itemizedlist>
--- 2365,2371 ----
     </para>
  
     <para>
! 	The following types of administrator command are not be accepted
  	during recovery mode
  
  	  <itemizedlist>
***************
*** 2558,2563 **** LOG:  database system is ready to accept read only connections
--- 2569,2583 ----
  	 available for use when running queries during recovery.
      </para>
     </listitem>
+    <listitem>
+     <para>
+      Full knowledge of running transactions is required before snapshots
+ 	 may be taken. Transactions that take use large numbers of subtransactions
+ 	 (currently greater than 64) will delay the start of read only
+ 	 connections until the completion of the longest running write transaction.
+ 	 If this situation occurs explanatory messages will be sent to server log.
+     </para>
+    </listitem>
    </itemizedlist>
  
     </para>
*** a/src/backend/access/gin/ginxlog.c
--- b/src/backend/access/gin/ginxlog.c
***************
*** 622,628 **** gin_redo(XLogRecPtr lsn, XLogRecord *record)
  	uint8		info = record->xl_info & ~XLR_INFO_MASK;
  
  	/*
! 	 * GIN indexes do not require any conflict processing. XXX really?
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
--- 622,630 ----
  	uint8		info = record->xl_info & ~XLR_INFO_MASK;
  
  	/*
! 	 * GIN indexes do not require any conflict processing. The GIN
! 	 * posting tree is scanned in logical order during VACUUM and
! 	 * no additional processing is required.
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
*** a/src/backend/access/gist/gistxlog.c
--- b/src/backend/access/gist/gistxlog.c
***************
*** 397,403 **** gist_redo(XLogRecPtr lsn, XLogRecord *record)
  	MemoryContext oldCxt;
  
  	/*
! 	 * GIST indexes do not require any conflict processing. XXX really?
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
--- 397,406 ----
  	MemoryContext oldCxt;
  
  	/*
! 	 * GIST indexes do not require any conflict processing. This is
! 	 * because GIST does not remove killed tuples when it performs
! 	 * page splits in the same way b-trees do. Also VACUUMs of 
! 	 * GIST indexes occur in logical not physical order.
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 947,952 **** begin:;
--- 947,971 ----
  	FIN_CRC32(rdata_crc);
  	record->xl_crc = rdata_crc;
  
+ #ifdef WAL_DEBUG
+ 	if (XLOG_DEBUG)
+ 	{
+ 		StringInfoData buf;
+ 
+ 		initStringInfo(&buf);
+ 		appendStringInfo(&buf, "INSERT @ %X/%X: ",
+ 						 RecPtr.xlogid, RecPtr.xrecoff);
+ 		xlog_outrec(&buf, record);
+ 		if (rdata->data != NULL)
+ 		{
+ 			appendStringInfo(&buf, " - ");
+ 			RmgrTable[record->xl_rmid].rm_desc(&buf, record->xl_info, rdata->data);
+ 		}
+ 		elog(LOG, "%s", buf.data);
+ 		pfree(buf.data);
+ 	}
+ #endif
+ 
  	/* Record begin of record in appropriate places */
  	ProcLastRecPtr = RecPtr;
  	Insert->PrevRecord = RecPtr;
*** a/src/backend/commands/lockcmds.c
--- b/src/backend/commands/lockcmds.c
***************
*** 49,61 **** LockTableCommand(LockStmt *lockstmt)
  
  		/*
  		 * During recovery we only accept these variations:
! 		 *
! 		 * LOCK TABLE foo       -- implicitly, AccessExclusiveLock
! 		 * LOCK TABLE foo IN ACCESS SHARE MODE
! 		 * LOCK TABLE foo IN ACCESS EXCLUSIVE MODE
  		 */
! 		if (lockstmt->mode != AccessShareLock
! 			&& lockstmt->mode != AccessExclusiveLock)
  			PreventCommandDuringRecovery();
  
  		LockTableRecurse(reloid, relation,
--- 49,57 ----
  
  		/*
  		 * During recovery we only accept these variations:
! 		 * LOCK TABLE foo IN ACCESS SHARE MODE which is effectively a no-op
  		 */
! 		if (lockstmt->mode != AccessShareLock)
  			PreventCommandDuringRecovery();
  
  		LockTableRecurse(reloid, relation,
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 502,509 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  	if (!xlrec->subxid_overflow)
  		recoverySnapshotValid = true;
  	else
! 		elog(trace_recovery(DEBUG2), 
! 				"running xact data has incomplete subtransaction data");
  
  	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
  	nxids = 0;
--- 502,509 ----
  	if (!xlrec->subxid_overflow)
  		recoverySnapshotValid = true;
  	else
! 		ereport(LOG, 
! 				(errmsg("consistent state delayed because recovery snapshot incomplete")));
  
  	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
  	nxids = 0;
***************
*** 1502,1508 **** HaveTransactionsInCommit(TransactionId *xids, int nxids)
  
  /*
   * BackendPidGetProc -- get a backend's PGPROC given its PID
!  *
   * Returns NULL if not found.  Note that it is up to the caller to be
   * sure that the question remains meaningful for long enough for the
   * answer to be used ...
--- 1502,1508 ----
  
  /*
   * BackendPidGetProc -- get a backend's PGPROC given its PID
!  *	
   * Returns NULL if not found.  Note that it is up to the caller to be
   * sure that the question remains meaningful for long enough for the
   * answer to be used ...
***************
*** 1536,1576 **** BackendPidGetProc(int pid)
  }
  
  /*
-  * BackendXidGetProc -- get a backend's PGPROC given its XID
-  *
-  * Returns NULL if not found.  Note that it is up to the caller to be
-  * sure that the question remains meaningful for long enough for the
-  * answer to be used ...
-  */
- PGPROC *
- BackendXidGetProc(TransactionId xid)
- {
- 	PGPROC	   *result = NULL;
- 	ProcArrayStruct *arrayP = procArray;
- 	int			index;
- 
- 	if (xid == InvalidTransactionId)	/* never match invalid xid */
- 		return 0;
- 
- 	LWLockAcquire(ProcArrayLock, LW_SHARED);
- 
- 	for (index = 0; index < arrayP->numProcs; index++)
- 	{
- 		PGPROC	   *proc = arrayP->procs[index];
- 
- 		if (proc->xid == xid)
- 		{
- 			result = proc;
- 			break;
- 		}
- 	}
- 
- 	LWLockRelease(ProcArrayLock);
- 
- 	return result;
- }
- 
- /*
   * BackendXidGetPid -- get a backend's pid given its XID
   *
   * Returns 0 if not found or it's a prepared transaction.  Note that
--- 1536,1541 ----
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***************
*** 2695,2705 **** ProcessInterrupts(void)
  					 * idle-in-transaction session, so make it FATAL instead.
  					 */
  					case CONFLICT_MODE_ERROR:
! 						cancelMode = CONFLICT_MODE_FATAL;
  							break;
  
  					case CONFLICT_MODE_ERROR_IF_NOT_IDLE:
! 						cancelMode = CONFLICT_MODE_NOT_SET;
  							break;
  
  					default:
--- 2695,2713 ----
  					 * idle-in-transaction session, so make it FATAL instead.
  					 */
  					case CONFLICT_MODE_ERROR:
! 							cancelMode = CONFLICT_MODE_FATAL;
  							break;
  
  					case CONFLICT_MODE_ERROR_IF_NOT_IDLE:
! 							/*
! 							 * If we still have a snapshot then we must
! 							 * cancel, else we are free to go.
! 							 * XXXHS: As above, cancel means FATAL, for now.
! 							 */
! 							if (MyProc->xmin == 0)
! 								cancelMode = CONFLICT_MODE_NOT_SET;
! 							else
! 								cancelMode = CONFLICT_MODE_FATAL;
  							break;
  
  					default:
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 1259,1265 **** XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
  	/*
  	 * Data lives in different places depending upon when snapshot taken
  	 */
! 	if (snapshot->takenDuringRecovery)
  	{
  		/*
  		 * If the snapshot contains full subxact data, the fastest way to check
--- 1259,1265 ----
  	/*
  	 * Data lives in different places depending upon when snapshot taken
  	 */
! 	if (!snapshot->takenDuringRecovery)
  	{
  		/*
  		 * If the snapshot contains full subxact data, the fastest way to check
*** a/src/include/access/nbtree.h
--- b/src/include/access/nbtree.h
***************
*** 536,545 **** typedef BTScanOpaqueData *BTScanOpaque;
  #define SK_BT_DESC			(INDOPTION_DESC << SK_BT_INDOPTION_SHIFT)
  #define SK_BT_NULLS_FIRST	(INDOPTION_NULLS_FIRST << SK_BT_INDOPTION_SHIFT)
  
- /* XXX probably needs new RMgr call to do this cleanly */
- extern bool btree_is_cleanup_record(uint8 info);
- extern bool btree_needs_cleanup_lock(uint8 info);
- 
  /*
   * prototypes for functions in nbtree.c (external entry points for btree)
   */
--- 536,541 ----
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
***************
*** 54,60 **** extern int	GetTransactionsInCommit(TransactionId **xids_p);
  extern bool HaveTransactionsInCommit(TransactionId *xids, int nxids);
  
  extern PGPROC *BackendPidGetProc(int pid);
- extern PGPROC *BackendXidGetProc(TransactionId xid);
  extern int	BackendXidGetPid(TransactionId xid);
  extern bool IsBackendPid(int pid);
  
--- 54,59 ----
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#1)
Re: Hot Standby on git

On Sat, Sep 26, 2009 at 5:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Just a note to say that Hot Standby patch is now on git repository
 git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby

Awesome! Thanks for taking the time to get this set up.

The complete contents of that repository are BSD licenced contributions
to the PostgreSQL project.

Excellent...

Any further changes to that will be by agreement here on hackers. From
now, I will be submitting each individual change as patch-on-patch to
allow people to see and discuss them and to confirm them as open source
contributions.

Sounds good.

I request anybody else interested to do the same to allow
us to work together. All contributions welcome.

My record of agreed changes is here
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items

You'll notice that I've already completed 8 changes (10 commits); those
are all fairly minor changes, so submitted here as a combined patch.
There are 9 pending changes, so far, none of which appear to be major
obstacles to resolve. Many thanks to Heikki for a thorough review which
has identified nearly all of those change requests.

I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.

Sounds like you are making rapid progress. If you think there's
something useful I could do, let me know and I'll take a look.

...Robert

#3 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#2)
Re: Hot Standby on git

On Sat, 2009-09-26 at 09:29 -0400, Robert Haas wrote:

I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.

Sounds like you are making rapid progress.

Only because Heikki has put so much effort into review and because most
of the remaining issues are coding related, rather than theoretical.

We are moving along and are in no way threatened by time.

If you think there's
something useful I could do, let me know and I'll take a look.

I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.

--
Simon Riggs www.2ndQuadrant.com

#4 Mark Mielke
mark@mark.mielke.cc
In reply to: Simon Riggs (#3)
Re: Hot Standby on git

On 09/26/2009 10:04 AM, Simon Riggs wrote:

If you think there's
something useful I could do, let me know and I'll take a look.

I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.

You might try and steal ideas from EasyMock / PowerMock - but not sure
how well the ideas map to C.

Generally it means allowing the functions to be called from a "mock"
environment, where subroutine calls that might be called are stubbed out
to return sample data that would simulate your scenario. Object oriented
languages that require every object to provide an interface where most
object methods can be overridden are more ideal for performing this sort
of test.

I rarely ever see this sort of stuff in FOSS projects, and never that I
can remember in FOSS C projects. It's not easy, though.

I assume you are doing it through code changing right now. Commenting
out lines, replacing them with others, etc?

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#5 Dan Colish
dan@unencrypted.org
In reply to: Mark Mielke (#4)
Re: Hot Standby on git

On Sat, Sep 26, 2009 at 10:45:17AM -0400, Mark Mielke wrote:

On 09/26/2009 10:04 AM, Simon Riggs wrote:

If you think there's
something useful I could do, let me know and I'll take a look.

I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.

You might try and steal ideas from EasyMock / PowerMock - but not
sure how well the ideas map to C.

Generally it means allowing the functions to be called from a "mock"
environment, where subroutine calls that might be called are stubbed
out to return sample data that would simulate your scenario. Object
oriented languages that require every object to provide an interface
where most object methods can be overridden are more ideal for
performing this sort of test.

I rarely ever see this sort of stuff in FOSS projects, and never
that I can remember in FOSS C projects. It's not easy, though.

I assume you are doing it through code changing right now.
Commenting out lines, replacing them with others, etc?

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

There are a variety of projects dedicated to creating C unit test
frameworks. I don't have a lot of experience with them, but I have heard
good things about check and cunit. Here's a link I found with a longer
list of frameworks. http://www.opensourcetesting.org/unit_c.php

--
--Dan

#6 Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#3)
Re: Hot Standby on git

I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.

There's always pgtap. Whenever we find a new corner case, we add it to
the development test suite.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#7 Mark Mielke
mark@mark.mielke.cc
In reply to: Dan Colish (#5)
Re: Hot Standby on git

On 09/26/2009 02:28 PM, Dan Colish wrote:

There are a variety of projects dedicated to creating C unit test
frameworks. I don't have a lot of experience with them, but I have heard
good things about check and cunit. Here's a link I found with a longer
list of frameworks. http://www.opensourcetesting.org/unit_c.php

Looking at check and cunit - I don't see what sort of mock function
facility they would provide? One part of unit testing is arranging for
functions to be called, tested, and results reported on. This can take
you a certain amount of the way. "Pure" functions, for example, that
always generate the same output for the same input parameters, are
perfect for this situation. Perhaps test how a qsort() or bsearch()
method works under various scenarios?

Most real life code gets a little more complicated. For example, what if
we want to simulate a network failure or "out of disk space" condition?
What if we want to test out what happens when the Y2038 date is reached?
This requires either complex test case setup that is difficult to run
reproducibly, or another approach - "mock". It means doing things like
overriding the write() method, and making it return successful N times,
and then failing on the (N+1)th time with ENOSPC. It means overriding
the gettimeofday() method to return a time in the future. A major
benefit of this sort of testing is that it should not require source
changes in order to perform the test. This sort of stuff is a LOT easier
to do in OO languages. I see it done in Java a lot. I can't remember
ever having seen it done in C. I think it's just too hard compared to
the value obtained from the effort.
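
As a rough illustration of the write()/ENOSPC idea above, here is a minimal C sketch. It assumes the code under test reaches write() through a function pointer rather than calling it directly, so it does need a small source change; link-time interposition would be another route. All names (io_write, install_enospc_mock, write_blocks) are hypothetical and do not come from the PostgreSQL tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical indirection point: code under test calls io_write(),
 * never write() directly, so a test can swap the implementation. */
typedef ssize_t (*write_fn) (int fd, const void *buf, size_t count);

static ssize_t
real_write(int fd, const void *buf, size_t count)
{
	return write(fd, buf, count);
}

static write_fn io_write = real_write;

/* Mock state: number of writes that still succeed before ENOSPC. */
static int	mock_writes_left;

static ssize_t
mock_write_enospc(int fd, const void *buf, size_t count)
{
	(void) fd;
	(void) buf;
	if (mock_writes_left > 0)
	{
		mock_writes_left--;
		return (ssize_t) count;	/* pretend everything was written */
	}
	errno = ENOSPC;				/* simulate "out of disk space" */
	return -1;
}

/* Install the mock: the first 'successes' calls succeed, later ones fail. */
void
install_enospc_mock(int successes)
{
	assert(successes >= 0);
	mock_writes_left = successes;
	io_write = mock_write_enospc;
}

/* Hypothetical code under test: writes nblocks blocks and returns how
 * many were written completely before the first failure. */
int
write_blocks(int fd, const char *block, size_t len, int nblocks)
{
	int			written = 0;

	for (int i = 0; i < nblocks; i++)
	{
		if (io_write(fd, block, len) != (ssize_t) len)
			break;
		written++;
	}
	return written;
}
```

A test would call install_enospc_mock(3), ask write_blocks() for five blocks, and check that it reports three blocks written with errno left at ENOSPC. No real file descriptor is touched, because the mock ignores fd.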

In your list above, it does show a few attempts - CMock sticks out as an
example. It looks more complicated, though. It takes a .h file and
generates stubs for you to fill in? That could be difficult to manage
for a large project with thousands or many times more unit tests. OO is
easier because you can override *only* particular methods, and you can
safely call the super method that it overrides to provide the underlying
behaviour in the success cases.
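
One way to approximate "override only particular methods and call the super method" in C is a table of function pointers. The sketch below is an assumption about how that could look, not something taken from any existing framework: SysOps, make_shifted_ops and is_after are hypothetical names. The mock copies the real table, replaces only get_time, and that replacement still delegates to the real implementation before shifting the result forward.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical "interface": the system calls the code under test may use. */
typedef struct SysOps
{
	int			(*get_time) (struct timeval *tv);
	pid_t		(*get_pid) (void);
} SysOps;

static int
real_get_time(struct timeval *tv)
{
	return gettimeofday(tv, NULL);
}

static pid_t
real_get_pid(void)
{
	return getpid();
}

static const SysOps real_ops = {real_get_time, real_get_pid};

/* Mock state: how far into the future the fake clock runs. */
static long mock_offset_secs;

static int
shifted_get_time(struct timeval *tv)
{
	int			rc = real_ops.get_time(tv);	/* the "super" call */

	if (rc == 0)
		tv->tv_sec += mock_offset_secs;
	return rc;
}

/* Copy the real table and override only get_time; get_pid is untouched. */
SysOps
make_shifted_ops(long offset_secs)
{
	SysOps		ops = real_ops;

	mock_offset_secs = offset_secs;
	ops.get_time = shifted_get_time;
	return ops;
}

/* Hypothetical code under test: 1 if the clock is past the deadline,
 * 0 if it is not, -1 if the clock could not be read. */
int
is_after(const SysOps *ops, time_t deadline)
{
	struct timeval tv;

	assert(ops != NULL);
	if (ops->get_time(&tv) != 0)
		return -1;
	return tv.tv_sec > deadline;
}
```

With the real table, is_after() reports that a deadline one hour ahead has not yet passed; with a table from make_shifted_ops(7200) the same call reports that it has. The cost compared with an OO language is that every call site has to go through the table.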

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#8 David E. Wheeler
david@kineticode.com
In reply to: Josh Berkus (#6)
Re: Hot Standby on git

On Sep 26, 2009, at 12:33 PM, Josh Berkus wrote:

There's always pgtap. Whenever we find a new corner case, we add it to
the development test suite.

Also, for C TAP, there's [libtap](http://jc.ngo.org.uk/trac-bin/trac.cgi/wiki/LibTap).
You can then use `prove`, which you likely already have on your
system, to harness the test output.
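
To give an idea of what a harness such as `prove` consumes, here is a minimal sketch that emits TAP by hand from C. The helper names (`tap_plan`, `tap_ok`, `tap_failed`) are made up for illustration and are not libtap's API; a real test would more likely use libtap's own functions.

```c
#include <stdio.h>

static int tap_test_number = 0;     /* running test counter */
static int tap_failures = 0;        /* number of failed tests so far */

/* Print the TAP plan line, e.g. "1..3" for three planned tests. */
void
tap_plan(int planned)
{
    printf("1..%d\n", planned);
}

/* Report one test result as "ok N - description" or "not ok N - description". */
int
tap_ok(int passed, const char *description)
{
    tap_test_number++;
    if (!passed)
        tap_failures++;
    printf("%s %d - %s\n", passed ? "ok" : "not ok",
           tap_test_number, description);
    return passed;
}

/* Number of failures, suitable for turning into an exit status. */
int
tap_failed(void)
{
    return tap_failures;
}
```

A test program would call `tap_plan()` once, then `tap_ok()` for each check, and exit non-zero if `tap_failed()` is not zero.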

Best,

David

#9Alvaro Herrera
alvherre@commandprompt.com
In reply to: Mark Mielke (#7)
Re: Hot Standby on git

Mark Mielke wrote:

Most real life code gets a little more complicated. For example,
what if we want to simulate a network failure or "out of disk space"
condition? What if we want to test out what happens when the Y2038
date is reached? This requires either complex test case setup that
is difficult to run reproducibly, or another approach - "mock". It
means doing things like overriding the write() method, and making it
return successful N times, and then failing on the (N+1)th time with
ENOSPC.

I remember a kernel simulator based on User-Mode Linux called UMLsim,
with which you could do stuff like this. It's dead at this point though,
and I don't know if there's any replacement.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
1 attachment(s)
Re: Hot Standby on git

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

hs-riggs-branch-20090927.tar.gzapplication/x-gzip; name=hs-riggs-branch-20090927.tar.gzDownload
#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#10)
1 attachment(s)
Re: Hot Standby on git

Heikki Linnakangas wrote:

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

Further fixes extracted from above repository attached..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

hs-riggs-branch-20090928.tar.gzapplication/x-gzip; name=hs-riggs-branch-20090928.tar.gzDownload
#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: Hot Standby on git

Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquired first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: Hot Standby on git

Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquired first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: Hot Standby on git

Regarding this in InitStandbyDelayTimers:
+   /*
+    * If replication delay is enormously huge, just treat that as
+    * zero and work up from there. This prevents us from acting
+    * foolishly when replaying old log files.
+    */
+   if (*currentDelay_ms < 0)
+       *currentDelay_ms = 0;
+

So we're treating restoring from an old backup the same as an up-to-date
standby server. If you're restoring from say a month old base backup
with WAL archive up to present day, and have max_standby_delay set to
say 5 seconds, the server will wait for that 5 seconds on each
conflicting query before killing it. Until it reaches the point in the
archive where the delay is less than INT_MAX/1000 seconds old: at that
point it switches into "oh my goodness, we've fallen badly behind, let's
try to catch up ASAP and kill any queries that get into the way" mode.
That's pretty surprising behavior, and not documented either. I propose
we simply remove the above check (fixing the rest of the code so that
you don't hit integer overflows), and always respect max_standby_delay.

BTW, I wonder if we should warn or something if we find that the timestamps
in the archive are in the future? IOW, if either the master's or the
standby's clock is not set correctly.
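
To make the overflow point concrete, here is a minimal sketch (not code from the patch) of computing the replay delay in 64-bit milliseconds, so that a month-old record yields a large positive delay instead of wrapping to a negative one. The names replay_delay_ms and standby_delay_exceeded, and the use of second-resolution timestamps, are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: elapsed milliseconds between a WAL record's
 * timestamp and "now", both given in whole seconds. The arithmetic is
 * done in 64 bits, so a 30-day gap (2,592,000,000 ms, which is larger
 * than INT_MAX) does not overflow.
 */
static int64_t
replay_delay_ms(int64_t record_time_s, int64_t now_s)
{
    int64_t delay_s = now_s - record_time_s;

    /* A record stamped in the future (clock skew) counts as zero delay. */
    if (delay_s < 0)
        return 0;
    return delay_s * 1000;
}

/*
 * Hypothetical policy check: has replay fallen further behind than the
 * configured limit? The limit is always respected, however old the
 * record is. This sketch does not model any special negative settings.
 */
static bool
standby_delay_exceeded(int64_t record_time_s, int64_t now_s,
                       int64_t max_standby_delay_s)
{
    assert(max_standby_delay_s >= 0);
    return replay_delay_ms(record_time_s, now_s) > max_standby_delay_s * 1000;
}
```

With a 32-bit millisecond counter the 30-day case would wrap, which is the situation the "< 0" check in the patch was guarding against.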

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: Hot Standby on git

+   /*
+    * If our initial RunningXactData had an overflowed snapshot then we
+    * knew we were missing some subxids from our snapshot. We can use
+    * this data as an initial snapshot, but we cannot yet mark it valid.
+    * We know that the missing subxids are equal to or earlier than
+    * LatestRunningXid. After we initialise we continue to apply changes
+    * during recovery, so once the oldestRunningXid is later than the
+    * initLatestRunningXid we can now prove that we no longer have
+    * missing information and can mark the snapshot as valid.
+    */
+   if (initRunningXactData && !recoverySnapshotValid)
+   {
+       if (TransactionIdPrecedes(initLatestRunningXid,
+                                 xlrec->oldestRunningXid))
+       {
+           recoverySnapshotValid = true;
+           elog(trace_recovery(DEBUG2),
+                   "running xact data now proven complete");
+           elog(trace_recovery(DEBUG2),
+                   "recovery snapshots are now enabled");
+       }
+       return;
+   }
+

When GetRunningXactData() calculates latestRunningXid in the master,
which is stored in initLatestRunningXid in the standby, it only looks at
xids and subxids present in the procarray. It doesn't take into account
overflowed subxids. I think we could declare a recovery snapshot "proven
complete" too early because of that.
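
As a rough illustration of the check being discussed, the sketch below re-implements the circular XID comparison and the "proven complete" test in isolation. It is not the patch code: Xid, xid_precedes and snapshot_proven_complete are hypothetical stand-ins, and the comparison only mirrors the modulo-2^32 idea behind TransactionIdPrecedes for normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;                 /* stand-in for TransactionId */
#define FIRST_NORMAL_XID ((Xid) 3)    /* values below this are reserved */

/*
 * Circular comparison: true if a is logically older than b.
 * Only meaningful when both are normal XIDs.
 */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * The snapshot can be marked valid once every xid that might have been
 * missing (anything up to init_latest_running) is older than the oldest
 * xid still running.
 */
static bool
snapshot_proven_complete(Xid init_latest_running, Xid oldest_running)
{
    assert(init_latest_running >= FIRST_NORMAL_XID);
    assert(oldest_running >= FIRST_NORMAL_XID);
    return xid_precedes(init_latest_running, oldest_running);
}
```

The check is only as good as its first argument: if init_latest_running is computed without the overflowed subxids and therefore comes out too low, the function returns true before the missing subxids have actually finished.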

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#14)
Re: Hot Standby on git

On Wed, 2009-09-30 at 18:45 +0300, Heikki Linnakangas wrote:

Regarding this in InitStandbyDelayTimers:
+   /*
+    * If replication delay is enormously huge, just treat that as
+    * zero and work up from there. This prevents us from acting
+    * foolishly when replaying old log files.
+    */
+   if (*currentDelay_ms < 0)
+       *currentDelay_ms = 0;
+

So we're treating restoring from an old backup the same as an up-to-date
standby server. If you're restoring from say a month old base backup
with WAL archive up to present day, and have max_standby_delay set to
say 5 seconds, the server will wait for that 5 seconds on each
conflicting query before killing it. Until it reaches the point in the
archive where the delay is less than INT_MAX/1000 seconds old: at that
point it switches into "oh my goodness, we've fallen badly behind, let's
try to catch up ASAP and kill any queries that get into the way" mode.
That's pretty surprising behavior, and not documented either. I propose
we simply remove the above check (fixing the rest of the code so that
you don't hit integer overflows), and always respect max_standby_delay.

Agreed.

I will document the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).

BTW, I wonder if we should warn or something if we find that the timestamps
in the archive are in the future? IOW, if either the master's or the
standby's clock is not set correctly.

Something similar was just spotted by a client. You can set a
recovery_target_timestamp that is before the pg_stop_recovery()
timestamp and it doesn't complain. Will fix.

Not sure if I like the sound of a system moaning at me about the clock
settings. Perhaps just once when it starts, when we read control file.

--
Simon Riggs www.2ndQuadrant.com

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#15)
Re: Hot Standby on git

On Thu, 2009-10-01 at 14:29 +0300, Heikki Linnakangas wrote:

+   /*
+    * If our initial RunningXactData had an overflowed snapshot then we
+    * knew we were missing some subxids from our snapshot. We can use
+    * this data as an initial snapshot, but we cannot yet mark it valid.
+    * We know that the missing subxids are equal to or earlier than
+    * LatestRunningXid. After we initialise we continue to apply changes
+    * during recovery, so once the oldestRunningXid is later than the
+    * initLatestRunningXid we can now prove that we no longer have
+    * missing information and can mark the snapshot as valid.
+    */
+   if (initRunningXactData && !recoverySnapshotValid)
+   {
+       if (TransactionIdPrecedes(initLatestRunningXid,
+                                 xlrec->oldestRunningXid))
+       {
+           recoverySnapshotValid = true;
+           elog(trace_recovery(DEBUG2),
+                   "running xact data now proven complete");
+           elog(trace_recovery(DEBUG2),
+                   "recovery snapshots are now enabled");
+       }
+       return;
+   }
+

When GetRunningXactData() calculates latestRunningXid in the master,
which is stored in initLatestRunningXid in the standby, it only looks at
xids and subxids present in the procarray. It doesn't take into account
overflowed subxids. I think we could declare a recovery snapshot "proven
complete" too early because of that.

Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
while assuming it is calculated the new way. The new way is just to grab
nextXid since we have XidGenLock and do TransactionIdRetreat() on it.
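
A minimal sketch of that "new way", under the assumption that the next XID to be assigned is read while XID assignment is locked out: step back one XID, skipping the reserved values on wraparound. Xid, xid_retreat and latest_running_from_next are hypothetical names; the loop follows the same idea as TransactionIdRetreat.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;                 /* stand-in for TransactionId */
#define FIRST_NORMAL_XID ((Xid) 3)    /* values below this are reserved */

/* Step back one XID, skipping the reserved values when wrapping around. */
static Xid
xid_retreat(Xid xid)
{
    do
    {
        xid--;
    } while (xid < FIRST_NORMAL_XID);
    return xid;
}

/*
 * Upper bound on every xid or subxid assigned so far, including ones
 * that overflowed out of the per-backend subxid cache: the xid just
 * before the next one to be handed out.
 */
static Xid
latest_running_from_next(Xid next_xid)
{
    assert(next_xid >= FIRST_NORMAL_XID);
    return xid_retreat(next_xid);
}
```

Because the bound comes from the XID counter rather than from the procarray contents, it cannot miss subxids that are not visible there.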

--
Simon Riggs www.2ndQuadrant.com

#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#12)
Re: Hot Standby on git

On Wed, 2009-09-30 at 09:33 +0300, Heikki Linnakangas wrote:

Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquired first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.

I think those changes are just paranoia from early versions of patch.

We now know for certain that MultiXact isn't used at the time
StartupMultiXact() is called. The same isn't true for StartupClog() and
StartupSubtrans() which need to cope with concurrent callers.

Will remove changes and document that nothing is touching it when it runs.

--
Simon Riggs www.2ndQuadrant.com

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#18)
1 attachment(s)
Re: Hot Standby on git

Simon Riggs wrote:

On Wed, 2009-09-30 at 09:33 +0300, Heikki Linnakangas wrote:

Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquired first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.

I think those changes are just paranoia from early versions of patch.

We now know for certain that MultiXact isn't used at the time
StartupMultiXact() is called. The same isn't true for StartupClog() and
StartupSubtrans() which need to cope with concurrent callers.

Will remove changes and document that nothing is touching it when it runs.

Thanks, I reverted that in my working version already. Comment patch
welcome if you feel it's needed.

Attached is a new batch of changes I've been doing since last batch.
These are again extracted from my git repository. It includes some of
the add-on patches from your repository; the rest, I believe, I had
already done myself earlier, or are no longer necessary for other reasons.

Could you look into these two TODO items you listed on the wiki page:
- Correct SET default_transaction_read_only and SET
transaction_read_only (Heikki 21/9 Hackers)
- Shutdown checkpoints must not clear locks for prepared transactions
(Heikki 23/9)

And if you could please review the changes I've been doing, just to make
sure I haven't inadvertently introduced new bugs. That has happened
before, as you've rightfully reminded me :-).

There's also the issue that you can't go into hot standby mode after a
shutdown checkpoint. I think that really should be fixed, it's just
weird from a usability point of view if it doesn't work.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

hs-riggs-branch-20091001.tar.gzapplication/x-gzip; name=hs-riggs-branch-20091001.tar.gzDownload
#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#17)
Re: Hot Standby on git

Simon Riggs wrote:

Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
while assuming it is calculated the new way. The new way is just to grab
nextXid since we have XidGenLock and do TransactionIdRetreat() on it.

Ok, good, that's what I thought too. I'll fix that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#19)
Re: Hot Standby on git

On Thu, 2009-10-01 at 18:47 +0300, Heikki Linnakangas wrote:

Could you look into these two TODO items you listed on the wiki page:

Unless we agree otherwise, if it's listed on the Wiki page then I will
work on it.

Maybe not as soon as you might like, but I am working through the list.
5 new changes pushed just minutes ago, sans full testing.

Yes, will review your changes also.

--
Simon Riggs www.2ndQuadrant.com

#22Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#20)
Re: Hot Standby on git

On Thu, 2009-10-01 at 18:48 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
while assuming it is calculated the new way. The new way is just to grab
nextXid since we have XidGenLock and do TransactionIdRetreat() on it.

Ok, good, that's what I thought too. I'll fix that.

OK, not working on that, so go ahead. Thanks.

--
Simon Riggs www.2ndQuadrant.com

#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#21)
Re: Hot Standby on git

Simon Riggs wrote:

@@ -7061,6 +7061,15 @@ ShutdownXLOG(int code, Datum arg)
else
{
/*
+        * Take a snapshot of running transactions and write this to WAL.
+        * This allows us to reconstruct the state of running transactions
+        * during archive recovery, if required. We do this even if we are
+        * not archiving, to allow a cold physical backup of the server to
+        * be useful as a read only standby.
+        */
+       GetRunningTransactionData();
+
+       /*
* If archiving is enabled, rotate the last XLOG file so that all the
* remaining records are archived (postmaster wakes up the archiver
* process one more time at the end of shutdown). The checkpoint

I don't think this will do any good where it's placed. The checkpoint
that follows will have its redo-pointer beyond the running-xacts record,
so WAL replay will never see it.
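
The point can be shown with a toy model, assuming only that replay starts at the checkpoint's redo pointer and visits records at or after it. WalRecord, RecKind and replay_sees are hypothetical, and the LSN values in any usage are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REC_OTHER, REC_RUNNING_XACTS, REC_CHECKPOINT } RecKind;

/* Hypothetical, much simplified WAL record: a position and a type. */
typedef struct
{
    uint64_t lsn;
    RecKind  kind;
} WalRecord;

/*
 * Would replay that starts at redo_lsn ever visit a record of the
 * given kind? Records located before redo_lsn are never read.
 */
static bool
replay_sees(const WalRecord *log, size_t nrecords,
            uint64_t redo_lsn, RecKind kind)
{
    assert(log != NULL || nrecords == 0);

    for (size_t i = 0; i < nrecords; i++)
    {
        if (log[i].lsn >= redo_lsn && log[i].kind == kind)
            return true;
    }
    return false;
}
```

For a log holding a running-xacts record at position 100 followed by a checkpoint record at 200, replay_sees returns false when the redo pointer is 200 and true when it is 100.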

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#16)
Re: Hot Standby on git

Simon Riggs wrote:

I will document the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).

Hmm, not sure if that's such a good piece of advice either. It will mean
waiting for queries forever, which probably isn't what you want if
you're performing archive recovery. Or maybe it is? Maybe -1? I guess it
depends on the situation...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#23)
Re: Hot Standby on git

On Fri, 2009-10-02 at 10:04 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

@@ -7061,6 +7061,15 @@ ShutdownXLOG(int code, Datum arg)
else
{
/*
+        * Take a snapshot of running transactions and write this to WAL.
+        * This allows us to reconstruct the state of running transactions
+        * during archive recovery, if required. We do this even if we are
+        * not archiving, to allow a cold physical backup of the server to
+        * be useful as a read only standby.
+        */
+       GetRunningTransactionData();
+
+       /*
* If archiving is enabled, rotate the last XLOG file so that all the
* remaining records are archived (postmaster wakes up the archiver
* process one more time at the end of shutdown). The checkpoint

I don't think this will do any good where it's placed. The checkpoint
that follows will have its redo-pointer beyond the running-xacts record,
so WAL replay will never see it.

Perhaps we need two entries then to cover multiple use cases?

The placement of this was specifically chosen so that it is the last
entry before the log switch, so that the runningxact record would be
archived.

Yes, we also need one after the shutdown checkpoint to cover the case
where the whole data directory is copied after shutdown. The comments
matched the latter case but the position addressed the first case, so it
looks like I was confused as to which case I was addressing.

Have updated code to do both. See what you think. Thanks.

--
Simon Riggs www.2ndQuadrant.com

#26Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#25)
Re: Hot Standby on git

Simon Riggs wrote:

On Fri, 2009-10-02 at 10:04 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

@@ -7061,6 +7061,15 @@ ShutdownXLOG(int code, Datum arg)
else
{
/*
+        * Take a snapshot of running transactions and write this to WAL.
+        * This allows us to reconstruct the state of running transactions
+        * during archive recovery, if required. We do this even if we are
+        * not archiving, to allow a cold physical backup of the server to
+        * be useful as a read only standby.
+        */
+       GetRunningTransactionData();
+
+       /*
* If archiving is enabled, rotate the last XLOG file so that all the
* remaining records are archived (postmaster wakes up the archiver
* process one more time at the end of shutdown). The checkpoint

I don't think this will do any good where it's placed. The checkpoint
that follows will have its redo-pointer beyond the running-xacts record,
so WAL replay will never see it.

Perhaps we need two entries then to cover multiple use cases?

The placement of this was specifically chosen so that it is the last
entry before the log switch, so that the runningxact record would be
archived.

Yes, we also need one after the shutdown checkpoint to cover the case
where the whole data directory is copied after shutdown. The comments
matched the latter case but the position addressed the first case, so it
looks like I was confused as to which case I was addressing.

Have updated code to do both. See what you think. Thanks.

It seems dangerous to write a WAL record after the shutdown checkpoint.
It will be overwritten by subsequent startup, which is a recipe for trouble.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#24)
Re: Hot Standby on git

On Fri, 2009-10-02 at 10:32 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

I will document the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).

Hmm, not sure if that's such a good piece of advice either. It will mean
waiting for queries forever, which probably isn't what you want if
you're performing archive recovery. Or maybe it is? Maybe -1? I guess it
depends on the situation...

That assumes that the purpose of the archive recovery is more important
than running queries. As you say, it would mean always waiting. But the
beauty is that you *can* run queries to determine when to stop, so
having them cancelled defeats that purpose.

--
Simon Riggs www.2ndQuadrant.com

#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#26)
Re: Hot Standby on git

On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:

It seems dangerous to write a WAL record after the shutdown checkpoint.
It will be overwritten by subsequent startup, which is a recipe for trouble.

I've said it's a corner case and not worth spending time on. I'm putting
it in at your request. If it's not correct before and not correct after,
where exactly do you want it?

--
Simon Riggs www.2ndQuadrant.com

#29Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#28)
Re: Hot Standby on git

Simon Riggs wrote:

On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:

It seems dangerous to write a WAL record after the shutdown checkpoint.
It will be overwritten by subsequent startup, which is a recipe for trouble.

I've said it's a corner case and not worth spending time on. I'm putting
it in at your request. If it's not correct before and not correct after,
where exactly do you want it?

I don't know. Perhaps it should go between the REDO pointer of the
shutdown checkpoint and the checkpoint record itself. Or maybe the
information should be included in the checkpoint record itself.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#30Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#29)
Re: Hot Standby on git

On Fri, 2009-10-02 at 11:26 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:

It seems dangerous to write a WAL record after the shutdown checkpoint.
It will be overwritten by subsequent startup, which is a recipe for trouble.

I've said it's a corner case and not worth spending time on. I'm putting
it in at your request. If it's not correct before and not correct after,
where exactly do you want it?

I don't know. Perhaps it should go between the REDO pointer of the
shutdown checkpoint and the checkpoint record itself.

That would seem a minimally invasive approach and would appear to work for
both cases. Will do.

--
Simon Riggs www.2ndQuadrant.com

#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#29)
Re: Hot Standby on git

On Fri, 2009-10-02 at 11:26 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Fri, 2009-10-02 at 10:43 +0300, Heikki Linnakangas wrote:

It seems dangerous to write a WAL record after the shutdown checkpoint.
It will be overwritten by subsequent startup, which is a recipe for trouble.

I've said it's a corner case and not worth spending time on. I'm putting
it in at your request. If it's not correct before and not correct after,
where exactly do you want it?

I don't know. Perhaps it should go between the REDO pointer of the
shutdown checkpoint and the checkpoint record itself. Or maybe the
information should be included in the checkpoint record itself.

I've implemented this but it requires us to remove two checks - one at
shutdown and one at startup on a shutdown checkpoint. I'm not happy
doing that and would like to put them back.

I'd rather just skip this for now. It's a minor case anyway and there's
nothing stopping users writing their own RunningXactData records with a
function, if it is needed. I can add a function for that.

--
Simon Riggs www.2ndQuadrant.com

#32Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#31)
Re: Hot Standby on git

Simon Riggs wrote:

I'd rather just skip this for now. It's a minor case anyway and there's
nothing stopping users writing their own RunningXactData records with a
function, if it is needed. I can add a function for that.

That won't help. There's no way to have it in a right place wrt. the
shutdown checkpoint if it's triggered by a user-callable function.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#32)
Re: Hot Standby on git

On Fri, 2009-10-02 at 13:52 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

I'd rather just skip this for now. It's a minor case anyway and there's
nothing stopping users writing their own RunningXactData records with a
function, if it is needed. I can add a function for that.

That won't help. There's no way to have it in a right place wrt. the
shutdown checkpoint if it's triggered by a user-callable function.

I notice that you avoid saying "yes, I agree we should remove the two
checks".

I will add code to make a shutdown checkpoint be a valid starting place
for Hot Standby, as long as there are no in-doubt prepared transactions.
That way we know there are no xids still running and no locks, without
needing to write a record to say so.
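
The rule proposed here is small enough to state as a predicate. The following is only a sketch of that rule, not the eventual implementation; CheckpointInfo and its fields are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what recovery knows about a checkpoint. */
typedef struct
{
    bool is_shutdown;        /* written during a clean shutdown? */
    int  n_prepared_xacts;   /* in-doubt prepared transactions */
} CheckpointInfo;

/*
 * A shutdown checkpoint implies no running xids and no held locks,
 * unless prepared transactions survive the shutdown. Only in the
 * clean case can it serve as a starting point without a separate
 * running-xacts record.
 */
static bool
checkpoint_is_valid_standby_start(const CheckpointInfo *ckpt)
{
    assert(ckpt != NULL && ckpt->n_prepared_xacts >= 0);
    return ckpt->is_shutdown && ckpt->n_prepared_xacts == 0;
}
```

An online checkpoint, or a shutdown checkpoint with prepared transactions outstanding, still needs the running-xacts information from somewhere else.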

--
Simon Riggs www.2ndQuadrant.com

#34Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#33)
Re: Hot Standby on git

Simon Riggs wrote:

I will add code to make a shutdown checkpoint be a valid starting place
for Hot Standby, as long as there are no in-doubt prepared transactions.
That way we know there are no xids still running and no locks, without
needing to write a record to say so.

Ok, I can live with that, and it should be dead simple to implement.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#35Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#34)
Re: Hot Standby on git

On Fri, 2009-10-02 at 18:04 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

I will add code to make a shutdown checkpoint be a valid starting place
for Hot Standby, as long as there are no in-doubt prepared transactions.
That way we know there are no xids still running and no locks, without
needing to write a record to say so.

Ok, I can live with that, and it should be dead simple to implement.

First cut changes made, for discussion.

--
Simon Riggs www.2ndQuadrant.com

#36Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#11)
Re: Hot Standby on git

On Mon, 2009-09-28 at 11:25 +0300, Heikki Linnakangas wrote:

Heikki Linnakangas wrote:

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

Further fixes extracted from above repository attached..

I've applied changes on all these patches apart from 0006-... which has
some dependencies on earlier work I'm still looking at.

--
Simon Riggs www.2ndQuadrant.com

#37Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#10)
Re: Hot Standby on git

On Sun, 2009-09-27 at 13:57 +0300, Heikki Linnakangas wrote:

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

I'm working my way through these changes now. 1, 2, 15 and 16 applied.

We discussed briefly your change
0011-Replace-per-proc-counters-of-loggable-locks-with-per.patch.

I don't see how that helps at all. The objective of lock counters was to
know if we can skip acquiring an LWlock on all lock partitions. This
change keeps the lock counters yet acquires the locks we were trying to
avoid. This change needs some justification since it is not a bug fix.

--
Simon Riggs www.2ndQuadrant.com

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#37)
Re: Hot Standby on git

Simon Riggs <simon@2ndQuadrant.com> writes:

I don't see how that helps at all. The objective of lock counters was to
know if we can skip acquiring an LWlock on all lock partitions. This
change keeps the lock counters yet acquires the locks we were trying to
avoid. This change needs some justification since it is not a bug fix.

[ scratches head ... ] Why is hot standby messing with this sort of
thing at all? It sounds like a performance optimization that should
be considered separately, and *later*.

regards, tom lane

#39Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#38)
Re: Hot Standby on git

On Mon, 2009-10-05 at 10:19 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

I don't see how that helps at all. The objective of lock counters was to
know if we can skip acquiring an LWlock on all lock partitions. This
change keeps the lock counters yet acquires the locks we were trying to
avoid. This change needs some justification since it is not a bug fix.

[ scratches head ... ] Why is hot standby messing with this sort of
thing at all? It sounds like a performance optimization that should
be considered separately, and *later*.

Possibly.

We have 3 suggested approaches:
* Avoid taking LockPartition locks while we get info for Hot Standby
during normal running, by means of a ref counting scheme (Simon)
* Take the locks and implement a ref counting scheme (Heikki)
* Take the locks, worry later (Tom)

The middle ground seems pointless to me.

I'm happy to go with simple lock-everything-for-now but it's pretty
clear it's going to be an annoying performance hit. If we do that we
should put in a parameter to turn on/off so that those who will never
use Hot Standby can avoid this completely.

I'll wait for Heikki's thoughts before implementing anything.

--
Simon Riggs www.2ndQuadrant.com

#40Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#37)
Re: Hot Standby on git

Simon Riggs wrote:

We discussed briefly your change
0011-Replace-per-proc-counters-of-loggable-locks-with-per.patch.

I don't see how that helps at all. The objective of lock counters was to
know if we can skip acquiring an LWlock on all lock partitions. This
change keeps the lock counters yet acquires the locks we were trying to
avoid. This change needs some justification since it is not a bug fix.

Well, the original code was buggy. But more to the point, it's a lot
simpler this way, I don't see any reason why the counters should be
per-process, meaning that they need to be exposed in the pgproc structs
or procarray.c.

The point is to avoid the seqscan of the lock hash table. I presumed
that's the expensive part in GetRunningTransactionLocks().

Tom Lane wrote:

[ scratches head ... ] Why is hot standby messing with this sort of
thing at all? It sounds like a performance optimization that should
be considered separately, and *later*.

Yeah, I too considered just ripping it out. Simon is worried that
locking all the lock partitions and scanning the locks table can take a
long time. We do that in the master, while holding both ProcArrayLock
and XidGenLock in exclusive mode (hmm, why is shared not enough?), so
there are some grounds for worry. OTOH, it's only done once per checkpoint.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#41Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#36)
Re: Hot Standby on git

Simon Riggs wrote:

On Mon, 2009-09-28 at 11:25 +0300, Heikki Linnakangas wrote:

Heikki Linnakangas wrote:

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

Further fixes extracted from above repository attached..

I've applied changes on all these patches apart from 0006-... which has
some dependencies on earlier work I'm still looking at.

Simon, you don't need to apply those patches. Just review them, and post
comments or subsequent patches on top of the repository.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#42Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#41)
Re: Hot Standby on git

On Mon, 2009-10-05 at 18:30 -0400, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Mon, 2009-09-28 at 11:25 +0300, Heikki Linnakangas wrote:

Heikki Linnakangas wrote:

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

Further fixes extracted from above repository attached..

I've applied changes on all these patches apart from 0006-... which has
some dependencies on earlier work I'm still looking at.

Simon, you don't need to apply those patches. Just review them, and post
comments or subsequent patches on top of the repository.

I've applied them to the repository.

--
Simon Riggs www.2ndQuadrant.com

#43Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#40)
Re: Hot Standby on git

On Tue, 2009-10-06 at 01:10 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

We discussed briefly your change
0011-Replace-per-proc-counters-of-loggable-locks-with-per.patch.

I don't see how that helps at all. The objective of lock counters was to
know if we can skip acquiring an LWlock on all lock partitions. This
change keeps the lock counters yet acquires the locks we were trying to
avoid. This change needs some justification since it is not a bug fix.

Well, the original code was buggy.

One bug was found, yes, but the submitted changes are not just a bug
fix, they alter the whole basic meaning of that code.

But more to the point, it's a lot
simpler this way, I don't see any reason why the counters should be
per-process, meaning that they need to be exposed in the pgproc structs
or procarray.c.

There was a very clear reason for doing it that way. By putting the
counters on the PGPROC structs it allows us to check the counts while we
are performing the main sweep of the procarray *and* if the count is
zero it allows us to *completely avoid* taking the locks. By making the
lock counters private to each backend the counters themselves do not
need to be protected by a lock when updated, and the fetch can be done
atomically, as we do with xid.

With respect, I have explained this on list at least twice previously,
as well as in code comments.

The point is to avoid the seqscan of the lock hash table. I presumed
that's the expensive part in GetRunningTransactionLocks().

Tom Lane wrote:

[ scratches head ... ] Why is hot standby messing with this sort of
thing at all? It sounds like a performance optimization that should
be considered separately, and *later*.

Yeah, I too considered just ripping it out. Simon is worried that
locking all the lock partitions and scanning the locks table can take a
long time. We do that in the master, while holding both ProcArrayLock
and XidGenLock in exclusive mode (hmm, why is shared not enough?), so
there are some grounds for worry. OTOH, it's only done once per checkpoint.

I could live with ripping it out, but what we have now doesn't make
sense, to me.

--
Simon Riggs www.2ndQuadrant.com

#44Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#10)
Re: Hot Standby on git

On Sun, 2009-09-27 at 13:57 +0300, Heikki Linnakangas wrote:

Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.

There were 16 change patches included in this post. I have applied 14 of
them, almost all without any changes to comment or code. I've fixed 2
bugs and made changes where XXX comments were left in code. That leaves
0011-locks... discussed on another part of this thread and patch
0005-Include-information... which I'm discussing here.

It's a huge set of changes, all of which are just refactoring; none of
them is a bug fix of any kind. The refactoring does sound reasonable,
but is really fairly minor. I feel we should defer this change to the
future to allow us to stabilise the current patch. I do understand the
need for refactoring, but if we refactor everything touched by Hot
Standby then we will simply touch more code and trigger more bugs.
This is especially true with prepared transactions, which aren't well
understood and aren't covered by as many tests.

---
src/backend/access/transam/twophase.c | 96 +++++++++++++++++-----------
src/backend/access/transam/twophase_rmgr.c | 13 ----
src/backend/access/transam/xact.c | 64 +++++++------------
src/backend/access/transam/xlog.c | 5 +-
src/backend/commands/discard.c | 3 +-
src/backend/commands/sequence.c | 3 +
src/backend/storage/lmgr/lock.c | 5 +-
src/backend/tcop/postgres.c | 1 +
src/backend/utils/cache/inval.c | 90 ++------------------------
src/include/access/subtrans.h | 3 -
src/include/access/twophase.h | 2 -
src/include/access/twophase_rmgr.h | 6 +-
src/include/access/xact.h | 6 --
src/include/miscadmin.h | 7 --
src/include/storage/proc.h | 7 +-
src/include/utils/inval.h | 7 ++
16 files changed, 108 insertions(+), 210 deletions(-)

--
Simon Riggs www.2ndQuadrant.com

#45Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#19)
2 attachment(s)
Re: Hot Standby on git

On Thu, 2009-10-01 at 18:47 +0300, Heikki Linnakangas wrote:

And if you could please review the changes I've been doing, just to
make sure I haven't inadvertently introduced new bugs. That has
happened before, as you've rightfully reminded me :-).

You posted 17 patches here.

I've reviewed/applied patches 1, 3, 4, 5, 6, 7, 9, 10, 13, 14 and 15.
That leaves me with some form of issue on 2, 5, 8, 11, 12, 16 and 17.
That sounds like a lot, but out of 41 patches in total to date, I have
as yet unresolved issues with only 9.

Patch 0017 has significant changes to it, so I'm posting patches here
for further discussion. My main thought is that I agree with the
changes you wanted to make, and I've added a few extra things.

Commit message from repo:
Apply 0017-Revert-changes-to-subtrans.c-and-slru.c.-Instead-cal.patch
but with heavy modifications to fix a number of bugs and make associated
changes. First, StartupSubtrans() positioned itself at oldestXid, so
that when later running transactions complete they could find no page
for them to update and crash. Second, ExtendClog() expected to be able
to write WAL during recovery and so crashed after 32768 xids. This patch
also extends the patch to cover the recently added support for starting
Hot Standby from a shutdown checkpoint, which causes some refactoring.
Various comments reworded, including allowing a lock overflow to cause a
PENDING state, just as we do with subxid overflow. Another bug was also
found, in that failing to make subtrans entries from the initial
snapshot could lead to later abort records hanging because the topxid
was not set. Code is now similar in all code paths. Sounds like a lot of
changes, but mostly subtle changes rather than lengthy ones.

It seems highly likely that you'll find an error in my changes to your
changes also, but they do pass initial testing.

--
Simon Riggs www.2ndQuadrant.com

Attachments:

90480cc3d7cbdaf08c2df11a1bc16ac2d8f0dce1.context.patch (text/x-patch; charset=utf-8)
*** a/src/backend/access/transam/clog.c
--- b/src/backend/access/transam/clog.c
***************
*** 575,581 **** ExtendCLOG(TransactionId newestXact)
  	LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
  
  	/* Zero the page and make an XLOG entry about it */
! 	ZeroCLOGPage(pageno, true);
  
  	LWLockRelease(CLogControlLock);
  }
--- 575,581 ----
  	LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
  
  	/* Zero the page and make an XLOG entry about it */
! 	ZeroCLOGPage(pageno, !InRecovery);
  
  	LWLockRelease(CLogControlLock);
  }
*** a/src/backend/access/transam/slru.c
--- b/src/backend/access/transam/slru.c
***************
*** 598,605 **** SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
  	 * commands to set the commit status of transactions whose bits are in
  	 * already-truncated segments of the commit log (see notes in
  	 * SlruPhysicalWritePage).	Hence, if we are InRecovery, allow the case
! 	 * where the file doesn't exist, and return zeroes instead. We also
! 	 * return a zeroed page when seek and read fails.
  	 */
  	fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
  	if (fd < 0)
--- 598,604 ----
  	 * commands to set the commit status of transactions whose bits are in
  	 * already-truncated segments of the commit log (see notes in
  	 * SlruPhysicalWritePage).	Hence, if we are InRecovery, allow the case
! 	 * where the file doesn't exist, and return zeroes instead.
  	 */
  	fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
  	if (fd < 0)
***************
*** 620,633 **** SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
  
  	if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
  	{
- 		if (InRecovery)
- 		{
- 			ereport(LOG,
- 					(errmsg("file \"%s\" doesn't exist, reading as zeroes",
- 							path)));
- 			MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
- 			return true;
- 		}
  		slru_errcause = SLRU_SEEK_FAILED;
  		slru_errno = errno;
  		close(fd);
--- 619,624 ----
***************
*** 637,650 **** SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
  	errno = 0;
  	if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
  	{
- 		if (InRecovery)
- 		{
- 			ereport(LOG,
- 					(errmsg("file \"%s\" doesn't exist, reading as zeroes",
- 							path)));
- 			MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
- 			return true;
- 		}
  		slru_errcause = SLRU_READ_FAILED;
  		slru_errno = errno;
  		close(fd);
--- 628,633 ----
*** a/src/backend/access/transam/subtrans.c
--- b/src/backend/access/transam/subtrans.c
***************
*** 31,37 ****
  #include "access/slru.h"
  #include "access/subtrans.h"
  #include "access/transam.h"
- #include "miscadmin.h"
  #include "pg_trace.h"
  #include "utils/snapmgr.h"
  
--- 31,36 ----
***************
*** 45,52 ****
   * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
   * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_SEGMENTS_PER_PAGE.  We need take no
   * explicit notice of that fact in this module, except when comparing segment
!  * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes)
!  * and in recovery when we do ExtendSUBTRANS.
   */
  
  /* We need four bytes per xact */
--- 44,50 ----
   * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
   * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_SEGMENTS_PER_PAGE.  We need take no
   * explicit notice of that fact in this module, except when comparing segment
!  * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes).
   */
  
  /* We need four bytes per xact */
***************
*** 85,96 **** SubTransSetParent(TransactionId xid, TransactionId parent)
  	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
  	ptr += entryno;
  
! 	/*
! 	 * Current state should be 0, except in recovery where we may
! 	 * need to reset the value multiple times
! 	 */
! 	Assert(*ptr == InvalidTransactionId ||
! 			(InRecovery && *ptr == parent));
  
  	*ptr = parent;
  
--- 83,90 ----
  	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
  	ptr += entryno;
  
! 	/* Current state should be 0 */
! 	Assert(*ptr == InvalidTransactionId);
  
  	*ptr = parent;
  
***************
*** 229,247 **** ZeroSUBTRANSPage(int pageno)
  /*
   * This must be called ONCE during postmaster or standalone-backend startup,
   * after StartupXLOG has initialized ShmemVariableCache->nextXid.
   */
  void
  StartupSUBTRANS(TransactionId oldestActiveXID)
  {
! 	TransactionId xid = ShmemVariableCache->nextXid;
! 	int			pageno = TransactionIdToPage(xid);
! 
! 	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
  
  	/*
! 	 * Initialize our idea of the latest page number.
  	 */
! 	SubTransCtl->shared->latest_page_number = pageno;
  
  	LWLockRelease(SubtransControlLock);
  }
--- 223,255 ----
  /*
   * This must be called ONCE during postmaster or standalone-backend startup,
   * after StartupXLOG has initialized ShmemVariableCache->nextXid.
+  *
+  * oldestActiveXID is the oldest XID of any prepared transaction, or nextXid
+  * if there are none.
   */
  void
  StartupSUBTRANS(TransactionId oldestActiveXID)
  {
! 	int			startPage;
! 	int			endPage;
  
  	/*
! 	 * Since we don't expect pg_subtrans to be valid across crashes, we
! 	 * initialize the currently-active page(s) to zeroes during startup.
! 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
! 	 * the new page without regard to whatever was previously on disk.
  	 */
! 	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
! 
! 	startPage = TransactionIdToPage(oldestActiveXID);
! 	endPage = TransactionIdToPage(ShmemVariableCache->nextXid);
! 
! 	while (startPage != endPage)
! 	{
! 		(void) ZeroSUBTRANSPage(startPage);
! 		startPage++;
! 	}
! 	(void) ZeroSUBTRANSPage(startPage);
  
  	LWLockRelease(SubtransControlLock);
  }
***************
*** 294,335 **** void
  ExtendSUBTRANS(TransactionId newestXact)
  {
  	int			pageno;
- 	static int last_pageno = 0;
  
! 	Assert(TransactionIdIsNormal(newestXact));
  
! 	if (!InRecovery)
! 	{
! 		/*
! 		 * No work except at first XID of a page.  But beware: just after
! 		 * wraparound, the first XID of page zero is FirstNormalTransactionId.
! 		 */
! 		if (TransactionIdToEntry(newestXact) != 0 &&
! 			!TransactionIdEquals(newestXact, FirstNormalTransactionId))
! 			return;
! 
! 		pageno = TransactionIdToPage(newestXact);
! 	}
! 	else
! 	{
! 		/*
! 		 * InRecovery we keep track of the last page we extended, so
! 		 * we can compare that against incoming XIDs. This will only
! 		 * ever be run by startup process, so keep it as a static variable
! 		 * rather than hiding behind the SubtransControlLock.
! 		 */
! 		pageno = TransactionIdToPage(newestXact);
! 
! 		if (pageno == last_pageno ||
! 			SubTransPagePrecedes(pageno, last_pageno))
! 			return;
! 
! 		ereport(trace_recovery(DEBUG1),
! 				(errmsg("extend subtrans  xid %u page %d last_page %d",
! 						newestXact, pageno, last_pageno)));
! 
! 		last_pageno = pageno;
! 	}
  
  	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
  
--- 302,317 ----
  ExtendSUBTRANS(TransactionId newestXact)
  {
  	int			pageno;
  
! 	/*
! 	 * No work except at first XID of a page.  But beware: just after
! 	 * wraparound, the first XID of page zero is FirstNormalTransactionId.
! 	 */
! 	if (TransactionIdToEntry(newestXact) != 0 &&
! 		!TransactionIdEquals(newestXact, FirstNormalTransactionId))
! 		return;
  
! 	pageno = TransactionIdToPage(newestXact);
  
  	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
  
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 5994,5999 **** StartupXLOG(void)
--- 5994,6000 ----
  	uint32		freespace;
  	TransactionId oldestActiveXID;
  	bool		bgwriterLaunched = false;
+ 	bool		backendsAllowed = false;
  
  	/*
  	 * Read control file and check XLOG status looks valid.
***************
*** 6319,6325 **** StartupXLOG(void)
  			bool		recoveryContinue = true;
  			bool		recoveryApply = true;
  			bool		reachedMinRecoveryPoint = false;
- 			bool		backendsAllowed = false;
  			ErrorContextCallback errcontext;
  
  			/* use volatile pointer to prevent code rearrangement */
--- 6320,6325 ----
***************
*** 6689,6699 **** StartupXLOG(void)
  	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
  	TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
  
! 	/* Start up the commit log and related stuff, too */
! 	/* XXXHS: perhaps this should go after XactClearRecoveryTransactions */
! 	StartupCLOG();
! 	StartupSUBTRANS(oldestActiveXID);
! 	StartupMultiXact();
  
  	/* Reload shared-memory state for prepared transactions */
  	RecoverPreparedTransactions();
--- 6689,6701 ----
  	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
  	TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
  
! 	/* Start up the commit log and related stuff, too, if not done already */
! 	if (!backendsAllowed)
! 	{
! 		StartupCLOG();
! 		StartupSUBTRANS(oldestActiveXID);
! 		StartupMultiXact();
! 	}
  
  	/* Reload shared-memory state for prepared transactions */
  	RecoverPreparedTransactions();
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 45,50 ****
--- 45,52 ----
  
  #include <signal.h>
  
+ #include "access/clog.h"
+ #include "access/multixact.h"
  #include "access/subtrans.h"
  #include "access/transam.h"
  #include "access/xlog.h"
***************
*** 129,134 **** static void DisplayXidCache(void);
--- 131,139 ----
  #define xc_slow_answer_inc()		((void) 0)
  #endif   /* XIDCACHE_DEBUG */
  
+ static void RecoverySnapshotStateMachine(int newstate,
+ 							 TransactionId oldestXid, TransactionId latestXid);
+ 
  /* Primitives for KnownAssignedXids array handling for standby */
  static Size KnownAssignedXidsShmemSize(int size);
  static void KnownAssignedXidsInit(int size);
***************
*** 470,493 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  	{
  		if (TransactionIdPrecedes(recoverySnapshotPendingXmin,
  								  xlrec->oldestRunningXid))
! 		{
! 			recoverySnapshotState = RECOVERY_SNAPSHOT_READY;
! 			elog(trace_recovery(DEBUG2), 
! 					"running xact data now proven complete");
! 			elog(trace_recovery(DEBUG2), 
! 					"recovery snapshots are now enabled");
! 		}
! 		return;
! 	}
! 
! 	/*
! 	 * Can't initialise with an incomplete set of lock information.
! 	 * XXX: Can't we go into pending state like with overflowed subxids?
! 	 */
! 	if (xlrec->lock_overflow)
! 	{
! 		elog(trace_recovery(DEBUG2), 
! 				"running xact data has incomplete lock data");
  		return;
  	}
  
--- 475,483 ----
  	{
  		if (TransactionIdPrecedes(recoverySnapshotPendingXmin,
  								  xlrec->oldestRunningXid))
! 			RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_READY,
! 										 xlrec->oldestRunningXid,
! 										 xlrec->latestRunningXid);
  		return;
  	}
  
***************
*** 499,514 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  	/*
  	 * If the snapshot overflowed, then we still initialise with what we
  	 * know, but the recovery snapshot isn't fully valid yet because we
! 	 * know there are some subxids missing (ergo we don't know which ones)
  	 */
! 	if (!xlrec->subxid_overflow)
! 		recoverySnapshotState = RECOVERY_SNAPSHOT_READY;
  	else
! 	{
! 		recoverySnapshotState = RECOVERY_SNAPSHOT_PENDING;
! 		ereport(LOG, 
! 				(errmsg("consistent state delayed because recovery snapshot incomplete")));
! 	}
  
  	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
  	nxids = 0;
--- 489,506 ----
  	/*
  	 * If the snapshot overflowed, then we still initialise with what we
  	 * know, but the recovery snapshot isn't fully valid yet because we
! 	 * know we have information missing. We either have missing subxids
! 	 * or missing locks, doesn't really matter which but we track each
! 	 * separately to help with debugging.
  	 */
! 	if (xlrec->subxid_overflow || xlrec->lock_overflow)
! 		RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_PENDING,
! 										 xlrec->oldestRunningXid,
! 										 xlrec->latestRunningXid);
  	else
! 		RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_READY,
! 										 xlrec->oldestRunningXid,
! 										 xlrec->latestRunningXid);
  
  	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
  	nxids = 0;
***************
*** 522,532 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  
  	/*
  	 * Scan through the incoming array of RunningXacts and collect xids.
! 	 * We don't use SubtransSetParent because it doesn't matter yet. If
! 	 * we aren't overflowed then all xids will fit in snapshot and so we
! 	 * don't need subtrans. If we later overflow, an xid assignment record
! 	 * will add xids to subtrans. If RunningXacts is overflowed then we
! 	 * don't have enough information to correctly update subtrans anyway.	
  	 */
  	for (xid_index = 0; xid_index < xlrec->xcnt; xid_index++)
  	{
--- 514,522 ----
  
  	/*
  	 * Scan through the incoming array of RunningXacts and collect xids.
! 	 * We mark SubtransSetParent, just as we would in other cases. That 
! 	 * is OK because we performed StartupSubtrans() when we changed state,
! 	 * above.
  	 */
  	for (xid_index = 0; xid_index < xlrec->xcnt; xid_index++)
  	{
***************
*** 537,544 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  
  		xids[nxids++] = xid;
  		for(i = 0; i < rxact[xid_index].nsubxids; i++)
! 			xids[nxids++] = subxip[rxact[xid_index].subx_offset + i];
! 
  	}
  
  	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
--- 527,537 ----
  
  		xids[nxids++] = xid;
  		for(i = 0; i < rxact[xid_index].nsubxids; i++)
! 		{
! 			TransactionId subxid = subxip[rxact[xid_index].subx_offset + i];
! 			xids[nxids++] = subxid;
! 			SubTransSetParent(subxid, xid);
! 		}
  	}
  
  	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
***************
*** 565,575 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  	loggableLocks = (xl_rel_lock *) &(xlrec->xrun[(xlrec->xcnt + xlrec->subxcnt)]);
  	relation_redo_locks(loggableLocks, xlrec->numLocks);
  
! 	elog(trace_recovery(DEBUG2), 
! 		"running transaction data initialized");
! 	if (recoverySnapshotState == RECOVERY_SNAPSHOT_READY)
! 		elog(trace_recovery(DEBUG2), 
! 			"recovery snapshots are now enabled");
  }
  
  /*
--- 558,631 ----
  	loggableLocks = (xl_rel_lock *) &(xlrec->xrun[(xlrec->xcnt + xlrec->subxcnt)]);
  	relation_redo_locks(loggableLocks, xlrec->numLocks);
  
! 	/* nextXid must be beyond any observed xid */
! 	if (TransactionIdFollowsOrEquals(latestObservedXid,
! 									 ShmemVariableCache->nextXid))
! 	{
! 		ShmemVariableCache->nextXid = latestObservedXid;
! 		TransactionIdAdvance(ShmemVariableCache->nextXid);
! 	}
! }
! 
! static void
! RecoverySnapshotStateMachine(int newstate, 
! 							 TransactionId oldestXid, TransactionId latestXid)
! {
! 	TransactionId xid = oldestXid;
! 	Assert(newstate > recoverySnapshotState);
! 
! 	switch (recoverySnapshotState)
! 	{
! 		case RECOVERY_SNAPSHOT_UNINITIALIZED:
! 
! 				ereport(trace_recovery(DEBUG2), 
! 						(errmsg("running transaction data initialized")));
! 
! 				/* Startup commit log and other stuff */
! 				StartupCLOG();
! 				StartupSUBTRANS(oldestXid);
! 				StartupMultiXact();
! 
! 				TransactionIdAdvance(xid);
! 				while (TransactionIdPrecedesOrEquals(xid, latestXid))
! 				{
! 					/*
! 					 * Extend clog and subtrans like we do in 
! 					 * GetNewTransactionId() during normal operation.
! 					 */
! 					ExtendCLOG(xid);
! 					ExtendSUBTRANS(xid);
! 
! 					TransactionIdAdvance(xid);
! 				}
! 
! 				if (newstate == RECOVERY_SNAPSHOT_READY)
! 					ereport(trace_recovery(DEBUG1), 
! 							(errmsg("recovery snapshots are now enabled")));
! 				else if (newstate == RECOVERY_SNAPSHOT_PENDING)
! 			 		ereport(LOG, 
! 							(errmsg("consistent state delayed because "
! 									"recovery snapshot incomplete")));
! 				break;
! 
! 		case RECOVERY_SNAPSHOT_PENDING:
! 
! 				if (newstate == RECOVERY_SNAPSHOT_READY)
! 				{
! 					ereport(trace_recovery(DEBUG2), 
! 							(errmsg("running xact data now proven complete")));
! 					ereport(trace_recovery(DEBUG1), 
! 							(errmsg("recovery snapshots are now enabled")));
! 					break;
! 				}
! 
! 		case RECOVERY_SNAPSHOT_READY:
! 		default:
! 				elog(ERROR, "invalid value for recoverySnapshotState");
! 				break;
! 	}
! 	
! 	recoverySnapshotState = newstate;
  }
  
  /*
***************
*** 582,598 **** ProcArrayInitRecoveryInfo(void)
  {
  	Assert(InHotStandby);
  
! 	recoverySnapshotState = RECOVERY_SNAPSHOT_READY;
  
  	/* also initialize latestCompletedXid, to nextXid - 1 */
  	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
  	TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
  	latestObservedXid = ShmemVariableCache->latestCompletedXid;
- 
- 	elog(trace_recovery(DEBUG2), 
- 		"running transaction data initialized");
- 	elog(trace_recovery(DEBUG2), 
- 		"recovery snapshots are now enabled");
  }
  
  /*
--- 638,650 ----
  {
  	Assert(InHotStandby);
  
! 	RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_READY, 
! 								 ShmemVariableCache->nextXid, InvalidTransactionId);
  
  	/* also initialize latestCompletedXid, to nextXid - 1 */
  	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
  	TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
  	latestObservedXid = ShmemVariableCache->latestCompletedXid;
  }
  
  /*
***************
*** 2311,2316 **** RecordKnownAssignedTransactionIds(TransactionId xid)
--- 2363,2376 ----
  						(errmsg("recording unobserved xid %u (latestObservedXid %u)",
  		 							next_expected_xid, latestObservedXid)));
  			KnownAssignedXidsAdd(&next_expected_xid, 1);
+ 
+ 			/*
+ 			 * Extend clog and subtrans like we do in GetNewTransactionId()
+ 			 * during normal operation.
+ 			 */
+ 			ExtendCLOG(next_expected_xid);
+ 			ExtendSUBTRANS(next_expected_xid);
+ 
  			TransactionIdAdvance(next_expected_xid);
  		}
  
***************
*** 2318,2323 **** RecordKnownAssignedTransactionIds(TransactionId xid)
--- 2378,2391 ----
  
  		latestObservedXid = xid;
  	}
+ 
+ 	/* nextXid must be beyond any observed xid */
+ 	if (TransactionIdFollowsOrEquals(latestObservedXid,
+ 									 ShmemVariableCache->nextXid))
+ 	{
+ 		ShmemVariableCache->nextXid = latestObservedXid;
+ 		TransactionIdAdvance(ShmemVariableCache->nextXid);
+ 	}
  }
  
  void
90480cc3d7cbdaf08c2df11a1bc16ac2d8f0dce1.patch (text/x-patch; charset=utf-8)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index c6c27a7..9d97527 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -575,7 +575,7 @@ ExtendCLOG(TransactionId newestXact)
 	LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
-	ZeroCLOGPage(pageno, true);
+	ZeroCLOGPage(pageno, !InRecovery);
 
 	LWLockRelease(CLogControlLock);
 }
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3f89087..68e3869 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -598,8 +598,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * commands to set the commit status of transactions whose bits are in
 	 * already-truncated segments of the commit log (see notes in
 	 * SlruPhysicalWritePage).	Hence, if we are InRecovery, allow the case
-	 * where the file doesn't exist, and return zeroes instead. We also
-	 * return a zeroed page when seek and read fails.
+	 * where the file doesn't exist, and return zeroes instead.
 	 */
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
 	if (fd < 0)
@@ -620,14 +619,6 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 
 	if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
 	{
-		if (InRecovery)
-		{
-			ereport(LOG,
-					(errmsg("file \"%s\" doesn't exist, reading as zeroes",
-							path)));
-			MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
-			return true;
-		}
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
 		close(fd);
@@ -637,14 +628,6 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	errno = 0;
 	if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
 	{
-		if (InRecovery)
-		{
-			ereport(LOG,
-					(errmsg("file \"%s\" doesn't exist, reading as zeroes",
-							path)));
-			MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
-			return true;
-		}
 		slru_errcause = SLRU_READ_FAILED;
 		slru_errno = errno;
 		close(fd);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index e9b3fbc..0dbd216 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,6 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
-#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -45,8 +44,7 @@
  * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE, and segment numbering at
  * 0xFFFFFFFF/SUBTRANS_XACTS_PER_PAGE/SLRU_SEGMENTS_PER_PAGE.  We need take no
  * explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes)
- * and in recovery when we do ExtendSUBTRANS.
+ * and page numbers in TruncateSUBTRANS (see SubTransPagePrecedes).
  */
 
 /* We need four bytes per xact */
@@ -85,12 +83,8 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
 	ptr += entryno;
 
-	/*
-	 * Current state should be 0, except in recovery where we may
-	 * need to reset the value multiple times
-	 */
-	Assert(*ptr == InvalidTransactionId ||
-			(InRecovery && *ptr == parent));
+	/* Current state should be 0 */
+	Assert(*ptr == InvalidTransactionId);
 
 	*ptr = parent;
 
@@ -229,19 +223,33 @@ ZeroSUBTRANSPage(int pageno)
 /*
  * This must be called ONCE during postmaster or standalone-backend startup,
  * after StartupXLOG has initialized ShmemVariableCache->nextXid.
+ *
+ * oldestActiveXID is the oldest XID of any prepared transaction, or nextXid
+ * if there are none.
  */
 void
 StartupSUBTRANS(TransactionId oldestActiveXID)
 {
-	TransactionId xid = ShmemVariableCache->nextXid;
-	int			pageno = TransactionIdToPage(xid);
-
-	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
+	int			startPage;
+	int			endPage;
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * Since we don't expect pg_subtrans to be valid across crashes, we
+	 * initialize the currently-active page(s) to zeroes during startup.
+	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
+	 * the new page without regard to whatever was previously on disk.
 	 */
-	SubTransCtl->shared->latest_page_number = pageno;
+	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
+
+	startPage = TransactionIdToPage(oldestActiveXID);
+	endPage = TransactionIdToPage(ShmemVariableCache->nextXid);
+
+	while (startPage != endPage)
+	{
+		(void) ZeroSUBTRANSPage(startPage);
+		startPage++;
+	}
+	(void) ZeroSUBTRANSPage(startPage);
 
 	LWLockRelease(SubtransControlLock);
 }
@@ -294,42 +302,16 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
-	static int last_pageno = 0;
 
-	Assert(TransactionIdIsNormal(newestXact));
+	/*
+	 * No work except at first XID of a page.  But beware: just after
+	 * wraparound, the first XID of page zero is FirstNormalTransactionId.
+	 */
+	if (TransactionIdToEntry(newestXact) != 0 &&
+		!TransactionIdEquals(newestXact, FirstNormalTransactionId))
+		return;
 
-	if (!InRecovery)
-	{
-		/*
-		 * No work except at first XID of a page.  But beware: just after
-		 * wraparound, the first XID of page zero is FirstNormalTransactionId.
-		 */
-		if (TransactionIdToEntry(newestXact) != 0 &&
-			!TransactionIdEquals(newestXact, FirstNormalTransactionId))
-			return;
-
-		pageno = TransactionIdToPage(newestXact);
-	}
-	else
-	{
-		/*
-		 * InRecovery we keep track of the last page we extended, so
-		 * we can compare that against incoming XIDs. This will only
-		 * ever be run by startup process, so keep it as a static variable
-		 * rather than hiding behind the SubtransControlLock.
-		 */
-		pageno = TransactionIdToPage(newestXact);
-
-		if (pageno == last_pageno ||
-			SubTransPagePrecedes(pageno, last_pageno))
-			return;
-
-		ereport(trace_recovery(DEBUG1),
-				(errmsg("extend subtrans  xid %u page %d last_page %d",
-						newestXact, pageno, last_pageno)));
-
-		last_pageno = pageno;
-	}
+	pageno = TransactionIdToPage(newestXact);
 
 	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0613001..932e57f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5994,6 +5994,7 @@ StartupXLOG(void)
 	uint32		freespace;
 	TransactionId oldestActiveXID;
 	bool		bgwriterLaunched = false;
+	bool		backendsAllowed = false;
 
 	/*
 	 * Read control file and check XLOG status looks valid.
@@ -6319,7 +6320,6 @@ StartupXLOG(void)
 			bool		recoveryContinue = true;
 			bool		recoveryApply = true;
 			bool		reachedMinRecoveryPoint = false;
-			bool		backendsAllowed = false;
 			ErrorContextCallback errcontext;
 
 			/* use volatile pointer to prevent code rearrangement */
@@ -6689,11 +6689,13 @@ StartupXLOG(void)
 	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
 	TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
 
-	/* Start up the commit log and related stuff, too */
-	/* XXXHS: perhaps this should go after XactClearRecoveryTransactions */
-	StartupCLOG();
-	StartupSUBTRANS(oldestActiveXID);
-	StartupMultiXact();
+	/* Start up the commit log and related stuff, too, if not done already */
+	if (!backendsAllowed)
+	{
+		StartupCLOG();
+		StartupSUBTRANS(oldestActiveXID);
+		StartupMultiXact();
+	}
 
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 7d1f42c..f6f50be 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -45,6 +45,8 @@
 
 #include <signal.h>
 
+#include "access/clog.h"
+#include "access/multixact.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "access/xlog.h"
@@ -129,6 +131,9 @@ static void DisplayXidCache(void);
 #define xc_slow_answer_inc()		((void) 0)
 #endif   /* XIDCACHE_DEBUG */
 
+static void RecoverySnapshotStateMachine(int newstate,
+							 TransactionId oldestXid, TransactionId latestXid);
+
 /* Primitives for KnownAssignedXids array handling for standby */
 static Size KnownAssignedXidsShmemSize(int size);
 static void KnownAssignedXidsInit(int size);
@@ -470,24 +475,9 @@ ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
 	{
 		if (TransactionIdPrecedes(recoverySnapshotPendingXmin,
 								  xlrec->oldestRunningXid))
-		{
-			recoverySnapshotState = RECOVERY_SNAPSHOT_READY;
-			elog(trace_recovery(DEBUG2), 
-					"running xact data now proven complete");
-			elog(trace_recovery(DEBUG2), 
-					"recovery snapshots are now enabled");
-		}
-		return;
-	}
-
-	/*
-	 * Can't initialise with an incomplete set of lock information.
-	 * XXX: Can't we go into pending state like with overflowed subxids?
-	 */
-	if (xlrec->lock_overflow)
-	{
-		elog(trace_recovery(DEBUG2), 
-				"running xact data has incomplete lock data");
+			RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_READY,
+										 xlrec->oldestRunningXid,
+										 xlrec->latestRunningXid);
 		return;
 	}
 
@@ -499,16 +489,18 @@ ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
 	/*
 	 * If the snapshot overflowed, then we still initialise with what we
 	 * know, but the recovery snapshot isn't fully valid yet because we
-	 * know there are some subxids missing (ergo we don't know which ones)
+	 * know we have information missing. We either have missing subxids
+	 * or missing locks; it doesn't really matter which, but we track each
+	 * separately to help with debugging.
 	 */
-	if (!xlrec->subxid_overflow)
-		recoverySnapshotState = RECOVERY_SNAPSHOT_READY;
+	if (xlrec->subxid_overflow || xlrec->lock_overflow)
+		RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_PENDING,
+										 xlrec->oldestRunningXid,
+										 xlrec->latestRunningXid);
 	else
-	{
-		recoverySnapshotState = RECOVERY_SNAPSHOT_PENDING;
-		ereport(LOG, 
-				(errmsg("consistent state delayed because recovery snapshot incomplete")));
-	}
+		RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_READY,
+										 xlrec->oldestRunningXid,
+										 xlrec->latestRunningXid);
 
 	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
 	nxids = 0;
@@ -522,11 +514,9 @@ ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
 
 	/*
 	 * Scan through the incoming array of RunningXacts and collect xids.
-	 * We don't use SubtransSetParent because it doesn't matter yet. If
-	 * we aren't overflowed then all xids will fit in snapshot and so we
-	 * don't need subtrans. If we later overflow, an xid assignment record
-	 * will add xids to subtrans. If RunningXacts is overflowed then we
-	 * don't have enough information to correctly update subtrans anyway.	
+	 * We call SubTransSetParent, just as we would in other cases. That
+	 * is OK because we performed StartupSUBTRANS() when we changed state,
+	 * above.
 	 */
 	for (xid_index = 0; xid_index < xlrec->xcnt; xid_index++)
 	{
@@ -537,8 +527,11 @@ ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
 
 		xids[nxids++] = xid;
 		for(i = 0; i < rxact[xid_index].nsubxids; i++)
-			xids[nxids++] = subxip[rxact[xid_index].subx_offset + i];
-
+		{
+			TransactionId subxid = subxip[rxact[xid_index].subx_offset + i];
+			xids[nxids++] = subxid;
+			SubTransSetParent(subxid, xid);
+		}
 	}
 
 	if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
@@ -565,11 +558,74 @@ ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
 	loggableLocks = (xl_rel_lock *) &(xlrec->xrun[(xlrec->xcnt + xlrec->subxcnt)]);
 	relation_redo_locks(loggableLocks, xlrec->numLocks);
 
-	elog(trace_recovery(DEBUG2), 
-		"running transaction data initialized");
-	if (recoverySnapshotState == RECOVERY_SNAPSHOT_READY)
-		elog(trace_recovery(DEBUG2), 
-			"recovery snapshots are now enabled");
+	/* nextXid must be beyond any observed xid */
+	if (TransactionIdFollowsOrEquals(latestObservedXid,
+									 ShmemVariableCache->nextXid))
+	{
+		ShmemVariableCache->nextXid = latestObservedXid;
+		TransactionIdAdvance(ShmemVariableCache->nextXid);
+	}
+}
+
+static void
+RecoverySnapshotStateMachine(int newstate, 
+							 TransactionId oldestXid, TransactionId latestXid)
+{
+	TransactionId xid = oldestXid;
+	Assert(newstate > recoverySnapshotState);
+
+	switch (recoverySnapshotState)
+	{
+		case RECOVERY_SNAPSHOT_UNINITIALIZED:
+
+				ereport(trace_recovery(DEBUG2), 
+						(errmsg("running transaction data initialized")));
+
+				/* Startup commit log and other stuff */
+				StartupCLOG();
+				StartupSUBTRANS(oldestXid);
+				StartupMultiXact();
+
+				TransactionIdAdvance(xid);
+				while (TransactionIdPrecedesOrEquals(xid, latestXid))
+				{
+					/*
+					 * Extend clog and subtrans like we do in 
+					 * GetNewTransactionId() during normal operation.
+					 */
+					ExtendCLOG(xid);
+					ExtendSUBTRANS(xid);
+
+					TransactionIdAdvance(xid);
+				}
+
+				if (newstate == RECOVERY_SNAPSHOT_READY)
+					ereport(trace_recovery(DEBUG1), 
+							(errmsg("recovery snapshots are now enabled")));
+				else if (newstate == RECOVERY_SNAPSHOT_PENDING)
+			 		ereport(LOG, 
+							(errmsg("consistent state delayed because "
+									"recovery snapshot incomplete")));
+				break;
+
+		case RECOVERY_SNAPSHOT_PENDING:
+
+				if (newstate == RECOVERY_SNAPSHOT_READY)
+				{
+					ereport(trace_recovery(DEBUG2), 
+							(errmsg("running xact data now proven complete")));
+					ereport(trace_recovery(DEBUG1), 
+							(errmsg("recovery snapshots are now enabled")));
+					break;
+				}
+
+		case RECOVERY_SNAPSHOT_READY:
+		default:
+				elog(ERROR, "invalid value for recoverySnapshotState");
+				break;
+	}
+	
+	recoverySnapshotState = newstate;
 }
 
 /*
@@ -582,17 +638,13 @@ ProcArrayInitRecoveryInfo(void)
 {
 	Assert(InHotStandby);
 
-	recoverySnapshotState = RECOVERY_SNAPSHOT_READY;
+	RecoverySnapshotStateMachine(RECOVERY_SNAPSHOT_READY, 
+								 ShmemVariableCache->nextXid, InvalidTransactionId);
 
 	/* also initialize latestCompletedXid, to nextXid - 1 */
 	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
 	TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
 	latestObservedXid = ShmemVariableCache->latestCompletedXid;
-
-	elog(trace_recovery(DEBUG2), 
-		"running transaction data initialized");
-	elog(trace_recovery(DEBUG2), 
-		"recovery snapshots are now enabled");
 }
 
 /*
@@ -2311,6 +2363,14 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
 						(errmsg("recording unobserved xid %u (latestObservedXid %u)",
 		 							next_expected_xid, latestObservedXid)));
 			KnownAssignedXidsAdd(&next_expected_xid, 1);
+
+			/*
+			 * Extend clog and subtrans like we do in GetNewTransactionId()
+			 * during normal operation.
+			 */
+			ExtendCLOG(next_expected_xid);
+			ExtendSUBTRANS(next_expected_xid);
+
 			TransactionIdAdvance(next_expected_xid);
 		}
 
@@ -2318,6 +2378,14 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
 
 		latestObservedXid = xid;
 	}
+
+	/* nextXid must be beyond any observed xid */
+	if (TransactionIdFollowsOrEquals(latestObservedXid,
+									 ShmemVariableCache->nextXid))
+	{
+		ShmemVariableCache->nextXid = latestObservedXid;
+		TransactionIdAdvance(ShmemVariableCache->nextXid);
+	}
 }
 
 void
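
The XID arithmetic that the patch above relies on can be sketched in standalone C. This is only an illustration under stated assumptions, not code from the patch: xid_t and every function name below are hypothetical stand-ins, the circular comparison assumes both XIDs are normal (3 or greater), and the figure of 2048 entries per subtrans page assumes the default 8 kB block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real macros. */
typedef uint32_t xid_t;					/* plays the role of TransactionId */

#define FIRST_NORMAL_XID	((xid_t) 3)	/* value of FirstNormalTransactionId */
#define XIDS_PER_PAGE		2048u		/* assumed: 8192-byte page / 4-byte entry */

/* Circular "a >= b" in modulo-2^32 order; assumes both XIDs are normal. */
static bool
xid_follows_or_equals(xid_t a, xid_t b)
{
	return (int32_t) (a - b) >= 0;
}

/* Step to the next XID, skipping the reserved values 0..2 after wraparound. */
static xid_t
xid_advance(xid_t xid)
{
	xid++;
	if (xid < FIRST_NORMAL_XID)
		xid = FIRST_NORMAL_XID;
	return xid;
}

/* Same shape as the early-return test in ExtendSUBTRANS: only the first
 * XID of a page (or the first normal XID after wraparound) needs work. */
static bool
xid_starts_new_page(xid_t xid)
{
	return (xid % XIDS_PER_PAGE) == 0 || xid == FIRST_NORMAL_XID;
}

/* Keep next_xid strictly beyond the newest XID observed during replay. */
static xid_t
bump_next_xid(xid_t next_xid, xid_t latest_observed)
{
	assert(latest_observed >= FIRST_NORMAL_XID);
	if (xid_follows_or_equals(latest_observed, next_xid))
		next_xid = xid_advance(latest_observed);
	return next_xid;
}

/* Count how many page extensions a walk over (oldest, latest] triggers. */
static unsigned
pages_extended(xid_t oldest, xid_t latest)
{
	unsigned	n = 0;
	xid_t		xid = xid_advance(oldest);

	while (xid_follows_or_equals(latest, xid))
	{
		if (xid_starts_new_page(xid))
			n++;
		xid = xid_advance(xid);
	}
	return n;
}
```

With these helpers, bump_next_xid(100, 150) yields 151 while bump_next_xid(200, 150) leaves 200 unchanged, and walking from 2040 to 2050 crosses exactly one page boundary (at XID 2048). The signed-difference comparison is what keeps the result sensible when the counter has wrapped around.
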
#46Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#24)
Re: Hot Standby on git

On Fri, 2009-10-02 at 10:32 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

I will document the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).

Hmm, not sure if that's such a good piece of advice either. It will mean
waiting for queries forever, which probably isn't what you want if
you're performing archive recovery. Or maybe it is? Maybe -1? I guess it
depends on the situation...

I've changed that to -1 now, which was what I meant. 0 is likely to be a
very wrong setting during archive recovery.

--
Simon Riggs www.2ndQuadrant.com

#47Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: Hot Standby on git

While playing with conflict resolution, I bumped into this:

postgres=# begin ISOLATION LEVEL SERIALIZABLE;
BEGIN
postgres=# SELECT * FROM foo;
id | data
----+------
12 |
(1 row)

postgres=# SELECT * FROM foo;
id | data
----+------
12 |
(1 row)

postgres=# SELECT * FROM foo;
id | data
----+------
12 |
(1 row)

postgres=# SELECT * FROM foo;
id | data
----+------
12 |
(1 row)

postgres=# SELECT * FROM foo;
id | data
----+------
12 |
(1 row)

postgres=# SELECT * FROM foo;
ERROR: canceling statement due to conflict with recovery
postgres=# SELECT * FROM foo;
id | data
----+------
13 |
(1 row)

postgres=# SELECT * FROM foo;
id | data
----+------
13 |
(1 row)

postgres=# begin ISOLATION LEVEL SERIALIZABLE;
id | data
----+------
13 |
(1 row)

postgres=# SELECT * FROM foo;
BEGIN
postgres=# SELECT * FROM foo;
id | data
----+------
13 |
(1 row)

The backend and the frontend seem to go out of sync when a conflict
happens in idle-in-transaction mode.
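
One way to picture the symptom in the transcript is a toy model: the cancel message is emitted while the session is idle, and the client then takes the oldest unread response as the answer to whatever it sends next. The sketch below rests on that assumption and is not libpq or the real wire protocol; ResponseQueue and all helper names are invented for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 8

/* Hypothetical model: responses the server has sent but the client has
 * not read yet, kept in arrival order. */
typedef struct
{
	const char *items[MAX_PENDING];
	int			head;
	int			count;
} ResponseQueue;

static void
queue_push(ResponseQueue *q, const char *msg)
{
	assert(q->count < MAX_PENDING);
	q->items[(q->head + q->count) % MAX_PENDING] = msg;
	q->count++;
}

static const char *
queue_pop(ResponseQueue *q)
{
	const char *msg;

	assert(q->count > 0);
	msg = q->items[q->head];
	q->head = (q->head + 1) % MAX_PENDING;
	q->count--;
	return msg;
}

/* The server cancels an idle session: it sends an error nobody asked for. */
static void
server_cancel_while_idle(ResponseQueue *q)
{
	queue_push(q, "ERROR: canceling statement due to conflict with recovery");
}

/* The client sends one command; the server answers it with server_reply,
 * but the client reads the oldest unread response instead. */
static const char *
client_round_trip(ResponseQueue *q, const char *server_reply)
{
	queue_push(q, server_reply);
	return queue_pop(q);
}
```

In this model, once server_cancel_while_idle() has run, the next SELECT reads the ERROR, and every later command reads the reply to the command before it. That is the same one-step shift seen above, where "begin" prints a row and the following SELECT prints BEGIN.
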

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#48Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#43)
Re: Hot Standby on git

Simon Riggs wrote:

Tom Lane wrote:

[ scratches head ... ] Why is hot standby messing with this sort of
thing at all? It sounds like a performance optimization that should
be considered separately, and *later*.

Yeah, I too considered just ripping it out. Simon is worried that
locking all the lock partitions and scanning the locks table can take a
long time. We do that in the master, while holding both ProcArrayLock
and XidGenLock in exclusive mode (hmm, why is shared not enough?), so
there are some grounds for worry. OTOH, it's only done once per checkpoint.

I could live with ripping it out, but what we have now doesn't make
sense to me.

Ok, let's just rip it out for now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com